Machine Learning CAP6610

CAP 6610, Machine Learning, Spring 2023

Place: WEIM 1064
Time: Monday, Wednesday, Friday, Period 8 (3:00-3:50 p.m.)

Instructor:
Arunava Banerjee
Office: CSE E336.
E-mail: arunava@ufl.edu.
Office hours (On Zoom-- 924 861 2325): (E-mail for appointment first); Tuesday, Thursday, 11:00 a.m.-noon.

TA:
Anik Chattopadhyay
TA Office hours: (On Zoom-- 990 3599 1667): Wednesday, Thursday 1:00-2:00 p.m.

Pre-requisites:

The official pre-requisite for this course is COT5615 (Mathematics for Intelligent Systems). Specifically, knowledge of calculus and linear algebra is necessary, since we shall be touching on mathematical probability theory. In addition, proficiency in some programming language is a must.

Textbook: Machine Learning: A Probabilistic Perspective, Murphy, ISBN-10: 0262018020.

Reference: Pattern Recognition and Machine Learning, Bishop, ISBN 0-38-731073-8.

Reference: Pattern Classification, 2nd Edition, Duda, Hart and Stork, John Wiley, ISBN 0-471-05669-3.

Tentative list of Topics to be covered

Bayes decision theory
Bayesian learning
Maximum likelihood estimation and Expectation Maximization
Neural networks including deep learning
Kernel methods including Support Vector Machines
Mixture models
Hidden Markov models
Principal Components Analysis
Independent Components Analysis
Monte-Carlo, Markov Chain methods (Gibbs samplers and Metropolis-Hastings)
Performance evaluation: re-substitution, cross-validation, bagging, and boosting

The above list is tentative at this juncture and the set of topics we end up covering might change due to class interest and/or time constraints.
Please return to this page at least once a week to check for updates in the table below.
Evaluation:

One individual project spanning the semester, individual reports: 25%
Homework assignments (written and programming): 25%
Two midterm exams: 25% each (Time, tbd)
There will be no makeup exams (Exceptions shall be made for those that present appropriate letters from the Dean of Students Office).
The final grade will be on the curve.
Course Policies:

Late assignments: All homework assignments are due before class.
Plagiarism: You are expected to submit your own solutions to the assignments. While the final project and presentation will be done in groups, each member will be required to demonstrate his/her contribution to the work.
Attendance: There is no official attendance requirement. If you find a better use of the time spent sitting through lectures, please feel free to devote it to any occupation of your liking. However, keep in mind that it is your responsibility to stay abreast of the material presented in class.
Cell Phones: Absolutely no phone calls during class. Please turn off the ringer on your cell phone before coming to class.

Academic Dishonesty: See http://www.dso.ufl.edu/judicial/honestybrochure.htm for Academic Honesty Guidelines. All academic dishonesty cases will be handled through the University of Florida Honor Court procedures as documented by the office of Student Services, P202 Peabody Hall. You may contact them at 392-1261 for a "Student Judicial Process: Guide for Students" pamphlet.
Students with Disabilities: Students requesting classroom accommodation must first register with the Dean of Students Office. The Dean of Students Office will provide documentation to the student who must then provide this documentation to the Instructor when requesting accommodation.
Announcements
Midterm dates have been set. Midterm I on Feb 27th (in class exam) and Midterm II on April 26th (in class exam).
HomeWorks
List of Topics covered (recorded classroom lectures)

Lectures Topic Additional Reading

Jan 08 - Jan 14

Putative framework via example: NEST thermostat
Supervised, Unsupervised, Reinforcement Learning.
Independent variable, covariates, feature vector vs Class label, dependent variable
Continuous versus nominal features
Classification versus Regression

Very recent New York Times Article here
Older New York Times Article here
Couple of the referenced papers
here, here, here, here, and here.
Original GAN paper.
Original VAE paper.

Jan 15 - Jan 21

High level concepts continued
Concept class/ Hypothesis space: What do we fit
Testing on unseen data
Loss function
Generalization, over-fitting to training data
Bias-Variance tradeoff; underfitting and overfitting
Core areas: Probability theory, Optimization
k-fold cross validation; leave one out/Jackknife
Curse of dimensionality
Multivariate regression and Normal Equations

Jan 22 - Jan 28

Multivariate regression and Normal Equations continued
Ridge regression
Tikhonov regularization
Problem formulation of Lasso, Basis pursuit, Basis pursuit denoising
Density estimation and Latent variables
High level description of Generative Adversarial Network
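For reference, the normal equations and their ridge-regularized form from this week's topics can be written as follows. This is a sketch in assumed notation (design matrix X, targets y, weights w, regularization strength \lambda), not necessarily the notation used in lecture.

```latex
% Assumed notation: X is n-by-d, y is n-by-1, w is d-by-1, \lambda > 0.
\begin{align}
\hat{w}_{\text{OLS}} &= \arg\min_{w} \|Xw - y\|_2^2
  &&\Longrightarrow\quad X^\top X \,\hat{w}_{\text{OLS}} = X^\top y, \\
\hat{w}_{\text{ridge}} &= \arg\min_{w} \|Xw - y\|_2^2 + \lambda \|w\|_2^2
  &&\Longrightarrow\quad (X^\top X + \lambda I)\,\hat{w}_{\text{ridge}} = X^\top y .
\end{align}
```

Both systems follow from setting the gradient of the objective to zero. For \lambda > 0 the matrix X^\top X + \lambda I is positive definite, so the ridge solution is unique even when X^\top X is singular.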

Jan 29 - Feb 04

High level description of Variational Autoencoder
Tractable and Intractable distributions
Intractable posterior example: p(x|z) versus p(z|x)
Evidence lower bound (ELBO)
Mathematical framework for machine learning
Risk functional
Loss function
Framework for Classification and Regression

Feb 05 - Feb 11

Mathematical framework for machine learning continued
Framework for Density Estimation
The biological neuron; computational neuroscience
The perceptron/ artificial neuron
Various squashing functions: sigmoid, Rectified linear unit (ReLU), Leaky ReLU
The perceptron learning rule and mistake bound

Perceptron Algorithm convergence bound
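The perceptron learning rule listed for this week can be sketched in a few lines of plain Python. This is a minimal illustration, assuming labels in {-1, +1} and linearly separable data; the function names `train_perceptron` and `predict` and the `max_epochs` parameter are hypothetical and not taken from the course materials.

```python
def train_perceptron(samples, labels, max_epochs=100):
    """Perceptron learning rule for labels in {-1, +1}.

    Returns (weights, bias, total number of mistakes made during training).
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    mistakes = 0
    for _ in range(max_epochs):
        errors_this_epoch = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            # A point is a mistake if it lies on the wrong side of
            # (or exactly on) the current hyperplane.
            if y * activation <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
                errors_this_epoch += 1
        if errors_this_epoch == 0:  # a full pass with no mistakes: converged
            break
    return w, b, mistakes


def predict(w, b, x):
    """Classify x as +1 or -1 using the learned hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Toy data: logical AND, which is linearly separable.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]
w, b, mistakes = train_perceptron(samples, labels)
predictions = [predict(w, b, x) for x in samples]
```

The returned mistake count is the quantity that the mistake bound controls: on separable data it stays finite no matter how many passes are made.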

Feb 12 - Feb 18

The perceptron learning rule and mistake bound continued
Gradient descent for single artificial neuron (sigmoid)
Gradient descent for multilayer artificial neuron network (sigmoid): Error Backpropagation
Vanishing gradient problem (sigmoid), Rectified linear unit (ReLU) and Leaky ReLU

Feb 19 - Feb 25

Bells and Whistles
Multiple paths to higher layers
Residual networks, Highway nets, etc.
Convolutional neural networks (stride etc)
MaxPooling and Softmax
Batch normalization
Dropout
Various optimization techniques: Vanilla stochastic gradient descent, Momentum, Root mean square propagation (RMSprop), Adaptive moment estimation (Adam).

Feb 26 - Mar 04

MIDTERM I
Technical Details: Gradient descent, Momentum, Root mean square propagation (RMSprop), Adaptive moment estimation (Adam).
Intro to constrained optimization
Convex functions, Convex sets
Thm: Minimizing Convex functions on Convex sets--Local minima=Global minima

Mar 05 - Mar 11

Constrained optimization; objective, equality and inequality constraints
Lagrange multiplier technique for equality constraints.
Convex optimization problems, the Lagrangian, the Lagrange dual
Karush Kuhn Tucker conditions
Complementary slackness
Strong duality; Slater's condition
Here is a link to the book Convex Optimization by Boyd and Vandenberghe.

Mar 12 - Mar 18

SPRING BREAK

Mar 19 - Mar 25

Primal form of maximal margin classifier
Support Vector Machines: Margin maximization, the constrained optimization problem;
Primal formulation of SVM
Slack variable version of SVM for linearly non-separable data, hinge loss.

Mar 26 - Apr 01

Dual form of maximal margin classifier
Kernel trick
Polynomial kernel, Gaussian radial basis function (RBF) kernel
Mercer's theorem
Unsupervised learning; Roadmap for rest of semester
Maximum likelihood and Bayesian parameter estimation
Maximum likelihood principle (ML), Maximum a posteriori (MAP)
Density estimation: Maximum likelihood estimate of multivariate Normal Distribution

Apr 02 - Apr 08

Density estimation continued.
K-Means Clustering; Loss function
Soft vs Hard assignment
Plate notation
Expectation Maximization

Here are D'Souza's notes.
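As a rough companion to the K-Means topics above, here is a small sketch of Lloyd's algorithm with hard assignments and the sum-of-squared-distances loss. It assumes, purely for determinism, that the first k points serve as the initial centers; the names `kmeans` and `squared_distance` are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm with hard assignments.

    Returns (centers, assignments, loss), where loss is the sum of squared
    distances from each point to its assigned center.
    """
    # Assumption: use the first k points as initial centers (deterministic).
    centers = [tuple(p) for p in points[:k]]
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest center.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignments == assignments:  # no change: converged
            break
        assignments = new_assignments
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    loss = sum(squared_distance(p, centers[a]) for p, a in zip(points, assignments))
    return centers, assignments, loss


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, assignments, loss = kmeans(points, k=2)
```

Replacing the hard nearest-center assignment with per-cluster responsibilities gives the soft-assignment view that leads into Expectation Maximization.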

Apr 09 - Apr 15

Expectation Maximization continued.
Relationship to VAE, tractable and intractable functions.
Decision trees, Random forest
Entropy impurity, Gini impurity
Introduction to information theory
Entropy, Conditional entropy, Mutual information

Apr 16 - Apr 22

Kullback Leibler divergence
Introduction to learning theory
VC dimension, Rademacher complexity
Markov and Chebyshev inequality

Apr 23 - Apr 29

Chernoff and Hoeffding bounds
Wrap up of learning theory
MIDTERM II
