CAP 6610, Machine Learning, Spring 2013

Place: CSE Building, E107
Time: MWF 6 (12:50-1:40 p.m.)

Instructor:
Prof. Arunava Banerjee
Office: CSE E336.
E-mail: arunava@cise.ufl.edu.
Phone: 505-1556.
Office hours: Tuesday 2:00 p.m.-4:00 p.m.

TA:
Inchul Choi
Office: CSE E309.
E-mail: xxx@cise.ufl.edu.
Office hours: Monday 3:00 p.m.-5:00 p.m. (at CSE E309) or by appointment.

TA:
Subhajit Sengupta
Office: CSE E309.
E-mail: xxx@cise.ufl.edu.
Office hours: Wednesday 4:00 p.m.-5:00 p.m. (at CSE E309) and 5:00 p.m.-6:00 p.m. (at CSE E404).

Pre-requisites:

Textbook: Machine Learning: A Probabilistic Perspective, Murphy, ISBN-10: 0262018020.

Reference: Pattern Recognition and Machine Learning, Bishop, ISBN 0-387-31073-8.

Reference: Pattern Classification, 2nd Edition, Duda, Hart and Stork, John Wiley, ISBN 0-471-05669-3.

Tentative list of Topics to be covered

The above list is tentative at this juncture and the set of topics we end up covering might change due to class interest and/or time constraints.

Please return to this page at least once a week to check for updates in the table below.

Evaluation:

Final grades will be assigned on a curve.

Course Policies:

Academic Dishonesty: See http://www.dso.ufl.edu/judicial/honestybrochure.htm for Academic Honesty Guidelines. All academic dishonesty cases will be handled through the University of Florida Honor Court procedures as documented by the office of Student Services, P202 Peabody Hall. You may contact them at 392-1261 for a "Student Judicial Process: Guide for Students" pamphlet.

Students with Disabilities: Students requesting classroom accommodation must first register with the Dean of Students Office. The Dean of Students Office will provide documentation to the student, who must then provide this documentation to the Instructor when requesting accommodation.

Announcements

Homework 2 deadline extended to Mar 20th.

Midterm I date announced: Mar 1 (Friday), in class. Three questions. One letter-sized cheat sheet allowed.

Project timeline has been set. The due dates for the four reports are as follows.

  1. Jan 25th: Description
  2. Feb 15th: Logistics (that is, how you will collect data, what preprocessing and parsing steps you will have to take, etc.)
  3. Mar 29th: Preliminary results (that is, your code should be running by now; report which algorithms you used and your preliminary results)
  4. Apr 19th: Final Report

HomeWorks
HomeWork | Due Date | Solutions
HomeWork 1 | Feb 8th 2013 (In Class) | Solutions 1,2; Solutions 3,4; Solutions 4b
HomeWork 2 | Mar 13th 2013 (In Class); deadline extended to Mar 20th |
HomeWork 3 | Apr 19th 2013 (In Class) |

List of Topics covered
Week Topic Additional Reading
Jan 06 - Jan 12
  • Putative framework:
  • Supervised, Unsupervised Learning. Reinforcement Learning
  • Labeled/unlabeled datasets, training/testing.
  • Generalization, over-fitting to training data
Jan 13 - Jan 19
  • Decision Trees
  • Information gain and Gini impurity
Additional reading: the Wiki page on Decision tree learning.
Jan 20 - Jan 26
  • The Risk Functional Approach
  • Demonstration of Risk Functionals for Classification, Regression, and Density Estimation.
  • Empirical Risk Minimization principle
Jan 27 - Feb 02
  • Jensen's inequality
  • Expected risk versus Empirical risk
  • Hoeffding's inequality
  • Bayesian Decision Theory
  • Getting comfortable with the n-dimensional Gaussian/Normal distribution.
Additional reading: for technical material covering what we have been discussing, read the first chapter of Durrett's book (and other chapters if you want to learn more).
Feb 03 - Feb 09
  • Whitening transform for Gaussian/Normal Distribution
  • Bayes optimal discriminant for Normally distributed classes.
  • Perceptron Learning rule
  • Energy function for perceptron learning and gradient descent
  • Started mistake bound theorem for perceptron
Feb 10 - Feb 16
  • Finished mistake bound theorem.
  • Multi-layer perceptrons and Error backpropagation
Feb 17 - Feb 23
  • Recap of Error backpropagation. On-line learning, epoch, over-fitting etc.
  • Convex functions; theorem: every local minimum is a global minimum
  • Convex optimization: Inequality and Equality constraints
  • Primal form of maximal margin classifier
  • Guest Lecture by Rahul Sukthankar from Google Research
Additional reading: the book Convex Optimization by Boyd and Vandenberghe.
Feb 24 - Mar 02
  • Constrained optimization; objective, equality and inequality constraints
  • Lagrange multiplier technique for equality constraints.
  • Convex functions and sets, affine functions and sets.
  • Midterm I (Friday, in class)
Mar 03 - Mar 09
  • Spring Break
Mar 10 - Mar 16
  • Convex optimization problems, the Lagrangian, the Lagrange dual problem.
  • Weak and Strong duality
  • Primal formulation of SVM
  • Dual formulation of SVM
  • Kernel Trick
  • Representer theorem
  • Formulation of SVM with hinge loss.
Mar 17 - Mar 23
  • Generalization error
  • VC Dimension
  • Proof Sketch for VC bound on generalization error.
  • Rademacher Complexity
  • Other popular classifiers: K-Nearest Neighbor, Naive Bayes, Logistic Regression.
Mar 24 - Mar 30
  • Unsupervised learning; Roadmap for rest of semester
  • Maximum likelihood principle (ML), Maximum a posteriori (MAP)
  • Conjugate prior
Mar 31 - Apr 06
  • Parameter estimation: Bernoulli/Multinomial (conjugate prior: Beta/Dirichlet)
  • Gaussian distribution, 1-D case
  • Bias and Variance of estimators, Maximum likelihood estimate of variance is biased.
  • Principal component analysis.
Apr 07 - Apr 13
  • Reconstruction error versus max variance view of PCA.
  • K-Means clustering; objective function and algorithm
  • Mixture of Gaussians and Expectation Maximization.
Additional reading: D'Souza's notes.
Apr 14 - Apr 20
Apr 21 - Apr 27
  • Review
  • Midterm II (Wednesday 24th, in class)