CAP 6610, Machine Learning, Spring 2010

Place: CSE Building, E107
Time: MWF 4 (10:40-11:30 a.m.)

Instructor:
Prof. Arunava Banerjee
Office: CSE E336.
E-mail: arunava@cise.ufl.edu.
Phone: 392-1476.
Office hours: Wednesday 2:00 p.m.-4:00 p.m. or by appointment.

TA:
Venkatakrishnan Ramaswamy
Office: CSE E445.
E-mail: vr1@cise.ufl.edu.
Office hours: Monday 3:00 p.m.-5:00 p.m. (at CSE E309) or by appointment.

TA:
Subhajit Sengupta
Office: CSE E445.
E-mail: ss5@cise.ufl.edu.
Office hours: Wednesday 4:00 p.m.-6:00 p.m. (at CSE E309) or by appointment.

Pre-requisites:

Textbook: Pattern Recognition and Machine Learning, Bishop, ISBN 0-387-31073-8.

Reference: Pattern Classification, 2nd Edition, Duda, Hart and Stork, John Wiley, ISBN 0-471-05669-3.

Tentative list of Topics to be covered

The above list is tentative at this juncture and the set of topics we end up covering might change due to class interest and/or time constraints.

Please return to this page at least once a week to check for updates in the table below.

Evaluation:

Final grades will be assigned on a curve.

Course Policies:

Academic Dishonesty: See http://www.dso.ufl.edu/judicial/honestybrochure.htm for Academic Honesty Guidelines. All academic dishonesty cases will be handled through the University of Florida Honor Court procedures as documented by the office of Student Services, P202 Peabody Hall. You may contact them at 392-1261 for a "Student Judicial Process: Guide for Students" pamphlet.

Students with Disabilities: Students requesting classroom accommodation must first register with the Dean of Students Office. The Dean of Students Office will provide documentation to the student who must then provide this documentation to the Instructor when requesting accommodation.

Announcements

Midterm II date and time set. The exam will take place in class (60 mins). You are allowed one letter-sized cheat sheet (both sides). Topics cover everything starting from (and including) SVMs.

HW3 is up (due April 19th).

HomeWorks
HomeWork Due Date
HomeWork 1: Jan 27th 2010. Question 3 due on Feb 2nd (5:00 p.m. in E445). Solutions and scores on WebCT. Homeworks may be picked up in TA office hours.
HomeWork 2: Mar 19th 2010 (extended to Mar 24th). Solutions
HomeWork 3: April 19th 2010. Solutions

List of Topics covered
Week Topic
Jan 03 - Jan 09
  • Preliminaries
  • Integers, Rationals
  • Cauchy convergent sequences and Reals
  • Putative framework
Jan 10 - Jan 16
  • Supervised, Unsupervised, and Reinforcement Learning
  • The Risk Functional Approach
  • Demonstration of Risk Functionals for Classification, Regression, and Density Estimation.
Jan 17 - Jan 23
  • Empirical Risk Minimization principle
  • Measurable Space, Probability Space, Sigma algebras and such
  • Limit supremum and Limit infimum
Jan 24 - Jan 30
  • Random variables, Convergence in probability, Almost sure convergence
  • Markov's inequality, Chebyshev's inequality.
  • For material that covers what we have been discussing, read the first chapter of Durrett's book (and others if you want to learn more).
Jan 31 - Feb 6
  • Weak law of large numbers
  • Chernoff-Hoeffding bounds
  • Generalization error bound for finite hypothesis space
  • Here are Carlos Rodriguez's notes on the laws of large numbers.
Feb 7 - Feb 13
  • Prof. Rangarajan's guest lecture on Hilbert spaces
  • Bayes' theorem, Decision theory, Maximum likelihood (ML) estimate, Maximum a posteriori (MAP) estimate
  • Central limit theorem and Multi-variate Normal distribution
Feb 14 - Feb 20
  • The class of discriminants for a multi-variate normal generative model
  • Linear discriminants and the perceptron learning rule.
  • Shattering, VC-dimension, margin etc. Here is the paper that proves the VC-dimension for given margin/diameter.
  • VC bound on generalization error (statement w/o proof)
  • For those interested in the proof, here are two very nice lectures by Robert Nowak: lecture18 and lecture19
  • Support Vector Machines (Pass 1: Conceptual): Margin maximization, the constrained optimization problem, inner product based formulation, kernels.
Feb 21 - Feb 27
  • Constrained optimization; objective, equality and inequality constraints
  • Lagrange multiplier technique for equality constraints.
  • Convex functions and sets, Affine functions and sets.
  • Midterm I (Friday, in class)
Feb 28 - Mar 6
  • Convex optimization problems, the Lagrangian, the Lagrange dual problem.
  • Weak and Strong duality, Constraint qualification (particularly Slater's criterion)
  • Check out Boyd and Vandenberghe's book.
  • The Dual formulation of SVM
  • Maximum likelihood and Bayesian parameter estimation
  • Conjugate priors, Bernoulli and its conjugate (Beta)
Mar 7 - Mar 13
    Spring Break
Mar 14 - Mar 20
  • Parameter estimation: Multinomial (conjugate prior: Dirichlet)
  • Gaussian distribution, 1-D case
  • Bias of an estimator; the maximum likelihood estimate of variance is biased.
Mar 21 - Mar 27
  • Maximum Likelihood Estimate for Multi-dimensional Gaussian distribution: Estimates of mean and covariance
  • K-Means clustering
  • Mixture of Gaussians and Expectation Maximization. Here are D'Souza's notes.
Mar 28 - Apr 3
  • Finished EM (the algorithm)
  • Theoretical underpinnings of EM.
  • Introduction to Information theory
  • Decision Trees
Apr 4 - Apr 10
  • Principal component analysis.
  • Independent component analysis.
Apr 11 - Apr 17
  • Hidden Markov models
  • Evaluation problem and decoding (Viterbi)
  • Learning problem (Baum-Welch)
Apr 18 - Apr 24
    Midterm II (Wednesday 21st, in class)