Fall 2009, CIS 6930, Section 2384 Dimensionality Reduction and Applications (also CIS 4930, Section 2497)

Instructor: Meera Sitharam


Time and Place: Tuesdays 2; Thursday 2-3 in FLI 0117
Office Hours: Tuesdays 9:30-10:30; Also Thursdays 10:30-11:30(by appointment)
Course Contents:
The course will be primarily about the Basic theoretical concepts and classical theory behind dimensionality reduction (see (C) below). We will motivate the course by interspersing this with some applications (see (A) below) and a discussion of connections to other fundamental questions (see (B) below) in mathematics and theoretical computer science.

(A) Applications

Machine Learning and Data mining
Compression, sampling and reconstruction (See also Alireza Entezari's special topics course this semester)
Coding and Cryptography
Approximation algorithms and inapproximability (See also My Tra Thai's special topics course this semester)

(B) Closely Related Fundamental Topics

Complexity theory
Ramsey theory, Probabilistic method in Combinatorics
Random matrices and additive number theory
Geometry of Banach spaces
Harmonic analysis, Approximation theory and Frames
Orthogonal and Symplectic Spreads
Pseudorandomness and Information

(C) Basic concepts and classical theorems (as time permits)

Definitions of Dimension and Definitions of Disortion
Principal Components and Related stuff
Random Projections and the Johnson-Lindenstrauss Lemma
Embedding Metric spaces in Hilbert Spaces - Schoenberg's theorem, semidefinite programming based methods
Finite metric space (non)embeddability - Bourgain's theorem(s)
Random projection, Restricted Isometry Property of Frames
Random Sections and Projections of convex bodies and L^p balls Measure concentration --Dvoretzky's theorem
Very brief sketches of various other Ramsey/extremal-combinatorics-type theorems: Szemeredi's regularity lemma, Roth-Gowers, Besicovitch-Kakeya, Bourgain, Green-Tao, etc.
Miscellaneous: Frames based on Expanders, Group theoretic Frame constructions, Grassmanian packings, Frame Matroids
Hilbert's Nullstellensatz proofs of nonembeddability and semidefinite programming based methods; Other types of ``Duality.''
Some of the material in three previous special topics courses:

(i) Geometric constraints and Metric space embeddings course Part I and Part II (see lecture notes)

(ii) Advances in Complexity course (see lecture notes)

(iii) Geometric Complexity course (see lecture notes - will be up soon)


Course Format and Grading:

--A list of papers/websites will be available by the first week of classes.
--Each student will be expected to give up to three 2hr lecture presentations on the student's choice of appropriate reading material selected from the list after consultation with me. See below on how this will contribute to your grade.
-- After each lecture, one assigned student (different from the one giving the lecture) will be responsible for preparing lecture notes, typing them up, discussing them with me during office hours, and posting them by the end of the week with revisions continuing until the lecture notes are high quality. (Each student should expect to come up at least 2-3 times in this rotation; see below on how this will contribute to your grade).
--All students will be expected to have read through the material before the lecture, and participate during the lecture, since the lectures will proceed at a relatively brisk pace.

Your grade will be based on the following:
--Active class participation (which will show me to what extent you are reading the relevant material, reading to patch up holes in your background, actively going over and keeping up your understanding of the lecture material currently being presented). Discussing offline (study groups) with other students in the class, and communicating the results of your discussions to me and the remainder of the class is highly encouraged.
--Prompt posting of thorough, clear, complete lecture notes on the lecture days assigned to you. Several discussions with me over email and office hours are to be expected. (see course format above)
--Atleast one 2hr lecture presentation on student's choice of material selected from assigned list (see course format above). Students will be expected to put in significant effort on reading, organiziton, consultation with me and delivery of this presentation.
--Each month I will try to give feedback to each student on how you are doing, what you need to work on, etc.


Prerequisites:
Strong background in the Basics of Linear Algebra, Discrete Mathematics, Analysis and Probability. Some further exposure to Graph theory, Combinatorics, modern Algebra and Geometry will be useful. Asymptotic complexity analysis of algorithms, and NP-completeness would also be useful. You will be expected to do any extra reading required to patch-up holes in your background. In general, I would recommend that undergrads and new students to the department check with me first before taking this course.


Reading List (in rough order to be presented):

NOTE: Feel Free to pick an alternative paper instead of any of these. In fact, there are plenty of such candidate papers in the bibliographies of these papers. But consult with me before you pick and once the papers to be presented and dates are decided, we will make another list so people know what to read before class! Also note that some of these papers do not have links but you should be able to find them using the keywords


(A)+(C) above: Motivation and extraction of common problem(s) + Exposure to some of the Basic concepts and classical theorems

---Machine Learning

Comparing PCA (linear methods) to nonlinear methods (manifold learning) for classification

Support vector machines, Kernel methods and dimension reduction

Another such paper

Dimensionality reduction by Feature selection methods from machine learning

---Compression, Streaming etc.

Intro to ``compressive sensing''

Short version of one of the main results needed for compressed sensing

Streaming, Sketching, and sublinear space algorithms

--Approximation and other Algorithms

Algorithmic applications of metric space embeddings

Survey that has a summary chapter at the end with references on algorithmic applications of metric space embeddings

--Geometric and related Algorithms

Nearest Neighbors and Metric Space embeddings (Indyk)

Sensor Network Localization, Distance graph realizations etc (Yin-Yu Ye)


(C) Clarifying Basic concepts already seen above and visiting classical theorems

--Dimension: topological spaces, vector spaces, metric spaces, manifolds, normed vector spaces, Hausdorff/Banach/Hilbert; Other notions of dimension - doubling dimension, VC-dimension
--Kolmogorov information, Lempel-Ziv compression, Universal inference/learning
--Distortion
--PCA and related linear methods, Johnson-Lindenstrauss, Schoenberg/approximate SVD and semidefinite programming based methods, Dvoretzky's theorem


(B)+(C) Connections + More Classical and Recent theorems

Vera Rosta's survey-biblio on recent Ramsey type results (pick appropriate papers from bibliography)

Sign Matrix Rank + Margin Bounds (see papers by Forster in bibliography)

Limitations of Support Vector Machines

Explicit constructions

Impossibility of Dimension reduction in L_1 (Charikar et al)

Frames and Expanders, Frame Matroids

Various dualities for embeddability/nonembeddability - Hilbert's Nullstellensatz



Lecture notes: