Guest Lecturer: Kejun Huang, Ph.D.

Date: March 27, 2018
Time: 10:40 AM - 12:00 PM
Location: 432 Newell Drive, Gainesville, Florida, 32611
Host: UF CISE Department
Admission: This event is free and open to the public.

Latent Variable Identification Using Identifiable Matrix Factorization Methods

Abstract: Latent variable identification is a unifying problem formulation for unsupervised machine learning and big data analytics, with applications including topic modeling, community detection, and hyperspectral unmixing, to name just a few. Identifiability is a fundamental issue, since it amounts to asking whether the latent structure can truly be learned without the help of labeled data. Among the many approaches that have identifiability guarantees, this talk focuses on nonnegative matrix factorization (NMF)-type methods. NMF is widely and successfully used in many applications, but until recently there was very little theoretical understanding of why it is able to identify latent variables.
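For readers unfamiliar with NMF, the sketch below shows what a plain NMF model computes: it approximates a nonnegative matrix X by a product W H with W and H both entrywise nonnegative. It uses the classic multiplicative-update algorithm of Lee and Seung for the squared Frobenius loss, a generic solver chosen only for illustration. It is not the method presented in this talk, it has no volume regularization, and by itself it carries no identifiability guarantee. The helper names (matmul, transpose, nmf, frobenius_error) are hypothetical, and the code is pure Python so that it runs without extra libraries.

```python
import random

def matmul(A, B):
    # (m x k) times (k x n) -> (m x n), using plain lists of lists
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def nmf(X, rank, iters=500, eps=1e-12, seed=0):
    """Illustrative NMF: approximate X by W @ H with W, H >= 0,
    using Lee-Seung multiplicative updates for the squared Frobenius loss."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    # Strictly positive random initialization (zeros would stay zero forever)
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), elementwise; eps guards against division by zero
        Wt = transpose(W)
        num = matmul(Wt, X)
        den = matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n)] for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num = matmul(X, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)] for i in range(m)]
    return W, H

def frobenius_error(X, W, H):
    # Frobenius norm of the residual X - W H
    WH = matmul(W, H)
    return sum((X[i][j] - WH[i][j]) ** 2
               for i in range(len(X)) for j in range(len(X[0]))) ** 0.5

# Usage: factor an exactly rank-2 nonnegative matrix
W_true = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
H_true = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]
X = matmul(W_true, H_true)
W, H = nmf(X, rank=2, iters=300)
print(frobenius_error(X, W, H))
```

Because each update multiplies the current factor by a ratio of nonnegative quantities, the factors stay nonnegative without any explicit projection. Note that even a perfect fit need not return W_true and H_true: NMF solutions are in general not unique, which is exactly why conditions such as sufficient scatteredness matter.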

The take-home point of this talk is that a latent variable can be uniquely identified if it is sufficiently scattered, an assumption inspired by convex geometry, using either plain NMF or NMF with an additional “volume” regularization. This principle is demonstrated in the application of hidden Markov model (HMM) identification: an HMM can be uniquely identified from the pairwise co-occurrence probabilities of consecutive observations if the emission probability is sufficiently scattered. This is the first method that guarantees identifiability of an HMM from pairwise co-occurrences, and it is particularly suitable for applications where the number of possible observation outcomes is relatively large, as in topic modeling.
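To make the link between HMMs and matrix factorization concrete: if M is the emission matrix (M[i][k] = P(x = i | state k)), A the transition matrix, and pi the stationary state distribution, then the co-occurrence matrix Omega[i][j] = P(x_t = i, x_{t+1} = j) factors as Omega = M Theta M^T with Theta = diag(pi) A. This follows directly from the HMM's conditional independence structure. The sketch below, with hypothetical helper names, only computes Omega forward from a known model and estimates it empirically from a sampled sequence; it does not implement the identification procedure from the talk, which recovers M from Omega under the sufficiently scattered condition. It assumes the chain is started from its stationary distribution.

```python
import random

def stationary(A, iters=1000):
    """Stationary distribution of a row-stochastic matrix A via power iteration."""
    K = len(A)
    pi = [1.0 / K] * K
    for _ in range(iters):
        pi = [sum(pi[k] * A[k][l] for k in range(K)) for l in range(K)]
    return pi

def cooccurrence_model(M, A):
    """Omega = M Theta M^T with Theta = diag(pi) A.
    M[i][k] = P(x = i | s = k) (columns sum to 1); A[k][l] = P(s_{t+1} = l | s_t = k)."""
    N, K = len(M), len(A)
    pi = stationary(A)
    Theta = [[pi[k] * A[k][l] for l in range(K)] for k in range(K)]
    return [[sum(M[i][k] * Theta[k][l] * M[j][l] for k in range(K) for l in range(K))
             for j in range(N)] for i in range(N)]

def sample_hmm(M, A, length, seed=0):
    """Draw an observation sequence, starting the hidden chain from its stationary law."""
    rng = random.Random(seed)
    N, K = len(M), len(A)
    s = rng.choices(range(K), weights=stationary(A))[0]
    obs = []
    for _ in range(length):
        obs.append(rng.choices(range(N), weights=[M[i][s] for i in range(N)])[0])
        s = rng.choices(range(K), weights=A[s])[0]
    return obs

def empirical_cooccurrence(obs, N):
    """Normalized counts of consecutive observation pairs (x_t, x_{t+1})."""
    counts = [[0] * N for _ in range(N)]
    for a, b in zip(obs, obs[1:]):
        counts[a][b] += 1
    total = len(obs) - 1
    return [[c / total for c in row] for row in counts]

# Usage: 3 possible observations, 2 hidden states
M = [[0.7, 0.1], [0.2, 0.2], [0.1, 0.7]]
A = [[0.9, 0.1], [0.2, 0.8]]
Omega = cooccurrence_model(M, A)
Omega_hat = empirical_cooccurrence(sample_hmm(M, A, 50000), 3)
print(max(abs(Omega[i][j] - Omega_hat[i][j]) for i in range(3) for j in range(3)))
```

Only Omega_hat is observable in practice; the point of the identifiability result is that, under the stated conditions, M can be recovered from such pairwise statistics alone.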

We show that higher-quality topics can be learned by modeling documents as observations of HMMs that share the same emission (i.e., topic) probabilities, compared with the simple but widely used bag-of-words model.

Biography: Kejun Huang received his Ph.D. degree in Electrical Engineering from the University of Minnesota, Minneapolis, MN, USA in 2016. He is currently a Postdoctoral Associate in the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA. His research interests include signal processing, machine learning, big data analytics, and optimization, with a special focus on identifiability analysis and non-convex algorithm design for latent variable models.