Spring 2008 Database Seminar

Thursday Feb. 14th, 2008
CSE Room 404
12:00 - 1:00pm


Learning correlations using Mixture-Of-Subsets model
Manas Somaiya


Using a mixture of random variables to model data is a tried-and-tested method common in data mining, machine learning, and statistics. With mixture modeling, it is often possible to accurately model even complex, multi-modal data using very simple components. However, the classical mixture model assumes that each data point is generated by a single component of the model. Many datasets can be modeled more faithfully if this restriction is dropped. We propose a probabilistic framework, the Mixture-Of-Subsets (MOS) model, that makes two fundamental changes to the classical mixture model. First, we allow a data point to be generated by a set of components rather than by a single component. Second, we limit the number of data attributes that each component can influence. We also propose an EM framework for learning the MOS model from a dataset, and we evaluate it experimentally on real, high-dimensional datasets. Our results show that the MOS model learned from the data accurately represents the underlying nature of the data.
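
To make the contrast in the abstract concrete, the following is a minimal sketch of the two generative ideas, not the speaker's actual formulation. Everything specific in it is an assumption made for illustration: the function names, the unit-variance Gaussian attributes, the independent per-component activation probabilities, the background means used when no active component covers an attribute, and the uniform choice among competing components.

```python
import random


def sample_classical(weights, means, rng):
    """Classical mixture: exactly one component generates every attribute."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return [rng.gauss(mu, 1.0) for mu in means[k]], {k}


def sample_mos_like(active_probs, influence, means, background, rng):
    """MOS-style sketch (hypothetical): a set of components generates the
    point, and component k only touches the attributes in influence[k]."""
    # Assumption: each component is switched on independently.
    active = {k for k, p in enumerate(active_probs) if rng.random() < p}
    point = []
    for d in range(len(background)):
        # Active components that are allowed to influence attribute d.
        owners = sorted(k for k in active if d in influence[k])
        if owners:
            # Assumption: one owner is picked uniformly at random.
            mu = means[rng.choice(owners)][d]
        else:
            # Assumption: uncovered attributes fall back to a background mean.
            mu = background[d]
        point.append(rng.gauss(mu, 1.0))
    return point, active


rng = random.Random(0)
# Three components over four attributes; each influences at most two.
influence = [{0, 1}, {2}, {1, 3}]
means = [{0: 5.0, 1: 5.0}, {2: -4.0}, {1: 9.0, 3: 2.0}]
point, active = sample_mos_like([0.5, 0.5, 0.5], influence, means, [0.0] * 4, rng)
print("active components:", sorted(active))
print("sampled point:", [round(x, 2) for x in point])
```

In this sketch, a point can carry the signature of several components at once, each on its own small subset of attributes, which a single-component mixture could only imitate by adding one component per combination.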


For upcoming talks, visit http://www.cise.ufl.edu/dbcenter/seminar.shtml.