Spring 2008 Database Seminar
Thursday Feb. 14th, 2008
CSE Room 404
12:00 - 1:00pm
Learning correlations using Mixture-Of-Subsets model
Manas Somaiya
Using a mixture of random variables to model data is a tried-and-tested
method common in data mining, machine learning, and statistics. By using
mixture modeling it is often possible to accurately model even complex,
multi-modal data via very simple components. However, the classical
mixture model assumes that a data point is generated by a single
component in the model. Many datasets can be modeled more faithfully if
we drop this restriction. We propose a probabilistic framework, the
Mixture-Of-Subsets (MOS) model, that makes two fundamental changes to
the classical mixture model. First, we allow a
data point to be generated by a set of components rather than just a
single component. Second, we limit the number of data attributes that each
component can influence. We also propose an EM framework to learn the
MOS model from a dataset, and experimentally evaluate it on real,
high-dimensional datasets. Our results show that the MOS model learned
from the data represents the underlying nature of the data accurately.
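
The contrast between the two generative assumptions can be sketched in a
few lines of Python. This is only an illustration of the idea stated in
the abstract, not the MOS model from the talk: the function names, the
unit-variance Gaussian attributes, the per-component inclusion
probabilities, the "background" means, and the rule for resolving
attributes claimed by several active components are all assumptions made
for the sketch.

```python
import random

def sample_classical(components, weights, rng):
    """Classical mixture: exactly one component generates every attribute."""
    k = rng.choices(range(len(components)), weights=weights)[0]
    point = [rng.gauss(m, 1.0) for m in components[k]["mean"]]
    return point, [k]

def sample_subset(components, include_prob, background, rng):
    """Subset-style variant: several components may be active at once,
    and each one only influences the attributes listed in its 'attrs'."""
    # Each component is switched on independently (assumed for this sketch).
    active = [k for k in range(len(components))
              if rng.random() < include_prob[k]]
    point = []
    for j in range(len(background)):
        owners = [k for k in active if j in components[k]["attrs"]]
        # Assumption: if several active components claim attribute j, pick
        # one uniformly at random; if none does, use a background mean.
        if owners:
            mean = components[rng.choice(owners)]["mean"][j]
        else:
            mean = background[j]
        point.append(rng.gauss(mean, 1.0))
    return point, active

# Hypothetical toy setup: 4 attributes, 2 components with limited influence.
components = [
    {"mean": [0.0, 0.0, 0.0, 0.0], "attrs": {0, 1}},
    {"mean": [5.0, 5.0, 5.0, 5.0], "attrs": {2}},
]
```

Calling sample_subset(components, [0.5, 0.5], [0.0] * 4, random.Random(0))
returns one data point together with the list of components that were
active when it was generated.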
For upcoming talks, visit http://www.cise.ufl.edu/dbcenter/seminar.shtml.