Wednesday, November 29th, 2006
CSE Room 305
12:00 - 1:00 PM

Conditional Anomaly Detection

Xiuyao Song

When anomaly detection software is used as a data analysis tool, finding the
hardest-to-detect anomalies is not the most critical task. Rather, it is
often more important to make sure that the anomalies reported to the user
are in fact interesting. If too many unremarkable data points are returned
to the user labeled as candidate anomalies, the software will soon fall into
disuse.

One way to ensure that returned anomalies are useful is to make use of domain
knowledge provided by the user. Often, the data in question include a set of
environmental attributes whose values a user would never consider to be
directly indicative of an anomaly. However, such attributes cannot be ignored,
because they have a direct effect on the expected distribution of the result
attributes, whose values can indicate an anomalous observation. This paper
describes a general-purpose method called conditional anomaly detection for
taking such differences among attributes into account, and proposes three
different expectation-maximization algorithms for learning the model that is
used in conditional anomaly detection. Experiments on 13 different data
sets compare our algorithms with several more standard methods for
outlier or anomaly detection.
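
As a rough illustration of the idea (not one of the paper's three expectation-maximization algorithms), the sketch below scores each observation by how far its result value lies from what its environmental value predicts. A plain least-squares line is swapped in as the model purely for brevity. The data, the single-attribute setup, and the function names `conditional_scores` and `unconditional_scores` are all hypothetical.

```python
import statistics


def conditional_scores(env, result):
    """Score each point by the distance between its result value and the
    value expected given its environmental value.

    Assumption: a least-squares line stands in for the learned model.
    """
    mean_e = statistics.fmean(env)
    mean_r = statistics.fmean(result)
    # Ordinary least-squares slope and intercept for result ~ env.
    sxx = sum((e - mean_e) ** 2 for e in env)
    sxy = sum((e - mean_e) * (r - mean_r) for e, r in zip(env, result))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_e
    # Residual: observed result minus the result expected for this env value.
    residuals = [r - (intercept + slope * e) for e, r in zip(env, result)]
    spread = statistics.pstdev(residuals)
    return [abs(res) / spread for res in residuals]


def unconditional_scores(result):
    """Baseline that ignores the environmental attribute entirely."""
    mean_r = statistics.fmean(result)
    spread = statistics.pstdev(result)
    return [abs(r - mean_r) / spread for r in result]


# Hypothetical data: one environmental attribute (say, temperature) and one
# result attribute (say, a sensor reading). The last point has an ordinary
# reading overall, but an unusual one for its temperature.
env = [0, 5, 10, 15, 20, 25, 30, 35, 40, 20]
result = [1.0, 11.2, 20.9, 31.1, 40.8, 51.0, 61.1, 70.9, 81.0, 70.0]

cond = conditional_scores(env, result)
uncond = unconditional_scores(result)
for e, r, c, u in zip(env, result, cond, uncond):
    print(f"env={e:>2} result={r:>5} conditional={c:.2f} unconditional={u:.2f}")
```

On this made-up data the last reading is well inside the overall range of readings, so the score that ignores the environmental attribute does not single it out, while the conditional score ranks it highest.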

For upcoming talks, visit http://www.cise.ufl.edu/dbcenter/seminar.shtml