Wednesday September 27th, 2006
CSE Room 305
12:00 - 1:00 PM

What does database bootstrapping mimic?
Dr. Eva Czabarka

Database bootstrapping is regularly used by bioinformaticians
to assign statistical significance to the measured difference between the
information retrieval capabilities of their algorithms. Statisticians, however,
frown upon this use of bootstrapping, as there is no model for what distribution
bootstrapping is supposed to mimic. In my talk I will describe a simple urn model
to resolve this issue and present
central limit theorems that justify the use of bootstrapping for assigning
statistical significance to a particular measure, the ROC[n] (receiver
operator characteristic value truncated at n). These theorems provide
simple-to-use formulas for computing these values; in simulation studies the
approximate p-values thus computed were precise enough to eliminate the need
for the computationally expensive code for database bootstrapping. This is
joint work with John L. Spouge.
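
For readers unfamiliar with the procedure the abstract refers to, the following is a minimal Python sketch of one plausible reading of database bootstrapping for a single query. It is an illustration only, not the speakers' code or their urn model. Assumptions: ROC[n] is taken to be the commonly used normalized area under the ROC curve up to the n-th false positive; ties in score are broken by input order; if fewer than n false positives exist, the missing ones are treated as ranked after every true positive; and the names roc_n and bootstrap_roc_n_difference are hypothetical.

```python
import random


def roc_n(scores, labels, n):
    """ROC[n] under the assumed definition: for each of the first n false
    positives, count the true positives ranked above it, then divide the
    sum by n times the total number of true positives."""
    total_pos = sum(labels)
    if total_pos == 0 or n <= 0:
        return 0.0  # assumed convention for degenerate inputs
    # Rank database items from best to worst score (stable for ties).
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_pos_seen = 0
    false_pos_seen = 0
    area = 0
    for _, is_pos in ranked:
        if is_pos:
            true_pos_seen += 1
        else:
            false_pos_seen += 1
            area += true_pos_seen
            if false_pos_seen == n:
                break
    # Assumed convention: missing false positives rank below all positives.
    area += (n - false_pos_seen) * total_pos
    return area / (n * total_pos)


def bootstrap_roc_n_difference(scores_a, scores_b, labels, n,
                               replicates=1000, seed=0):
    """Resample database items with replacement and recompute the
    difference ROC[n](A) - ROC[n](B) on each resampled database."""
    rng = random.Random(seed)
    size = len(labels)
    observed = roc_n(scores_a, labels, n) - roc_n(scores_b, labels, n)
    diffs = []
    for _ in range(replicates):
        idx = [rng.randrange(size) for _ in range(size)]
        lab = [labels[i] for i in idx]
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(roc_n(a, lab, n) - roc_n(b, lab, n))
    return observed, diffs
```

With scores [0.9, 0.8, 0.7, 0.6] and labels [1, 0, 1, 0], roc_n(..., n=2) gives 0.75 under these conventions. The fraction of bootstrap differences at or below zero can serve as a rough one-sided indicator of significance; it is not the closed-form p-value that the theorems in the talk provide, and the loop over replicates is the computational cost those formulas are meant to avoid.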
For upcoming talks, visit http://www.cise.ufl.edu/dbcenter/seminar.shtml