Fall 2006 Database Seminar

Wednesday September 27th, 2006
CSE Room 305
12:00 - 1:00 PM

What does database bootstrapping mimic?

Dr. Eva Czabarka

Database bootstrapping is regularly used by bioinformaticians to assign statistical significance to the measured difference between the information retrieval capabilities of their algorithms. Statisticians, however, frown upon this use of bootstrapping, as there is no model for what distribution bootstrapping is supposed to mimic.

In my talk I will describe a simple urn model to resolve this issue and present central limit theorems that justify the use of bootstrapping for assigning statistical significance to a particular measure, the ROC[n] (receiver operating characteristic value truncated at n). These theorems provide simple-to-use formulas for computing these values; in simulation studies the approximate p-values thus computed were precise enough to eliminate the need for the computationally expensive code for database bootstrapping. This is joint work with John L. Spouge.
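
For readers unfamiliar with the procedure, here is a minimal sketch of database bootstrapping for ROC[n], written for illustration only; it is not the speaker's code, and it does not implement the urn model or the central limit formulas from the talk. It assumes the common definition of ROC[n] as the sum, over the first n false positives in a ranked list, of the number of true positives ranked above each one, divided by n times the total number of true positives. The function names, the (score_a, score_b, is_relevant) item layout, and the padding rule for lists with fewer than n false positives are all assumptions.

```python
import random


def roc_n(labels, n):
    """ROC[n] of a ranked list of booleans (True = relevant), best hit first."""
    total_pos = sum(labels)
    if total_pos == 0 or n <= 0:
        return 0.0
    true_seen = 0
    false_seen = 0
    area = 0
    for is_pos in labels:
        if is_pos:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen  # true positives ranked above this false positive
            if false_seen == n:
                break
    # Assumed convention: if the list has fewer than n false positives,
    # the missing ones are treated as ranked below every true positive.
    area += (n - false_seen) * true_seen
    return area / (n * total_pos)


def roc_n_difference(items, n):
    """ROC[n] of algorithm A minus ROC[n] of algorithm B on the same items.

    Each item is a hypothetical tuple (score_a, score_b, is_relevant),
    where a higher score means a better rank.
    """
    ranked_a = [rel for _, _, rel in sorted(items, key=lambda t: -t[0])]
    ranked_b = [rel for _, _, rel in sorted(items, key=lambda t: -t[1])]
    return roc_n(ranked_a, n) - roc_n(ranked_b, n)


def bootstrap_differences(items, n, replicates=1000, seed=0):
    """Resample database items with replacement and recompute the difference."""
    rng = random.Random(seed)
    return [
        roc_n_difference(rng.choices(items, k=len(items)), n)
        for _ in range(replicates)
    ]
```

The spread of the values returned by bootstrap_differences is what practitioners typically use to judge whether an observed ROC[n] difference is significant; the repeated resampling and re-ranking is the computationally expensive step that the formulas described in the abstract aim to avoid.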

For upcoming talks, visit http://www.cise.ufl.edu/dbcenter/seminar.shtml