University of Florida :: Department of Computer and Information Science and Engineering (CISE)

Jermaine, Arumugam, Pol and Dobra win SIGMOD 2007 Best Paper Award

October 18, 2007

CISE professors Chris Jermaine (pictured at center) and Alin Dobra (right), CISE graduate student Subi Arumugam (left), and CISE PhD alumnus Abhijit Pol (not pictured; now at Yahoo!) were co-recipients of the SIGMOD 2007 Best Paper Award for their research article titled "Scalable Approximate Query Processing with the DBO Engine". SIGMOD is one of the most selective and widely read publication venues in database research. The conference receives hundreds of research submissions each year, and typically fewer than 15% of them are selected for presentation.

Their paper describes the query processing engine of DBO (Database-Online), a prototype database system being developed at UF. The goal of the National Science Foundation-sponsored DBO project is to build a database engine that can answer analytic or statistical queries over terabyte-sized data archives just as fast as popular commercial or open-source database engines such as Oracle or Postgres. The key benefit of DBO is that it not only computes exact answers quickly but also uses statistical methods to give the user, at all times, an estimate of what the final answer to the query will be, even very early during query execution.

For example, imagine that a user wishes to compute a company's total sales of a certain product, broken down by the company's various divisions. After a very short time, DBO may report a current estimate of $8.450 million for one of the divisions, with a 95% chance that the true value is between $8.410 million and $8.490 million. Because DBO may be able to provide this estimate after only a few minutes of query processing, when running the query to completion may take hours, it can save the user a great deal of time whenever two digits of accuracy are enough. As Jermaine says, "The data stored in a data warehouse are typically riddled with errors due to the data collection, integration, and cleaning process. So it probably does not make any sense to spend hours trying to compute a few extra decimal points of an answer that cannot be trusted past the first few digits anyway!"
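To make the idea of an early estimate with a 95% range concrete, here is a minimal Python sketch. It is not DBO's algorithm, which this article does not describe. It uses a textbook sampling estimator and assumes the rows are scanned in uniformly random order. The function names, the synthetic sales figures, and the normal-approximation multiplier 1.96 are all illustrative assumptions.

```python
import math
import random


def running_sum_estimate(values_seen, total_rows, z=1.96):
    """Estimate SUM over all rows from the rows scanned so far.

    Assumes values_seen is a uniformly random subset of the table
    (a hypothetical simplification, not how DBO is known to work).
    Returns (estimate, lower bound, upper bound) for an approximate
    95% confidence interval when z = 1.96.
    """
    n = len(values_seen)
    if n < 2:
        raise ValueError("need at least two scanned rows")
    mean = sum(values_seen) / n
    variance = sum((v - mean) ** 2 for v in values_seen) / (n - 1)
    # Finite population correction: the uncertainty vanishes once
    # every row has been scanned.
    fpc = 1.0 - n / total_rows
    std_err = total_rows * math.sqrt(variance / n * fpc)
    estimate = total_rows * mean
    return estimate, estimate - z * std_err, estimate + z * std_err


def demo(seed=0):
    """Scan a synthetic 'sales' table and report estimates along the way."""
    rng = random.Random(seed)
    sales = [rng.uniform(5.0, 15.0) for _ in range(100_000)]  # made-up data
    rng.shuffle(sales)  # stands in for a random-order scan
    exact = sum(sales)
    reports = []
    for fraction in (0.01, 0.1, 1.0):
        n = int(len(sales) * fraction)
        est, low, high = running_sum_estimate(sales[:n], len(sales))
        reports.append((fraction, est, low, high))
    return exact, reports


if __name__ == "__main__":
    exact, reports = demo()
    for fraction, est, low, high in reports:
        print(f"{fraction:>5.0%} scanned: {est:,.0f}  [{low:,.0f}, {high:,.0f}]")
    print(f"exact answer: {exact:,.0f}")
```

In this sketch the interval narrows as more rows are scanned and collapses to the exact answer once the scan finishes, which mirrors the trade-off described above: a user who needs only a couple of digits can stop early.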
