Wednesday September 6th, 2006
CSE Room 404
1:30 - 2:30 PM
|
Sampling
Techniques for Scalable Approximate Query Processing |
|
Abhijit Pol |
|
The
speed and ability of data management systems for analyzing data is increasingly
important compared to the requirements of traditional transaction processing (TP) applications. One promising way to
address this new requirement is to rely on approximate query processing (AQP). Despite the breadth and depth
of the seminal research in AQP, much work remains if AQP based systems are to
become practical and widely-used. We believe that today’s TP systems need to
be rebuilt from scratch to support AQP effectively and efficiently. This is a
monumental task. We therefore propose to work on following two specific
problems. We first propose to develop a new file organization, a Geometric File, to efficiently
maintain very large sample views (a one-pass disk-based, sample of large
database relations). The geometric file can be a part of the index/file/record
manager and used by the OLA query execution engine component. Second, we
propose redesigning the query execution engine, and focus on problem of
supporting subset-based queries
(queries involving correlated, nested subqueries). We will design and implement
online, sampling-based algorithms for these types of queries. |