Fall 2006 Database Seminar

Wednesday September 6th, 2006
CSE Room 404
1:30 - 2:30 PM

Sampling Techniques for Scalable Approximate Query Processing

Abhijit Pol

The speed and ability of data management systems for analyzing data is increasingly important compared to the requirements of traditional transaction processing (TP) applications. One promising way to address this new requirement is to rely on approximate query processing (AQP). Despite the breadth and depth of the seminal research in AQP, much work remains if AQP based systems are to become practical and widely-used. We believe that today’s TP systems need to be rebuilt from scratch to support AQP effectively and efficiently. This is a monumental task. We therefore propose to work on following two specific problems. We first propose to develop a new file organization, a Geometric File, to efficiently maintain very large sample views (a one-pass disk-based, sample of large database relations). The geometric file can be a part of the index/file/record manager and used by the OLA query execution engine component. Second, we propose redesigning the query execution engine, and focus on problem of supporting subset-based queries (queries involving correlated, nested subqueries). We will design and implement online, sampling-based algorithms for these types of queries.