Research Reference Collection
Query evaluation:
- Optimizing Queries with Aggregate Views.
- Optimizing nested queries with parameter sort orders.
- Orthogonal optimization of subqueries and aggregation.
- Optimization of nested SQL queries revisited.
- Executing Nested Queries.
- Query evaluation techniques for large databases.
- Access Path Selection in a Relational Database Management System.
- Including Group-By in Query Optimization.
- Of Nests and Trees: A Unified Approach to Processing Queries That Contain Nested Subqueries, Aggregates, and Quantifiers.
- Improved Unnesting Algorithms for Join Aggregate SQL Queries.
- Performing Group-By before Join.
- Eager Aggregation and Lazy Aggregation.
Approximate Query Processing and Online Aggregation:
- Online Aggregation.
- Ripple Joins for Online Aggregation.
- Large-Sample and Deterministic Confidence Intervals for Online Aggregation.
- A scalable hash ripple join algorithm.
- A Disk-Based Join With Probabilistic Guarantees.
- Online Estimation for Subset-Based SQL Queries.
- Towards Estimation Error Guarantees for Distinct Values.
- The Aqua Approximate Query Answering System.
- Sampling-Based Estimation of the Number of Distinct Values of an Attribute.
- Fixed-Precision Estimation of Join Selectivity.
- Statistical Estimators for Aggregate Relational Algebra Queries.
- Practical Selectivity Estimation through Adaptive Sampling.
- Relational Confidence Bounds Are Easy With the Bootstrap.
Indexing Large Databases:
One-dimensional and spatial indexing: R-Trees, R+-Trees, R*-Trees, STR Packing
- Extendible Hashing: A Fast Access Method for Dynamic Files.
- R-trees: A Dynamic Index Structure for Spatial Searching.
- The R+-Tree: A Dynamic Index For Multi-Dimensional Objects.
- The R*-tree: An Efficient and Robust Access Method for Points and Rectangles.
- STR algorithm: The packing and bulkloading of R-Trees
- R-Tree packing using the Hilbert curve.
Very high dimensional indexing: X-Tree, Pyramid technique, Indexability
- The X-Tree: An Index Structure for High-Dimensional Data.
- The Pyramid Technique: Towards Breaking the Curse of Dimensionality.
- On a Model of Indexability and Its Bounds for Range Queries.
Nearest neighbor and similarity search via indexing; distance and spatial joins via indexing
- Nearest Neighbor Queries
- An Approximation-Based Data Structure for Similarity Search.
- Continuous Nearest Neighbor Search.
- Efficient Processing of Spatial Joins Using R-Trees.
- Incremental Distance Join Algorithms for Spatial Databases.
- Interval Skip List.
Temporal and interval indexing
- R-Tree Based Indexing of Now-Relative Bitemporal Data.
- The MV3R-Tree: A Spatio-Temporal Access Method for Timestamp and Interval Queries.
Indexing moving objects
Speeding up index inserts: LSM-Tree, Y-Tree, Buffer Tree, Stepped Merge Method
Indexing strings: String B-Tree, multidimensional string indexing
Indexing time series and sequences
- Fast Subsequence Matching in Time-Series Databases
- Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases
Indexing for applications in the biological sciences
Indexing XML documents and regular expressions
- Querying XML Data for Regular Path Expressions.
- A Fast Regular Expression Indexing Engine.
- XRANK: Ranked Keyword Search over XML Documents.
- Twig joins.
- A Fast Index for Semistructured Data.
Data Cleaning:
- The merge/purge problem for large databases
- TAILOR: A Record Linkage Toolbox
- Record Linkage Techniques
- A Theory for Record Linkage.
- Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem (1998)
- Record Linkage: Current Practice and Future Directions.
- Data Cleaning: Problems and Current Approaches
- Improving Data Cleaning Quality Using a Data Lineage Facility
- Declarative Data Cleaning: Language, Model, and Algorithms
- An Extensible Framework for Data Cleaning
- Robust and Efficient Fuzzy Match for Online Data Cleaning.
- Potter's Wheel: An Interactive Data Cleaning System
- Data Quality Mining - Making a Virtue of Necessity
- A Framework for Analysis of Data Quality Research
- Systematic Development of Data Mining-Based Data Quality Tools
- Schema Mapping as Query Discovery
- Data Cleansing: Beyond Integrity Analysis
- Cleansing Data for Mining and Warehousing
- ARKTOS: A Tool for Data Cleaning and Transformation in Data Warehouse Environments