Approximate Matching in Seqeunce Databases using Distance-based Indexing JAYENDRA VENKATESWARAN We consider the problem of similarity search in sequence databases with edit distance as the similarity measure. The dynamic programming solution to the problem of finding the best alignment between two sequences of lengths m and n has O(m.n) time and space complexity. In this paper, we consider the problem of reducing the number of string comparisons during database searches. We select a small fraction of the dataset referred to as 'reference objects'. The distances of strings in the database with the reference objects are pre-computed. This is an one-time cost for the database. Given a query sequence, the search algorithm first computes its distance from the reference objects. Then the pre-computed distances are used to prune the strings that are not in the resultant set without false dismissals. We propose two heuristics in selection of the reference objects. In addition to representing the part of the database containing similar database strings, they increase the pruning of distance calculations.