Indexing and Querying of Bioinformatics Data

Finding similarities among bioinformatics data is a difficult problem for two reasons. First, bioinformatics databases are growing rapidly. Second, comparing even a single pair of data elements is often time consuming. For example, comparing two DNA or protein sequences involves solving a dynamic programming problem with quadratic time complexity. Well-known heuristics take on the order of 10 seconds to superimpose one protein structure on another. Computing the Earth Mover's Distance between two bioimages takes several seconds. Given the cost of comparing two data elements and the enormous size of bioinformatics databases, efficient indexing and querying methods are needed to make this data usable. This project aims to develop efficient algorithms for comparing a pair of data elements, as well as indexing methods for searching large databases, with an emphasis on DNA and protein sequences and protein structures.
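To make the quadratic cost of sequence comparison concrete, here is a minimal sketch of global alignment scoring in the style of Needleman-Wunsch. It is only an illustration, not the method developed in this project: the function name global_alignment_score and the scoring values (match = 1, mismatch = -1, linear gap = -2) are assumptions chosen for the example. The two nested loops visit every cell of a len(a) by len(b) table, which is where the quadratic running time comes from.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative assumptions: a linear gap penalty
    and a simple match/mismatch score, not a real substitution matrix.
    """
    # prev[j] is the best score for aligning the first i-1 symbols of a
    # with the first j symbols of b; row 0 aligns b's prefix to gaps only.
    prev = [j * gap for j in range(len(b) + 1)]

    for i in range(1, len(a) + 1):
        # Column 0: the first i symbols of a aligned to gaps only.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap, or b[j-1] with a gap.
            up = prev[j] + gap
            left = curr[j - 1] + gap
            curr[j] = max(diag, up, left)
        prev = curr

    # Only two rows are kept, so memory is linear while time stays
    # proportional to len(a) * len(b).
    return prev[len(b)]
```

Under these assumed scores, global_alignment_score("ACGT", "AGT") returns 1: three matches and one gap. Because every pairwise comparison costs time proportional to the product of the two lengths, scanning a large database one sequence at a time quickly becomes impractical, which is the motivation for the indexing methods described above.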

Software

TRIAL: software for aligning two protein structures.

People


Publications


Tamer Kahveci
Last modified: Sat Oct 6 07:24:39 EDT 2007