Indexing and Querying of Bioinformatics Data
Finding similarities among bioinformatics data is difficult for two
reasons. First, bioinformatics databases are growing rapidly. Second,
comparing even a single pair of data elements is often expensive. For
example, comparing two DNA or protein sequences involves solving a
dynamic programming problem with quadratic time complexity. Well-known
heuristics take on the order of 10 seconds to superimpose one protein
structure on another. Computing the Earth Mover's Distance between two
bioimages takes several seconds. Given the cost of each pairwise
comparison and the enormous size of the data, efficient indexing and
querying methods are needed to make this data usable. This project
aims to develop efficient algorithms for comparing a pair of data
elements, as well as indexing methods for searching large databases,
with emphasis on DNA and protein sequences and protein structures.
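
To make the quadratic cost of sequence comparison concrete, the following is a minimal sketch of a dynamic programming comparison. It assumes unit-cost edit distance as the measure, which is a simplification chosen for illustration; the function name edit_distance is hypothetical and is not taken from the project's software, which may use different scoring schemes.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance between a and b.

    Illustrative sketch only: fills a (len(a)+1) x (len(b)+1) table one
    row at a time, so the running time is O(len(a) * len(b)).
    """
    # prev[j] is the distance between the prefix of a seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGT"))
```

Every cell of the table is visited once, so comparing a query against each sequence in a large database repeats this quadratic work per pair. That per-pair cost is what makes an index that avoids most comparisons attractive.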
Software
TRIAL: software for aligning two protein structures.
People
-
Chris Jermaine
-
Tamer Kahveci
-
Jayendra Venkateswaran
Publications
-
Jayendra Venkateswaran, Deepak Lachwani, Tamer Kahveci,
Christopher Jermaine
Reference-Based Indexing for Metric Spaces with Costly Distance Measures,
accepted to the VLDB Journal, 2007.
(SpringerLink)
-
Jayendra Venkateswaran, Deepak Lachwani, Tamer Kahveci,
Christopher Jermaine
Reference-based Indexing of Sequence Databases,
VLDB, 2006, pages 906-917. (Abstract)
(PDF)
-
Jayendra Venkateswaran, Tamer Kahveci, Orhan Camoglu
Finding Data Broadness Via Generalized Nearest Neighbors,
EDBT, 2006, pages 645-663. (Abstract)
(PDF)
-
Tamer Kahveci, Venkatakrishnan Ramaswamy, Han Tao, Tao Li
Approximate Global Alignment of Sequences,
BIBE, 2005,
pages 81-88. (Abstract)
(PDF)
-
Abhijit Pol, Tamer Kahveci
Highly Scalable and Accurate Seeds for Subsequence Alignment,
BIBE, 2005, pages 27-31. (Abstract)
(PDF)
-
Tamer Kahveci, Vebjorn Ljosa and Ambuj K. Singh,
Speeding up Whole Genome Alignment by Indexing Frequency Vectors,
Bioinformatics, 20:13, pages 2122-2134, 2004. (PubMed)
(Bioinformatics)
-
Tamer Kahveci and Ambuj K. Singh,
Progressive Searching
of Biological Sequences,
IEEE Database Engineering
Bulletin, 27:3, pages 32-39, 2004. (PS)
-
Orhan Camoglu, Tamer Kahveci and Ambuj K. Singh,
Towards
Index-based Similarity Search for Protein Structure Databases,
Journal of Bioinformatics and Computational Biology
(JBCB), 2:1, pages 99-126, 2004. (invited paper). (PubMed)
-
Arnab Bhattacharya, Tolga Can, Tamer Kahveci, Ambuj K. Singh,
Yuan-Fang Wang
ProGreSS: Simultaneous Searching of
Protein Databases by Sequence and Structure,
PSB, 2004,
pages 264-275. (PDF)
-
Orhan Camoglu, Tamer Kahveci, Ambuj K. Singh,
Towards Index-based Similarity Search for Protein Structure Databases,
CSB, 2003, pages 148-158. (PDF)
-
Orhan Camoglu, Tamer Kahveci, Ambuj K. Singh,
PSI:
Indexing Protein Structures for Fast Similarity Search,
ISMB, 2003, pages 81-83 (also in Bioinformatics journal). (PDF)
-
Tamer Kahveci, Christian Lang, Ambuj K. Singh,
Joining Massive High-Dimensional Databases,
ICDE, 2003, pages 264-276. (Abstract)
(PDF)
(PPT)
(TR)
-
Tamer Kahveci, Ambuj K. Singh,
MAP: Searching Large
Genome Databases,
PSB, 2003, pages 303-314. (Abstract)
(PDF)
-
Tamer Kahveci, Ambuj K. Singh, and Aliekber Gurel,
Similarity Searching for Multi-Attribute Sequences,
SSDBM, 2002. (Abstract)
(PDF)
-
Tamer Kahveci and Ambuj K. Singh,
An Efficient Index
Structure for String Databases,
VLDB, 2001, pages
351-360. (Abstract)
(PDF)
(PPT)
(TR)
-
Tamer Kahveci and Ambuj K. Singh,
Variable Length Queries
for Time Series Data,
ICDE, 2001, pages 273-282. (Abstract)
(PDF)
(TR)
Tamer Kahveci
Last modified: Sat Oct 6 07:24:39 EDT 2007