Fall 2006 Database Seminar

Wednesday, October 25th, 2006
CSE Room 305
12:00 - 1:00 PM

A Novel Algorithm For Identifying Low-Complexity Regions In A Protein Sequence

Xuehui Li

We consider the problem of identifying low-complexity regions (LCRs)
in a single protein sequence. LCRs are regions of biased composition,
typically consisting of different kinds of repeats. We define a new complexity
function to measure the complexity of a subsequence based on a given
scoring matrix, such as BLOSUM62. We develop a novel graph-based
method to mine protein sequences. Our method finds short intervals as
LCR candidates by traversing the underlying graph. It then extends them to find
longer intervals with lower complexity. Consecutive intervals are then
compared to find interspersed repeats. Our experiments on real data
show that our method has significantly higher recall and precision
than existing methods, including 0j.py, CARD and SEG.
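
For readers new to the problem, the sketch below shows one simple way to
flag candidate low-complexity regions: slide a fixed-width window over the
sequence and keep the windows whose Shannon entropy is low. This is only an
illustrative baseline and is not the complexity function or graph-based
method of the talk, which is defined on a scoring matrix and is not
specified in this announcement. The function names, the window width of 12,
and the threshold of 2.2 bits are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_entropy_windows(seq, width=12, threshold=2.2):
    """Return (start, end, entropy) for every window below the threshold.

    width and threshold are illustrative values, not tuned parameters.
    """
    hits = []
    for start in range(len(seq) - width + 1):
        h = shannon_entropy(seq[start:start + width])
        if h < threshold:
            hits.append((start, start + width, h))
    return hits


def merge_intervals(hits):
    """Merge overlapping or touching windows into half-open (start, end) intervals."""
    merged = []
    for start, end, _ in hits:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Made-up sequence with a poly-Q run in the middle.
    seq = "MKTAYIAKQR" + "Q" * 12 + "LVGHWESDNC"
    for start, end in merge_intervals(low_entropy_windows(seq)):
        print(start, end, seq[start:end])
```

A composition-only measure like this treats all residues as equally
different from one another; scoring a subsequence against a substitution
matrix, as the abstract describes, would also take residue similarity
into account.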


For upcoming talks, visit http://www.cise.ufl.edu/dbcenter/seminar.shtml