Approximate Global Alignment of Sequences

We propose two novel dynamic programming (DP) methods that solve the the approximate bounded and unbounded global alignment problems for biological sequences. Our first method solves the bounded alignment problem. It computes the distribution of the edit distance between the remaining suffixes. For a given bound k and approximation p %, it uses this distribution to prune the entries of the DP matrix that will lead to alignments with more than k edit operations with more than p%probability. Our second method addresses the unbounded global alignment problem. For each entry of the distance matrix, it dynamically computes an upper bound to the distance between the unaligned suffixes. This bound, along with the lower bound as computed for the bounded case, is then used to eliminate the entries of the distance matrix. According to our experimental results, our methods are up to three times faster than the competing methods for the bounded alignment and up to two times faster for the unbounded alignment, even with 100% approximation. Our methods use only 17-68% of the space used by the next best competitor.