Abstract

Multiple protein sequence alignment is a basic step in the analysis of biological data. It has applications in phylogenetic tree estimation, structure prediction, and critical residue identification. Various techniques and tools exist for Multiple Sequence Alignment (MSA), each with its own merits and demerits. In this project we plan to analyse three of the most efficient algorithms, compare their relative performance, and draw conclusions from the results.

Team Members

Muralidhar Sathsahayaraman
Koushik Rajagopal
Sarvesh Sakalanaga

What are we going to implement?


1. Progressive Alignment construction
2. Iterative Methods
3. Segment-based Methods

We plan to analyze the asymptotic complexity of a few algorithms that are based on these basic approaches and to compare their performance.

What methods are we going to use?

1. Kalign
2. T-Coffee
3. MUSCLE
4. DIALIGN

What are we measuring?

1. Accuracy
2. Speed
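
As an illustration of how these two measurements could be taken, here is a minimal sketch in Python. It assumes accuracy is scored as a sum-of-pairs (SP) score, that is, the fraction of residue pairs aligned in a reference alignment that are also aligned in the test alignment, and that both alignments are given as equal-length gapped strings listing the same sequences in the same order. The proposal does not fix a particular metric, so the choice of SP and the helper names `aligned_pairs`, `sp_score`, and `timed` are assumptions made for this example only.

```python
import time
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    Each pair is (seq_a, residue_a, seq_b, residue_b), where the residue
    indices count ungapped positions within each sequence.
    """
    pairs = set()
    counters = [0] * len(alignment)  # next residue index per sequence
    for col in range(len(alignment[0])):
        present = []
        for seq_idx, row in enumerate(alignment):
            if row[col] != gap:
                present.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        for (s1, r1), (s2, r2) in combinations(present, 2):
            pairs.add((s1, r1, s2, r2))
    return pairs


def sp_score(test, reference, gap="-"):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference, gap)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test, gap)) / len(ref_pairs)


def timed(func, *args):
    """Run func(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    reference = ["AC-G", "ACTG"]
    test = ["ACG-", "ACTG"]
    score, seconds = timed(sp_score, test, reference)
    print(f"SP score: {score:.3f} (scored in {seconds:.6f} s)")
```

In a real comparison, `timed` would wrap the call that runs each alignment tool rather than the scoring function, and the reference alignments would come from the benchmark datasets.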

Datasets:

BAliBASE 3.0 reference files

PREFAB 4.0 reference files

Workload distribution:

1. Analysis of T-Coffee (Koushik)
2. Analysis of Kalign (Muralidhar)
3. Analysis of MUSCLE (Sarvesh)
4. Analysis of DIALIGN, comparative study, and tools integration (All)

References:

Notredame C, et al. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302: 205–217.

Edgar RC. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32: 1792–1797.

Lassmann T, Sonnhammer ELL. (2005) Kalign - an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics.

Thompson JD, Koehl P, Ripp R, Poch O. BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark.

Subramanian AR, Weyer-Menkhoff J, Kaufmann M, Morgenstern B. DIALIGN-T: an improved algorithm for segment-based multiple sequence alignment.