Abstract
Multiple protein sequence alignment is a basic step in the analysis of biological data. It has applications in phylogenetic tree estimation, structure prediction and critical residue identification. There exist various techniques and tools for Multiple Sequence Alignment (MSA), each with its own merits and demerits. In this project we plan to analyse three of the most efficient algorithms, compare their relative performance and draw conclusions from the results.
Team Members
Muralidhar Sathsahayaraman
Koushik Rajagopal
Sarvesh Sakalanaga
What are we going to implement?
1. Progressive Alignment construction
2. Iterative Methods
3. Segment-based Methods
We plan to analyze the asymptotic complexity of a few algorithms that are based on these basic approaches and to compare their performance.
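As a point of reference for the complexity analysis, the sketch below scores a global pairwise alignment with the classic Needleman-Wunsch dynamic program, the kind of pairwise step that progressive methods build on. It runs in O(mn) time for sequences of length m and n and keeps only two rows in memory. The function name and the match, mismatch and gap values are illustrative assumptions; the tools studied here use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b.

    Uses a linear gap penalty and a simple match/mismatch score
    (illustrative values, not those of any particular tool).
    """
    # Row 0: aligning the empty prefix of a with each prefix of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with the empty prefix of b.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

The two nested loops make the quadratic cost per sequence pair explicit. A scheme that aligns all pairs of N sequences this way repeats that cost O(N^2) times, which is why the cost of the pairwise stage matters when comparing tools.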
What methods are we going to use?
1. KALIGN
2. T-COFFEE
3. MUSCLE
4. DIALIGN
What are we measuring?
1. Accuracy
2. Speed
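One possible way to quantify the two measures is sketched below. Accuracy is computed as a sum-of-pairs style score: the fraction of residue pairs aligned in the reference alignment that are also aligned in the test alignment. Speed is taken as wall-clock time around a call. The function names are hypothetical, alignments are assumed to be lists of equal-length gapped strings in the same sequence order, and the sketch scores every column rather than only annotated core blocks.

```python
import time
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of aligned residue pairs (seq_i, res_a, seq_j, res_b)."""
    # For each row, record the residue index at every column (None for gaps).
    index_rows = []
    for row in alignment:
        counter = 0
        indices = []
        for ch in row:
            if ch == gap:
                indices.append(None)
            else:
                indices.append(counter)
                counter += 1
        index_rows.append(indices)

    pairs = set()
    for i, j in combinations(range(len(alignment)), 2):
        for a, b in zip(index_rows[i], index_rows[j]):
            if a is not None and b is not None:
                pairs.add((i, a, j, b))
    return pairs


def sum_of_pairs_score(reference, test):
    """Fraction of reference-aligned residue pairs recovered by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


def time_call(func, *args):
    """Run func(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

For example, with reference `["AC-G", "ACTG"]` and test `["ACG-", "ACTG"]`, two of the three reference pairs are recovered, so the score is 2/3. In practice `time_call` would wrap the invocation of each aligner rather than the scoring function.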
Datasets:
BAliBASE 3.0 reference files
PREFAB 4.0 reference files
Workload distribution:
1. Analysis of T-Coffee (Koushik)
2. Analysis of KALIGN (Muralidhar)
3. Analysis of MUSCLE (Sarvesh)
4. Analysis of DIALIGN, comparative study and tools integration (All)
References:
Notredame C, et al. T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. (2000) 302: 205–217.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. (2004) 32: 1792–1797.
Lassmann T, Sonnhammer ELL. Kalign - an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics (2005).
Thompson JD, Koehl P, Ripp R, Poch O. BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark.
Subramanian AR, Weyer-Menkhoff J, Kaufmann M, Morgenstern B. DIALIGN-T: an improved algorithm for segment-based multiple sequence alignment.