Bayesian Inference using Neural Net Likelihood Models for Protein Secondary Structure Prediction
Seon Kim
UFID: 3069-2580
Email: skim22@ufl.edu
Abstract
Predicting the alpha-helices, beta-sheets and turns of a protein's secondary structure is a complex non-linear task that has been approached with several techniques, such as Neural Networks, Genetic Algorithms, Decision Trees and other statistical or heuristic methods. This project aims to combine a Bayesian Inference method with offline-trained Multilayer Perceptron (MLP) models as the likelihood for secondary structure prediction. Starting with individual amino acids in a protein sequence, the Bayesian Inference process will gradually take in more neighboring amino acid information after each iteration until the posterior probability of the secondary structure converges. The project will also investigate the use of multiple MLP models for the likelihood parameters, each using a different set of input information.
Method
Bayesian Inference is a powerful method that can dynamically update a probability estimate as more information becomes available. The project will exploit this property: after every iteration, more neighboring secondary structure information becomes available for each amino acid. A standard MLP Neural Network will serve as the likelihood of the Bayesian process. The feedback algorithm is shown below.
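As a rough illustration of the update described above, one iteration for a single amino acid could be written as follows. The notation is an assumption made for this sketch and is not part of the proposal: x_i is the amino acid at position i, c is one of the three classes, P_{t-1} is the belief carried over from the previous iteration (uniform at t = 0), and L_t(c) is the MLP output used as the likelihood once the neighborhood has been widened to its size at iteration t.

```latex
P_t(c \mid x_i) =
  \frac{L_t(c)\, P_{t-1}(c \mid x_i)}
       {\sum_{c'} L_t(c')\, P_{t-1}(c' \mid x_i)},
\qquad c, c' \in \{\text{helix}, \text{sheet}, \text{turn}\}
```

The denominator only renormalizes the three class probabilities so that they sum to one after each iteration.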

First, several MLP models will be trained using different sets of inputs, mined from the Protein Data Bank (http://www.rcsb.org/pdb/), with varying degrees of neighboring information. The main goal will be to experiment with input sets and models and find those with the lowest error rate. These models will be used as the likelihood probability for the Bayesian feedback loop. After several iterations, the algorithm will return a converged posterior probability of whether each amino acid is part of an alpha-helix, beta-sheet or turn, and each amino acid will be assigned the class with the highest probability.
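The loop described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the names bayes_feedback, toy_likelihood, TOY_TABLE and max_radius are hypothetical, the numbers in TOY_TABLE are invented, and toy_likelihood is only a stand-in for a trained MLP so that the loop can run on its own.

```python
from typing import Callable, List, Sequence, Tuple

STATES = ("helix", "sheet", "turn")
UNIFORM = (1 / 3, 1 / 3, 1 / 3)

# Hypothetical propensities, invented for this sketch only (not measured data).
TOY_TABLE = {"A": (0.6, 0.2, 0.2), "V": (0.2, 0.6, 0.2), "G": (0.2, 0.2, 0.6)}

Likelihood = Callable[[str, List[List[float]], int, int], Sequence[float]]


def normalize(weights: Sequence[float]) -> List[float]:
    """Scale non-negative weights so they sum to one."""
    total = sum(weights)
    if total <= 0.0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def toy_likelihood(sequence: str, posterior: List[List[float]],
                   i: int, radius: int) -> List[float]:
    """Stand-in for a trained MLP: averages the table rows of the residues
    in the window together with the current beliefs of the neighbors."""
    lo, hi = max(0, i - radius), min(len(sequence), i + radius + 1)
    rows = [TOY_TABLE.get(sequence[j], UNIFORM) for j in range(lo, hi)]
    rows += [posterior[j] for j in range(lo, hi) if j != i]
    return [sum(row[k] for row in rows) / len(rows) for k in range(len(STATES))]


def bayes_feedback(sequence: str, likelihood: Likelihood, max_iter: int = 10,
                   max_radius: int = 3, tol: float = 1e-4
                   ) -> Tuple[List[List[float]], List[str]]:
    """Iterate posterior ~ likelihood * previous posterior, widening the window."""
    posterior = [list(UNIFORM) for _ in sequence]  # uniform prior per residue
    for iteration in range(max_iter):
        radius = min(iteration, max_radius)  # radius 0 = the residue alone
        updated = []
        for i in range(len(sequence)):
            like = likelihood(sequence, posterior, i, radius)
            updated.append(normalize([l * p for l, p in zip(like, posterior[i])]))
        # Largest change of any class probability in this iteration.
        change = max((abs(a - b) for old, new in zip(posterior, updated)
                      for a, b in zip(old, new)), default=0.0)
        posterior = updated
        if change < tol:
            break
    # Assign each residue the class with the highest posterior probability.
    labels = [STATES[max(range(len(STATES)), key=lambda k: row[k])]
              for row in posterior]
    return posterior, labels
```

Calling bayes_feedback("AAAA", toy_likelihood) returns one probability row and one label per residue. Because each iteration multiplies the previous posterior by a new likelihood, the beliefs sharpen quickly; whether that is desirable with real MLP outputs is one of the things the experiments would need to check.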
The accuracy rate and computation time will be compared against those of a single Neural Network (Baldi, Brunak, 'Bioinformatics: The Machine Learning Approach', chapter 6) and the Chou-Fasman algorithm (Homework 3). The goal will be to increase accuracy at the cost of only a slight reduction in computation speed.
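The comparison above could be run with a small harness along these lines. It is only a sketch: per_residue_accuracy and timed_prediction are hypothetical names, and the one-letter labels (H, E, C) are an assumed encoding of the three classes. The fraction of correctly classified residues over three states is commonly called Q3.

```python
import time
from typing import Callable, Sequence, Tuple


def per_residue_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of residues whose predicted class matches the known class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def timed_prediction(predict: Callable[[str], Sequence[str]],
                     sequence: str) -> Tuple[Sequence[str], float]:
    """Run one predictor on one sequence and return (labels, seconds taken)."""
    start = time.perf_counter()
    labels = predict(sequence)
    return labels, time.perf_counter() - start
```

Each of the three methods would be wrapped as a function from a sequence to a label string or list, so that the same two helpers score all of them on the same test proteins.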
Workload Distribution and Plan
1. MLP implementation
2. Bayesian Inference implementation
3. Data gathering and MLP training (finding the best input sets)
4. Experiment with different MLP models in Bayesian inference
5. Single Neural Net approach implementation
6. Chou-Fasman Algorithm implementation
All work will be done by me. The bulk of the work will be in steps 3 and 4; the rest is mostly already done.
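For step 3, one candidate input set is a one-hot encoding of a sliding window of residues around each position. The sketch below is an assumption about how such an input could be built, not the encoding the project has settled on; encode_window and the extra 21st slot for padding or unknown residues are hypothetical choices.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def encode_window(sequence: str, center: int, radius: int) -> List[float]:
    """One-hot encode positions center-radius .. center+radius.

    Each position gets 21 slots: 20 amino acids plus one slot that is set
    for positions past the ends of the sequence or for unknown letters.
    """
    width = len(AMINO_ACIDS) + 1
    features: List[float] = []
    for j in range(center - radius, center + radius + 1):
        slot = [0.0] * width
        if 0 <= j < len(sequence):
            index = AMINO_ACIDS.find(sequence[j])
            slot[index if index >= 0 else width - 1] = 1.0
        else:
            slot[width - 1] = 1.0  # padding beyond either end of the chain
        features.extend(slot)
    return features
```

Varying radius gives input sets with different amounts of neighboring information: radius 0 yields 21 inputs per residue, radius 1 yields 63, and so on.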
List of papers
H. Bohr, J. Bohr, S. Brunak, R. M. J. Cotterill, B. Lautrup, L. Nørskov, O. H. Olsen, and S. B. Petersen. Protein secondary structures and homology by neural networks: The α-helices in rhodopsin. FEBS Letters, 241:223-228, 1988.
L. H. Holley and M. Karplus. Protein secondary structure prediction with a neural network. Proc. Nat. Acad. Sci. USA, 86:152-156, 1989.
N. Qian and T. J. Sejnowski. Predicting the secondary structure of globular proteins using neural network models. J. Mol. Biol., 202:865-884, 1988.
T. J. Sejnowski and C. R. Rosenberg. Parallel networks that learn to pronounce English text. Complex Syst., 1:145-168, 1987.
G. D. Stormo, T. D. Schneider, L. Gold, and A. Ehrenfeucht. Use of the perceptron algorithm to distinguish translational initiation sites in E. coli. Nucl. Acids Res., 10:2997-3011, 1982.
S. Akkaladevi and A. K. Katangur. Protein secondary structure prediction using Bayesian inference method on decision fusion algorithms. 2007 IEEE International Parallel and Distributed Processing Symposium (IPDPS), p. 240, 2007.
S. C. Schmidler, J. S. Liu, and D. L. Brutlag. Bayesian segmentation of protein secondary structure. Journal of Computational Biology, 7(1/2):233-248, 2000.
(More to be added as experiments are done)