Bin Song


Education

Research Interests

Databases and bioinformatics, with a focus on efficient computational methods for querying large biological network data sets, selecting compounds from large compound databases, and extracting useful information from large biological databases.

Research Projects

1. Enzyme identification using metabolic network databases
Metabolic networks contain information that is valuable for drug discovery. These networks consist of a large number of enzymes, reactions and compounds, and many metabolic network databases are publicly available. An important problem is to query these networks and identify target enzymes whose inhibition has a desired effect on the network. Given a desired influence on the network, we develop computational tools that identify a subset of enzymes whose inhibition leads to that influence.

When a metabolic network is considered at steady state, it can be simulated with three models. For the Boolean model, we provide methods in papers [3] and [4]; for the linear model, we design algorithms in [7]; and for the non-linear model, we present approaches in [1]. For a metabolic network in the transient state, we discuss methods in [8].
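To make the Boolean steady-state idea concrete, here is a toy sketch in Python. The rules are simplifying assumptions for illustration, not the exact formulation of [3] and [4]: a reaction fires if at least one of its catalyzing enzymes is not inhibited and all of its input compounds are present, and a compound is present if it is a source or the output of a firing reaction. The "damage" of an inhibition set is taken here to be the number of non-target compounds it eliminates. The network, the compound and enzyme names, and the functions `steady_state`, `damage` and `best_inhibition` are all hypothetical, and the exhaustive search is only practical for tiny networks.

```python
from itertools import combinations

# Hypothetical toy network: reaction -> (catalyzing enzymes, input compounds, output compounds).
REACTIONS = {
    "R1": ({"E1"}, {"A"}, {"B"}),
    "R2": ({"E2"}, {"B"}, {"C"}),
    "R3": ({"E3"}, {"A"}, {"D"}),
    "R4": ({"E4", "E5"}, {"D"}, {"C"}),  # two enzymes catalyze the same reaction
}
SOURCES = {"A"}  # compounds assumed to be always available


def steady_state(inhibited):
    """Return the compounds present at the Boolean steady state (a fixed point)."""
    present = set(SOURCES)
    changed = True
    while changed:  # repeat until no reaction adds a new compound
        changed = False
        for enzymes, inputs, outputs in REACTIONS.values():
            # Fires if some catalyzing enzyme is active and every input is present.
            if (enzymes - inhibited) and inputs <= present and not outputs <= present:
                present |= outputs
                changed = True
    return present


def damage(inhibited, targets):
    """None if any target compound survives, else the count of non-target compounds lost."""
    baseline = steady_state(set())
    after = steady_state(inhibited)
    if targets & after:
        return None
    return len((baseline - after) - targets)


def best_inhibition(targets):
    """Exhaustively try enzyme subsets and keep the first one with minimum damage."""
    enzymes = sorted(set().union(*(e for e, _, _ in REACTIONS.values())))
    best = None
    for k in range(1, len(enzymes) + 1):
        for subset in combinations(enzymes, k):
            d = damage(set(subset), targets)
            if d is not None and (best is None or d < best[1]):
                best = (set(subset), d)
    return best
```

In this toy network, inhibiting only E2 is not enough to eliminate compound C, because reaction R4 still produces it through E4 or E5; `best_inhibition({"C"})` returns `({"E2", "E4", "E5"}, 0)`.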

2. Compound selection from large compound databases
Once we have identified which set of enzymes to inhibit, the next step is to select chemical compounds (i.e., drugs) that alter the activity of these enzymes. Public databases contain millions of compounds, so efficiently selecting compounds for a specific enzyme has significant practical value. We develop two novel computational methods that rank a given set of compounds in a large compound library for a given target protein or enzyme [6].

3. Domain detection for protein sequence databases
Biologists frequently align multiple biological sequences to determine consensus sequences and/or search for predominant residues and conserved regions. In particular, determining the conserved regions of an alignment is one of the most important tasks. Because protein sequences are often several hundred residues long or longer, it is difficult to distinguish biologically important conserved regions (motifs or domains) from the rest of the alignment. A computational tool that accurately highlights biologically important regions is therefore highly desirable; we developed ARCS, an aggregated related column scoring scheme for aligned sequences, for this purpose [5].
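As a simple illustration of scoring conserved regions, the Python sketch below uses a standard per-column Shannon-entropy measure and a sliding-window average. This is a generic textbook technique, not the ARCS scheme from [5]; the gap handling, window size, threshold and example alignment are hypothetical choices.

```python
import math
from collections import Counter


def column_conservation(column):
    """Return 1 - H/Hmax for one alignment column (1.0 means fully conserved).

    Gaps ('-') are ignored; an all-gap column scores 0.0.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    return 1.0 - entropy / math.log2(20)  # 20 amino acids give the maximum entropy


def conserved_windows(alignment, window=3, threshold=0.8):
    """Return (start, end) column ranges whose mean conservation is >= threshold."""
    length = len(alignment[0])
    scores = [column_conservation([seq[i] for seq in alignment]) for i in range(length)]
    hits = []
    for start in range(length - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            hits.append((start, start + window))
    return hits
```

For the made-up alignment `["MKVLAGH", "MKVLSGW", "MKVIAGY", "MKVLTGF"]` with `window=3` and `threshold=0.9`, the function returns `[(0, 3), (1, 4)]`: the fully conserved prefix MKV stands out, while the windows reaching the variable columns fall below the threshold. Averaging over a window rather than scoring single columns is one simple way to favor contiguous conserved regions over isolated conserved residues.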

4. Finding distant structural similarities in protein databases
Structural similarities between distantly related proteins are significant information in protein data sets. For example, they can reveal functional relationships that cannot be identified by sequence comparison. We provide an algorithm that computes the transformation that aligns one protein to another. Our experiments show that our method outperforms existing methods [2].

5. Web server for maize kernel composition
Grain composition and yield are two important targets for improving food security and reducing the environmental impact of agriculture. Biologists collect seed weights and near-infrared reflectance (NIR) spectra for large numbers of individual maize kernels. We build a web server for analyzing complex data sets such as these NIR spectra.

Skills

1. Familiar with C++, C, MATLAB, PHP, SQL, MySQL, Oracle, the CPLEX library, and MPI
2. Strong skills in developing and implementing efficient algorithms for large biological data sets.
3. Knowledge of database management systems (DBMS)
4. Knowledge of biological databases, e.g., KEGG and protein structure databases
5. Knowledge of biological software, e.g., Glide, DOCK, and ClustalW

Work Experience

1. Research Assistant, University of Florida. August 2007 – present
2. Web server management. August 2007 – present
3. Teaching Assistant, University of Florida. August 2006 – May 2009
Courses: Problem Solving Using Computer Software (3 times), Data Structure and Algorithm

Selected Publications

[1] Bin Song, I. Esra Buyuktahtakin, Sanjay Ranka, Tamer Kahveci, Manipulating the steady state of metabolic pathways, IEEE/ACM Transactions on Computational Biology and Bioinformatics (IEEE TCBB), accepted for publication.
[2] Jayendra Venkateswaran, Bin Song, Tamer Kahveci, Christopher Jermaine, TRIAL: A Tool for Finding Distant Structural Similarities, IEEE/ACM Transactions on Computational Biology and Bioinformatics (IEEE TCBB), accepted for publication.
[3] Bin Song, Padmavati Sridhar, Tamer Kahveci and Sanjay Ranka, Double Iterative Optimization for Metabolic Network-Based Drug Target Identification, International Journal of Data Mining and Bioinformatics, 3(2):145-159, 2009.
[4] Padmavati Sridhar, Bin Song, Tamer Kahveci and Sanjay Ranka, Mining metabolic networks for optimal drug targets, Pacific Symposium on Biocomputing (PSB), 13:291-302, 2008.
[5] Bin Song, Jeong-Hyeon Choi, Guangyu Chen, et al., ARCS: an aggregated related column scoring scheme for aligned sequences, Bioinformatics, 1(22):2326-2332, 2006.
[6] Bin Song, Tamer Kahveci, Sanjay Ranka, Shalesh Kaushal, and Syed M. Noorwez, Integrating structural properties of proteins and biological networks improves compound selection, submitted to Pacific Symposium on Biocomputing (PSB) 2010
[7] Bin Song, I. Esra Buyuktahtakin, Sanjay Ranka, Tamer Kahveci, A linear programming framework to identify enzyme knockout strategies for multiple enzymes catalyze the same reaction, ready for submission.
[8] Bin Song, Sanjay Ranka, Tamer Kahveci, Identify enzyme knockout strategies by transient state analysis, ACM International Conference on Bioinformatics and Computational Biology (ACM-BCB), 2010 (Best Student Paper).