Research Areas

The goal of our research is to develop computational methods for large-scale biological data analysis and to apply these methods to biological and biomedical problems. We approach this broad goal from several angles that overlap and complement one another.

Protein Interaction Prediction

Protein-protein interactions play critical roles in the control of most cellular processes. High-throughput techniques for systematically identifying protein interactions, such as yeast two-hybrid screening, suffer from high false positive and false negative rates because of their technical limitations. To improve the reliability of protein interaction inference, we have developed a maximum likelihood-based approach that integrates large-scale protein interaction data from three organisms, S. cerevisiae, C. elegans, and D. melanogaster, to predict protein-protein interactions in S. cerevisiae. This approach can be readily extended to other organisms.
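
To illustrate why combining several noisy screens can help, the following is a minimal sketch and not the approach described above. It assumes that each experiment reports a protein pair independently, with a shared false positive rate and false negative rate, and it compares the likelihood of the observations under "interacts" and "does not interact". The function names, the independence assumption, and the error rates are all hypothetical choices made for illustration.

```python
import math


def interaction_log_likelihood_ratio(observations, fp_rate, fn_rate):
    """Log-likelihood ratio of 'interacts' vs. 'does not interact' for one pair.

    observations: list of bools, one per experiment (True = interaction reported).
    fp_rate, fn_rate: assumed error rates, shared by all experiments (hypothetical).
    """
    llr = 0.0
    for reported in observations:
        if reported:
            # P(reported | interacts) = 1 - fn_rate; P(reported | no interaction) = fp_rate
            llr += math.log((1.0 - fn_rate) / fp_rate)
        else:
            # P(not reported | interacts) = fn_rate; P(not reported | no interaction) = 1 - fp_rate
            llr += math.log(fn_rate / (1.0 - fp_rate))
    return llr


def maximum_likelihood_call(observations, fp_rate, fn_rate):
    """Return True when 'interacts' is the more likely explanation of the data."""
    return interaction_log_likelihood_ratio(observations, fp_rate, fn_rate) > 0.0
```

Under these assumptions, a single positive report can outweigh several negative ones when the false negative rate is much higher than the false positive rate, which is one reason pooling evidence across data sets can recover interactions that individual screens miss.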

Signal Transduction Pathway Analysis

Signal transduction is the primary means by which eukaryotic cells respond to external signals from their environment and coordinate complex cellular changes. We have developed an approach that integrates protein-protein interaction data and microarray gene expression data to predict the order of signaling pathway components, assuming that all components of the pathway are known. Our current research on this topic concentrates on incorporating other types of information, such as protein phosphorylation data, and on developing more elaborate statistical approaches for predicting and modeling multidimensional signal transduction networks.
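
The ordering problem can be pictured with a small sketch, which is an illustration under simplifying assumptions and not the approach described above. It assumes that the pathway is linear, that each pair of components has a single combined evidence score (for example, derived from interaction and co-expression data), and that the number of components is small enough for exhaustive search. The function name and the scoring scheme are hypothetical.

```python
from itertools import permutations


def best_linear_order(components, pair_score):
    """Return the ordering whose adjacent pairs have the highest total score.

    components: list of component names (all assumed known in advance).
    pair_score: dict mapping frozenset({a, b}) to a hypothetical evidence score;
                missing pairs count as 0.0.
    An order and its reverse score the same, so direction is not resolved here.
    """
    def total_score(order):
        # Sum the evidence for each pair of neighbors in the candidate order.
        return sum(
            pair_score.get(frozenset((a, b)), 0.0)
            for a, b in zip(order, order[1:])
        )

    # Exhaustive search over all orderings; feasible only for short pathways.
    return max(permutations(components), key=total_score)
```

For example, with components A, B, C, and D and strong scores for the pairs A-B, B-C, and C-D, the search returns the chain A, B, C, D or its reverse. Resolving the direction of signaling would require additional information that this sketch does not model.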

Microarray Data Analysis

DNA microarrays measure the expression levels of thousands of genes in a single organism simultaneously. Valuable as they are, the data generated by microarray technology pose special challenges for analysis. High noise levels and substantial heterogeneity between protocols require robust statistical methods for data analysis and interpretation.

Genomic Data Integration

Advances in genomics and proteomics technologies have opened the door to rapid biological data acquisition and have drawn our attention towards an integrated understanding of biological interactions. Given the complexity of biological systems and the noise in large-scale biological data, we are interested in developing comprehensive computational models that integrate information from diverse sources in order to reconstruct different types of cellular networks, and in using these networks to mine functional genomics data and understand protein functions.

Bayesian Inference

From a statistical point of view, we are interested in developing Bayesian methods and applying them to biological systems. Bayesian inference has been widely used in the analysis of high-throughput bioinformatics data because biological evidence can be flexibly incorporated into Bayesian models and because the framework lends itself naturally to efficient computational methods. We are developing Bayesian approaches for identifying protein complexes from high-throughput mass spectrometry data and for inferring functional modules from protein complexes.