Reconstructing Transcriptional Networks using Bayesian State Space Model

Posted in Science on September 07, 2008

A major challenge in systems biology is the ability to model complex regulatory interactions. In previous work, we have used linear-Gaussian state-space models (SSMs), also known as linear dynamical systems (LDS) or Kalman filter models, to 'reverse-engineer' regulatory networks from high-throughput data sources such as microarray gene expression profiling. SSMs are a subclass of dynamic Bayesian networks used for modeling time series data and have been used extensively in many areas of control and signal processing. The parameters of an SSM can be learned using maximum likelihood (ML) methods. However, the ML approach is in general prone to overfitting, especially when fitting models with many variables to relatively small amounts of data. We have instead turned to a fully Bayesian analysis, which avoids overfitting and provides error bars on all model parameters. In this paradigm the objective function is simply the probability of the data (the marginal likelihood), obtained by integrating out the parameters of the model with respect to their prior distribution. Optimizing a model with respect to such an objective function avoids overfitting in the conventional sense. In practice, a Bayesian learning scheme infers distributions over all the parameters and makes modeling predictions by taking into account all possible parameter settings. In doing so we penalize models with too many parameters, embodying an automatic Occam's Razor effect.
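To make the model class concrete, here is a minimal sketch of a linear-Gaussian SSM and of the data likelihood that ML fitting would maximize, computed with a Kalman filter. It assumes the textbook form x_t = A x_{t-1} + w_t, y_t = C x_t + v_t with Gaussian noise; the abstract does not give the exact parameterization used in the talk, so the names A, C, Q, R, mu0, V0 and both functions are illustrative assumptions. The fully Bayesian treatment, which integrates these parameters out, is not shown.

```python
import numpy as np

def simulate_lds(A, C, Q, R, x0, T, rng):
    """Draw T steps from x_t = A x_{t-1} + w_t, y_t = C x_t + v_t (assumed form)."""
    k, p = A.shape[0], C.shape[0]
    x = np.zeros((T, k))
    y = np.zeros((T, p))
    x_prev = x0
    for t in range(T):
        x[t] = A @ x_prev + rng.multivariate_normal(np.zeros(k), Q)
        y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(p), R)
        x_prev = x[t]
    return x, y

def kalman_loglik(y, A, C, Q, R, mu0, V0):
    """Log-likelihood log p(y_1..y_T | A, C, Q, R) for fixed parameters."""
    mu, V = mu0, V0
    p = y.shape[1]
    ll = 0.0
    for t in range(y.shape[0]):
        # Predict the hidden state one step ahead.
        mu_pred = A @ mu
        V_pred = A @ V @ A.T + Q
        # Innovation (prediction error) and its covariance.
        e = y[t] - C @ mu_pred
        S = C @ V_pred @ C.T + R
        _, logdet = np.linalg.slogdet(S)
        ll += -0.5 * (p * np.log(2 * np.pi) + logdet + e @ np.linalg.solve(S, e))
        # Update the state estimate with the new observation.
        K = V_pred @ C.T @ np.linalg.inv(S)
        mu = mu_pred + K @ e
        V = V_pred - K @ C @ V_pred
    return ll

# Hypothetical toy setting: 2 hidden states, 3 observed genes, 12 time points.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = rng.normal(size=(3, 2))
Q = 0.1 * np.eye(2)
R = 0.2 * np.eye(3)
x, y = simulate_lds(A, C, Q, R, np.zeros(2), 12, rng)
loglik = kalman_loglik(y, A, C, Q, R, np.zeros(2), np.zeros((2, 2)))
```

Maximizing this quantity over A, C, Q and R is the ML approach described above; with only 12 time points and many free parameters, that fit can follow the noise, which is the overfitting problem the Bayesian analysis is meant to address.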

We describe results from simulation studies based on synthetic mRNA time series data. Receiver Operating Characteristic (ROC) analysis demonstrates an overall accuracy in transcriptional network reconstruction, from the mRNA time series measurements alone, of approximately 68% area under the curve (AUC) for 12 time points, and better still for data sampled at a higher rate. Incorporating prior information about known regulatory connections improves this accuracy in a fashion that appears to be linear in the number of known connections included. The implications of these simulation studies for experimental design will be discussed.
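As an illustration of how an AUC figure of this kind can be computed, the sketch below scores every candidate regulatory connection and compares the scores against a known ground-truth network. The function name `roc_auc`, the toy scores, and the idea of one confidence score per candidate edge are assumptions made for the example; the abstract does not say how edges were scored in the study.

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a true edge is scored above a non-edge,
    counting ties as one half.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one true edge and one non-edge")
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return wins / (len(pos) * len(neg))

# Hypothetical example: four candidate edges, two of them real.
edge_scores = [0.9, 0.8, 0.3, 0.1]   # inferred confidence per candidate edge
true_edges = [1, 0, 1, 0]            # 1 = edge present in the synthetic network
auc = roc_auc(edge_scores, true_edges)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect recovery, so a value near 0.68 indicates reconstruction that is clearly better than chance but far from complete.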

Joint work with Matthew J. Beal and Juan Li.

Author: David Wild, Keck Graduate Institute

Tags: Science, Lectures, Computer Science, Machine Learning, VideoLectures.Net, Bayesian Learning