Reverse engineering gene and protein regulatory networks using graphical models: A comparative evaluation study
One of the major goals in systems biology is to infer the architecture of biochemical pathways and regulatory networks from postgenomic data, such as microarray gene expression and cytometric protein expression data. Various machine learning methods for reverse engineering have been proposed in the literature, and it is important to understand their relative merits and shortcomings. In this talk, the learning performances of three machine learning methods based on graphical models, namely Relevance networks, Gaussian graphical models, and Bayesian networks, are cross-compared on real cytometric protein data and on simulated data from the RAF signalling pathway. Relevance networks are based on pairwise association scores and are straightforward to implement. However, the inference is not done in the context of the whole system, and it is not possible to distinguish between direct and indirect associations. Both shortcomings are addressed by Gaussian graphical models, where the partial correlation between two variables, conditional on all the other domain variables, is employed as the association score. Bayesian networks are more flexible probabilistic graphical models of conditional dependence and independence relations. They are based on directed acyclic graphs and can be exploited to analyse interventional data in order to identify putative causal interactions. For Gaussian graphical models, the shrinkage estimator of Schaefer and Strimmer (2005) was applied to compute the inverse covariance matrix; Bayesian network inference was done by sampling networks from the posterior distribution with order Markov chain Monte Carlo (MCMC), as proposed by Friedman and Koller (2003). The experimental results were obtained by analysing data from the RAF protein signalling network reported in Sachs et al. (2005), which describes the interaction of eleven phosphorylated proteins and phospholipids in human immune system cells.
The evaluation distinguished between the real cytometric protein activity measurements reported in Sachs et al. (2005) and synthetically generated data, as well as between purely observational and interventional data. Observational data are obtained by passively monitoring the system without any interference, whereas interventional data are obtained by actively manipulating variables, e.g. using gene knock-out experiments. Detailed results of this empirical study have been published in Werhli et al. (2006) and Grzegorczyk (2007). The three main findings can be summarized as follows. First, Bayesian networks and Gaussian graphical models were found to outperform Relevance networks only on Gaussian observational data. Second, for observational data no significant difference between Bayesian networks and Gaussian graphical models was observed. Third, only for interventional data did Bayesian networks clearly outperform the other two approaches.
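The difference between the pairwise scores of Relevance networks and the partial correlations of Gaussian graphical models can be illustrated with a minimal sketch. It is not the procedure used in the study: it assumes a toy chain a -> b -> c with made-up noise levels, and it inverts the plain sample covariance matrix instead of using the shrinkage estimator of Schaefer and Strimmer (2005), which is only reasonable here because there are far more samples than variables. The function name partial_correlations and all variable names are hypothetical.

```python
import numpy as np


def partial_correlations(data):
    """Partial correlation of each pair of variables given all the others.

    data has shape (n_samples, n_variables). Assumption: n_samples is much
    larger than n_variables, so the sample covariance matrix is invertible.
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    # Standardise the precision matrix and flip the sign of its entries.
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Toy chain a -> b -> c: a and c are associated only through b.
rng = np.random.default_rng(0)
n_samples = 5000
a = rng.normal(size=n_samples)
b = a + 0.5 * rng.normal(size=n_samples)
c = b + 0.5 * rng.normal(size=n_samples)
data = np.column_stack([a, b, c])

corr = np.corrcoef(data, rowvar=False)  # pairwise (Relevance network) scores
pcor = partial_correlations(data)       # Gaussian graphical model scores

print("correlation a-c:        ", round(float(corr[0, 2]), 3))
print("partial correlation a-c:", round(float(pcor[0, 2]), 3))
```

In this toy setting the pairwise correlation between a and c is large although there is no direct edge between them, while their partial correlation given b is close to zero. This is the sense in which partial correlations separate direct from indirect associations.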
Author: Marco Grzegorczyk, Biomathematics and Statistics Scotland