Network inference using informative priors
TP Speed in collaboration with S Mukherjee (University of Warwick, UK)
In recent years, several methods have been developed that improve on the traditional Western blot for detecting and quantifying specific proteins in a sample of cells. These methods permit the simultaneous measurement of multiple protein levels, including those of phosphorylated proteins, in cell samples and even in single cells. Once several proteins can be measured at the same time, it is natural to use that capability to explore intracellular signalling, and several groups are doing just this. It is also natural to use mathematical models in doing so. The research described here concerns one mathematical approach to the analysis of such data: Bayes networks.
The figure shows a Bayes network inferred from data on 11 protein phospho-forms and isoforms, measured by a recently developed assay (from the Kinexus corporation of Canada) in 18 luminal breast cancer cell lines. (Thanks to Dr JW Gray of the Lawrence Berkeley National Laboratory for permitting our use of these data.) The proteins are all part of the epidermal growth factor receptor (EGFR) signalling network. The aim in using Bayes networks is to explore associations between the 11 protein levels across the 18 cell lines in a coherent manner; the major unknown in the model is the underlying network, which is meant to represent the signalling interactions. Inferences can be made about individual edges as well as the whole network, and are carried out by a variant of a standard Markov chain Monte Carlo (MCMC) approach. Interest naturally focuses on the inferred interactions, particularly previously unknown ones, which can then be tested directly.
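Edge probabilities of this kind come from sampling networks in proportion to their posterior probability. The sketch below is a generic structure MCMC (single-edge add/remove moves with a Metropolis-Hastings acceptance step), not the specific variant used in this work. It assumes a linear-Gaussian model scored by BIC as a stand-in for the marginal likelihood and a flat prior over acyclic networks; the functions `node_bic`, `reaches` and `structure_mcmc` are hypothetical names.

```python
import numpy as np


def node_bic(data, j, parents):
    """BIC score of a linear-Gaussian regression of variable j on its parents."""
    n = data.shape[0]
    X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
    y = data[:, j]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    sigma2 = max(rss / n, 1e-12)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    n_params = X.shape[1] + 1  # regression coefficients plus the noise variance
    return loglik - 0.5 * n_params * np.log(n)


def reaches(adj, start, target):
    """True if a directed path start -> ... -> target exists (adj[u, v] = 1 means u -> v)."""
    stack, seen = [start], set()
    while stack:
        u = stack.pop()
        if u == target:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(np.flatnonzero(adj[u]).tolist())
    return False


def structure_mcmc(data, n_iter=5000, burn_in=1000, seed=0):
    """Metropolis-Hastings over DAGs with single-edge toggles; returns edge posterior estimates."""
    rng = np.random.default_rng(seed)
    p = data.shape[1]
    adj = np.zeros((p, p), dtype=int)
    scores = [node_bic(data, j, []) for j in range(p)]
    counts = np.zeros((p, p))
    for it in range(n_iter):
        # A uniformly chosen ordered pair makes the toggle proposal symmetric.
        i, j = rng.choice(p, size=2, replace=False)
        adding = adj[i, j] == 0
        # Adding i -> j is illegal if j already reaches i (it would close a cycle).
        if not (adding and reaches(adj, j, i)):
            proposal = adj.copy()
            proposal[i, j] = 1 if adding else 0
            # Only node j's parent set changed, so only its local score is recomputed.
            new_score = node_bic(data, j, np.flatnonzero(proposal[:, j]).tolist())
            # Flat prior: the acceptance ratio is just the score difference.
            if np.log(rng.random()) < new_score - scores[j]:
                adj, scores[j] = proposal, new_score
        if it >= burn_in:
            counts += adj
    return counts / (n_iter - burn_in)
```

Each entry `post[i, j]` of the returned matrix estimates the posterior probability of the edge i → j, the kind of number shown beside the edges in the figure.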
One novelty in our work is the way in which biochemical aspects of pathways are embodied in a prior distribution for the network structure. For example, we suppose that ligands influence cytosolic proteins via ligand-receptor interactions; as a consequence, we do not expect ligands to influence cytosolic proteins directly. Equally, we do not expect either receptors or cytosolic proteins to influence ligands directly. Through the prior distribution, constraints like these are imposed on the inferred network structures, ensuring that the result is biologically reasonable. The numbers in parentheses beside the directed edges of the network in the figure are the estimated (posterior) probabilities of those edges being present in the underlying network. This representation is thus one way of depicting interconnected associations.
Bayes network for a subset of the EGFR pathway.
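The role-based constraints described above can be encoded as a structural prior. A minimal sketch, assuming hard constraints (a network containing a forbidden edge gets prior probability zero) and a uniform prior over the remaining networks; the role labels and the functions `allowed_mask` and `log_structure_prior` are hypothetical, and a softer prior would replace minus infinity with a finite penalty.

```python
import numpy as np

# Hypothetical encoding of the rules in the text: (parent role, child role) pairs ruled out a priori.
FORBIDDEN = {
    ("ligand", "cytosolic"),  # ligands reach cytosolic proteins only via receptors
    ("receptor", "ligand"),   # receptors do not act directly on ligands
    ("cytosolic", "ligand"),  # nor do cytosolic proteins
}


def allowed_mask(roles):
    """Boolean matrix whose (i, j) entry is True when edge i -> j is permitted."""
    p = len(roles)
    mask = np.array([[(roles[i], roles[j]) not in FORBIDDEN for j in range(p)]
                     for i in range(p)])
    np.fill_diagonal(mask, False)  # no self-loops
    return mask


def log_structure_prior(adj, mask):
    """Log prior of an (already acyclic) network: -inf if it uses a forbidden edge, else 0."""
    if np.any(np.asarray(adj, dtype=bool) & ~mask):
        return -np.inf
    return 0.0
```

Adding `log_structure_prior` to the data score inside an MCMC acceptance step means that a proposal containing a forbidden edge is never accepted, so every sampled network respects the constraints.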
Using combinations of genetic mutations to infer gene interaction networks
A Oshlack, C Law, GK Smyth in collaboration with CA de Graaf, DJ Hilton, C Carmichael (Molecular Medicine Division)
A basic tool in functional genomics is to perturb the expression of a gene of interest and observe the response in other genes, implicating the responding genes as downstream targets of the perturbed gene. When perturbations are available for a pair of genes, both singly and in combination, it becomes possible to search for evidence of interaction between those genes. Using whole-genome expression data from single and double mutants, we are developing statistical methods for inferring whether the mutated genes share molecular pathways, that is, pathways that depend on both genes simultaneously. A set of linear models, each representing a different possible epistatic relationship, is fitted to the measured expression levels of each gene. The distribution of best-fitting models across genes provides evidence for different modes of interaction. From this we can begin to infer the pathways in which the mutated genes interact and the target genes involved in those pathways, and hence to build up a picture of a gene interaction network.
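The model-comparison step can be sketched as follows. This is a minimal illustration, not the group's method: it assumes four genotypes (wild type `wt`, single mutants `a` and `b`, double mutant `ab`) with replicate arrays, five hypothetical candidate models written as design matrices, and BIC as the criterion for "best fitting". The model names and functions are illustrative only.

```python
import numpy as np
from collections import Counter

GENOTYPES = ["wt", "a", "b", "ab"]

# Hypothetical candidate models: each maps a genotype to its row of the design matrix.
MODELS = {
    # the gene responds to neither mutation
    "no_effect": {"wt": [1], "a": [1], "b": [1], "ab": [1]},
    # independent effects that add up in the double mutant
    "additive": {"wt": [1, 0, 0], "a": [1, 1, 0], "b": [1, 0, 1], "ab": [1, 1, 1]},
    # double mutant looks like the a mutant: a masks b
    "a_epistatic": {"wt": [1, 0, 0], "a": [0, 1, 0], "b": [0, 0, 1], "ab": [0, 1, 0]},
    # double mutant looks like the b mutant: b masks a
    "b_epistatic": {"wt": [1, 0, 0], "a": [0, 1, 0], "b": [0, 0, 1], "ab": [0, 0, 1]},
    # unconstrained: a separate mean for every genotype
    "interaction": {"wt": [1, 0, 0, 0], "a": [1, 1, 0, 0],
                    "b": [1, 0, 1, 0], "ab": [1, 1, 1, 1]},
}


def bic(y, X):
    """Gaussian BIC up to a constant; the noise variance is common to all models, so it is not counted."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(max(rss, 1e-12) / n) + k * np.log(n)


def best_model(y, genotypes):
    """Name of the lowest-BIC model for one gene's expression values."""
    y = np.asarray(y, dtype=float)
    scores = {}
    for name, rows in MODELS.items():
        X = np.array([rows[g] for g in genotypes], dtype=float)
        scores[name] = bic(y, X)
    return min(scores, key=scores.get)


def classify_genes(expr, genotypes):
    """expr is a genes x arrays matrix; returns how often each model fits best."""
    return Counter(best_model(row, genotypes) for row in expr)
```

Tallying the winners over all genes gives the distribution of best-fitting models. For example, many genes whose double-mutant profile matches the `a` single mutant would point towards `a` being epistatic to `b`.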
Estimating ¹³C enrichment in time course experiments
MD Robinson, M O’Hely, TP Speed in collaboration with MJ McConville (Bio21 Institute, Melbourne)
Gas chromatography coupled to mass spectrometry (GCMS) is being used for the dynamic analysis of metabolic pathways. Tracer experiments compare metabolite profiles under standard growth conditions with profiles at different times after switching to ¹³C-labelled glucose as the nutrient source. Several methods have been suggested for calculating ¹³C enrichment that take into account the natural abundance of ¹³C and of heavier isotopes of other elements. If the elemental composition of a diagnostic ion is known, a linear model can be used to partition the observed signal into the proportions attributable to the unlabelled and labelled forms of the fragment. If the composition is unknown, a simple enrichment estimator is the percent change in intensity of the monoisotopic mass. A key feature of our analysis is the use of multiple diagnostic ions, which yields robust enrichment estimates.
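The two estimators can be sketched as below. The simplifying assumptions are ours, not the source's: natural abundance is modelled for carbon only, as a binomial distribution with a ¹³C abundance of about 0.0107; the labelled form is taken to be the fully labelled fragment at M+n for a fragment with n carbons; and per-ion estimates are combined with the median as one robust choice. All function names are hypothetical, and both estimators return fractions (multiply by 100 for percent).

```python
import numpy as np
from math import comb

P13C = 0.0107  # approximate natural abundance of 13C (assumed; other elements ignored)


def natural_isotopologues(n_carbons, width):
    """Binomial M+0 .. M+(width-1) distribution of an unlabelled fragment from natural 13C only."""
    return np.array([
        comb(n_carbons, i) * P13C**i * (1 - P13C) ** (n_carbons - i) if i <= n_carbons else 0.0
        for i in range(width)
    ])


def enrichment_linear(observed, n_carbons):
    """Known composition: least-squares split of the signal into unlabelled and fully labelled forms."""
    observed = np.asarray(observed, dtype=float)
    width = observed.size
    if width <= n_carbons:
        raise ValueError("need intensities up to at least M+n_carbons")
    unlabelled = natural_isotopologues(n_carbons, width)
    labelled = np.zeros(width)
    labelled[n_carbons] = 1.0  # all carbons are 13C
    A = np.column_stack([unlabelled, labelled])
    coef, *_ = np.linalg.lstsq(A, observed, rcond=None)
    coef = np.clip(coef, 0.0, None)  # amounts cannot be negative
    return float(coef[1] / coef.sum())


def enrichment_m0(observed, control):
    """Unknown composition: relative drop in the M+0 fraction versus the unlabelled control."""
    obs = np.asarray(observed, dtype=float)
    ctl = np.asarray(control, dtype=float)
    return float(1.0 - (obs[0] / obs.sum()) / (ctl[0] / ctl.sum()))


def combined_enrichment(per_ion_estimates):
    """Combine estimates from several diagnostic ions; the median resists a single bad ion."""
    return float(np.median(per_ion_estimates))
```

For a mixture of unlabelled and fully labelled fragment, both estimators recover the labelled fraction; they differ when the composition, and hence the natural-abundance pattern, is uncertain.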
Metabolic flux analysis
M O’Hely, TP Speed in collaboration with MJ McConville (Bio21 Institute, Melbourne)
Cells process chemicals: for example, they process glucose to produce energy. Nature allows cells many potential pathways for this processing. Besides business as usual, a cell may sequester chemicals for later use, or may have to switch to a less refined nutrient source. The flux through a metabolic pathway is a measure of how many molecules pass along it in a given time, and can be inferred from a labelling experiment using enrichment estimates for end-point and intermediate metabolites. Our goal is to understand which pathways are active in pathogens, knowledge that could allow drugs to be designed to treat the diseases they cause.
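As an illustration of how enrichment estimates can yield a flux, consider the simplest case: a single well-mixed metabolite pool of known size at metabolic steady state, fed from a precursor of constant enrichment E_in. Its enrichment then rises as E(t) = E_in(1 - exp(-kt)), and the flux through the pool is k times the pool size. This one-pool model and the function names are assumptions made for the sketch; pathway-level flux analysis fits a model of the whole network to many metabolites at once.

```python
import numpy as np


def turnover_rate(times, enrichment, precursor_enrichment=1.0):
    """Least-squares k in E(t) = E_in * (1 - exp(-k t)), fitted through the origin in log form."""
    t = np.asarray(times, dtype=float)
    e = np.asarray(enrichment, dtype=float) / precursor_enrichment
    if np.any(e >= 1.0):
        raise ValueError("pool enrichment must stay below the precursor enrichment")
    y = -np.log1p(-e)  # for a single pool, y = k * t
    return float(np.sum(t * y) / np.sum(t * t))


def flux(times, enrichment, pool_size, precursor_enrichment=1.0):
    """Flux through the pool = turnover rate x pool size (amount per unit time)."""
    return turnover_rate(times, enrichment, precursor_enrichment) * pool_size
```

Fitting in the log-linear form keeps the sketch to closed-form least squares; with noisy enrichments near E_in, a direct nonlinear fit of E(t) would weight the data more sensibly.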