Reconstructing regulatory networks from high-throughput post-genomic data

Final Activity Report Summary - REGULOMICS (Reconstructing regulatory networks from high-throughput post-genomic data)

A better understanding of the processes underlying bacterial physiology, and of how plants modulate their response to environmental challenges such as drought and pathogen attack, could have a substantial impact on therapeutic approaches to infectious diseases, metabolic engineering applications in biotechnology, and the maintenance and improvement of crop productivity. In this project, Professor David Wild of Warwick Systems Biology Centre proposed to build statistical models of time series data, using sophisticated Bayesian methods to 'reverse-engineer' an organism's complex genetic regulatory networks from raw measurements of gene expression and metabolite concentration.

Prof. Wild has applied these techniques to a number of experimental systems: the responses of the bacterium E. coli to temperature shift and acid stress, responses that enable pathogenic E. coli to adapt to their hosts, and the interaction between the economically important plant pathogen Botrytis cinerea and the model plant Arabidopsis thaliana. He has collaborated with the experimentalists Drs Francesco Falciani and Mark Viant at the University of Birmingham and Dr Katherine Denby at the University of Warwick, who provided high-throughput data. Predictions made by Wild's models could then be tested and explored back in the laboratory.

Recent advances in functional genomics technologies have given biologists unprecedented access to measurements of the inner workings of complex biological organisms. Using microarray expression profiling, it is now possible to measure the expression levels of tens of thousands of genes in a single biological experiment, conducted over several days in the form of a time series. Contrast this with the situation only ten years ago, when it was unusual for a biologist to measure the expression of more than one or two carefully chosen genes. Alongside high-throughput gene expression methods, the new technology of 'metabolomics' has opened the door to measuring the concentrations of hundreds of metabolites that are also crucial players in the complex cellular processes under study.

This overwhelming amount of data challenges traditional methods of analysis, especially once the element of time is taken into account, because one must then consider how certain genes regulate the expression of other genes from one time point in the experiment to the next.

A key ingredient in Wild's models is the inclusion of 'hidden factors' that help to explain the correlation structure of the observed measurements. These factors may correspond to quantities that were not measured during the experiment, and they often reduce the number of direct gene-to-gene dependencies, leaving the resulting networks much more interpretable for the biologist. A natural question arises: how many hidden factors should be used to account for the dependencies in the observed data? This is answered by employing Bayesian model selection, a well-founded principle used in machine learning and statistics to choose between models of differing complexities. The models also use a technique called Automatic Relevance Determination to simplify them further, so that only those genes and metabolites that actually participate in the process are retained in the final model.
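
The report does not spell out the model equations. As a minimal sketch, assuming a linear-Gaussian state-space form (an assumption made here purely for illustration; all symbols below are hypothetical and not taken from the report), the hidden factors x_t and the observed gene or metabolite measurements y_t might be related as follows:

```latex
% Illustrative assumption: a linear-Gaussian state-space model with feedback
% from the observations. The report does not state the model form.
\begin{align}
  x_{t+1} &= A x_t + B y_t + w_t,     & w_t &\sim \mathcal{N}(0, Q) \\
  y_t     &= C x_t + D y_{t-1} + v_t, & v_t &\sim \mathcal{N}(0, R)
\end{align}

% Bayesian model selection over the number of hidden factors k = dim(x_t),
% where Y is the full time series and \theta collects the model parameters:
\begin{equation}
  p(k \mid Y) \;\propto\; p(k) \int p(Y \mid \theta, k)\, p(\theta \mid k)\, d\theta
\end{equation}

% Automatic Relevance Determination: each group of weights \theta_j has its
% own prior precision \alpha_j, which is inferred from the data:
\begin{equation}
  p(\theta_j \mid \alpha_j) = \mathcal{N}\!\left(\theta_j \mid 0,\ \alpha_j^{-1} I\right)
\end{equation}
```

Under this assumed form, substituting the first equation into the second shows that the influence of y_{t-1} on y_t is carried by the matrix CB + D: D holds the direct dependencies and CB those mediated by hidden factors. The integral in the second block is the marginal likelihood, which penalises models with more hidden factors than the data support. In the third block, a large inferred precision \alpha_j drives the corresponding weights towards zero, which is how irrelevant genes or metabolites would drop out of the model.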

Another advantage of the Bayesian framework is that existing information about known network connections and interactions, derived from the literature or commercial databases, can be included in the model. The output of the modelling procedure is a probabilistic assessment of which genetic regulatory networks are plausible. These probabilities can be used to design future biological experiments targeted at specific genes, either to corroborate the model's in silico predictions or simply to probe a relatively uncharted network.
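
How prior knowledge and data combine into an edge probability can be illustrated with a deliberately simplified sketch. It is not the project's model: it considers a single candidate edge, assumes the target's expression at the next time point is a Gaussian-noise linear function of the regulator's current expression, and uses a conjugate Gaussian prior on the weight so that the Bayes factor has a closed form. The function names, the parameters `alpha` and `sigma2`, and the example numbers are all hypothetical.

```python
import math


def log_bayes_factor(x, y, alpha=1.0, sigma2=1.0):
    """Log Bayes factor for 'edge' versus 'no edge' (illustrative assumption).

    Edge model:    y_i = w * x_i + noise, with w ~ N(0, 1/alpha)
    No-edge model: y_i = noise
    Noise is N(0, sigma2) in both models; w is integrated out analytically.
    """
    s_xx = sum(a * a for a in x)
    s_xy = sum(a * b for a, b in zip(x, y))
    # Posterior precision of the weight w under the edge model.
    precision = alpha + s_xx / sigma2
    # Complexity penalty (always <= 0) plus the reward for explained signal.
    return 0.5 * math.log(alpha / precision) + 0.5 * (s_xy / sigma2) ** 2 / precision


def posterior_edge_probability(prior_prob, log_bf):
    """Combine a prior edge probability with a log Bayes factor via Bayes' rule."""
    log_odds = math.log(prior_prob / (1.0 - prior_prob)) + log_bf
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


if __name__ == "__main__":
    # Hypothetical expression time series (arbitrary units), made up for the demo.
    regulator = [0.1, 0.9, 1.8, 2.4, 2.9, 3.1]
    target = [0.0, 0.2, 0.8, 1.7, 2.3, 3.0]

    # Pair the regulator at time t with the target at time t + 1.
    x, y = regulator[:-1], target[1:]
    log_bf = log_bayes_factor(x, y)

    # A literature-supported edge gets a higher prior than an uncharted one.
    for label, prior in [("literature-supported", 0.9), ("uncharted", 0.1)]:
        prob = posterior_edge_probability(prior, log_bf)
        print(f"{label}: prior={prior:.2f} posterior={prob:.4f}")
```

The design point is that the prior and the data enter additively on the log-odds scale, so a database-derived connection starts with a head start that the data can still confirm or overturn. The first term of the Bayes factor is never positive, which gives the built-in preference for the simpler "no edge" model when the data carry no signal.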

This project has significantly improved the incorporation of such prior biological knowledge into the network reconstruction algorithms. Through experimental validation in the biological systems studied, it has also demonstrated that the approach has the potential to reveal novel and important regulatory mechanisms and to improve the efficiency with which genes exhibiting a defence phenotype can be discovered.