A systems approach linking genotype and environment to phenotype: oxidative stress response mechanisms in fission yeast

Final Report Summary - PHENOXIGEN (A systems approach linking genotype and environment to phenotype: oxidative stress response mechanisms in fission yeast)

Executive summary:

A cell's survival depends on its ability to mount a successful stress response when challenged by exposure to damaging agents. Oxidative stress, caused by an excess of reactive oxygen species, is known to damage cellular components. Coordination of the complex and rapid stress response is central to the cell's viability after acute exposure. In humans, oxidative stress is involved in aging, cancer, atherosclerosis, Alzheimer's disease and Parkinson's disease, among others.

PHENOXIGEN investigates the mechanisms that regulate the oxidative stress response by associating genetic factors with phenotype when the cell is challenged by oxidative stress. The project exploits different strains of fission yeast, using a range of genome-wide analyses to interrogate the stress response and its regulation at different levels and in diverse genetic backgrounds. A large and coordinated body of heterogeneous quantitative data is being generated and integrated with published information. Using these data, we not only describe the cellular stress response at a much more comprehensive and detailed level than before, but also address fundamental biological questions such as how natural genetic variability affects a cell's ability to cope with stress, contributing to a systems-level understanding of the oxidative stress response.

An increasingly effective way of studying the interactions between genome and environment is through genome-wide association studies (GWAS), which are used for investigating genetic association with complex traits. PHENOXIGEN is taking such an approach by correlating molecular and cellular traits with genetic variation in order to identify genomic regions impacting on a cell's ability to cope with oxidative stress. At the heart of PHENOXIGEN is a library of genetically diverse fission yeast strains. The ability of these strains to respond to oxidative stress has been investigated using continuous growth assays and by measuring a wide range of molecular properties, such as ribonucleic acid (RNA) and protein concentrations. These data have been the basis of a large integrative quantitative trait locus (QTL) analysis, allowing the identification of genetic factors explaining phenotypic variability between the individual strains. The consortium has established numerous innovative technologies enabling these measurements at an unprecedented scale, beyond the initially planned goals. The group of J. Bähler from University College London (UCL) has established RNA sequencing using deep sequencing technology, making it possible to simultaneously genotype a large number of segregant strains and measure RNA concentrations at much higher precision than with microarrays. Another example is the development and advancement of targeted proteomics in the laboratory of R. Aebersold from the Swiss Federal Institute of Technology (ETH), which has enabled the measurement of many more proteins than initially planned. The group of C. Workman from the Technical University of Denmark (DTU) has tested and established a robot-assisted automatic phenotyping method allowing precise, time-resolved measurements of growth phenotypes. Here, PHENOXIGEN also goes beyond the initial plans, which did not consider time-resolved phenotyping. Finally, the group of A. Beyer from the Technical University of Dresden (TUD) has developed new computational methods for the analysis of RNA sequencing data, for the prediction of protein interactions and for the identification of genetic loci that affect the respective phenotypes (i.e. QTL mapping). These computational methods have been applied to relatively small budding yeast (S. cerevisiae) datasets, revealing novel insights into the impact of natural genetic variation on post-transcriptional regulation.

PHENOXIGEN is the most comprehensive project of its type studying how natural genetic variation impacts on molecular and cellular traits. The project has created the foundation for similar work in higher organisms, especially humans. The experimental and computational technologies that have been developed will be applied to human GWAS and new insights have been obtained into how an individual's ability to cope with cellular stress might be affected by its genotype.

Project context and objectives:

How an organism's genotype defines the way in which it relates to its environment and, in turn, determines its cellular phenotype is central to multiple biological and clinical questions. For example, complex diseases are believed to be triggered by several mutations in combination with environmental factors such as life-style or chemical toxins (where no individual mutation in itself is harmful). Another example is tissue differentiation, which is largely driven at the cellular level by external signalling molecules that interact with the cell's genome to change the phenotype of the cell. A third important example is cancer, where the genome of tumour cells is heavily disrupted causing dramatic changes in how these cells respond to environmental stimuli.

A central lesson learned from these examples is that the interactions between genome and environment are mediated by complex networks of pathways and molecular interactions. It is established that signals are processed not only via linear kinase cascades but through networks of cross-talking pathways involving receptors, G-proteins, kinases, transcription factors, messenger RNA (mRNA) binding proteins, small interfering RNAs and many other biomolecules. Signals change not only the transcription of target genes, but also mRNA half-lives, translation, protein half-lives, protein localisation, protein activation and protein turnover. Although it is impossible to study all of these aspects in one research project, these considerations underline the importance of taking a systems perspective when studying the interaction between genome and environment.

An increasingly effective way of studying the interactions between genome and environment is through GWAS, which are used for investigating genetic association with complex diseases as well as other phenotypes. Although such studies may point to important genes, they are generally unable to mechanistically explain the observed association. This project suggests a new way of looking at the functional dependence between candidate genes identified in such association studies and the relevant phenotype.

Cellular stress response, i.e. response to chemical toxins, irradiation, osmotic stress etc., represents an important interaction between the cell and its environment that is critical for its behaviour, survival and avoiding transformation into a tumour cell. Stress response pathways may also serve as models for signalling pathways in general, e.g. to understand how natural genetic variability affects intra-cellular signal processing. Several stress response pathways, including deoxyribonucleic acid (DNA) damage response and oxidative stress, have been studied in great detail making them ideal candidates for developing new experimental and computational methods, which can be verified based on the framework of available knowledge. Systems-level approaches have been used to study stress response at the transcriptional level in budding yeast (e.g. Workman et al. 2006). Some studies have also addressed stress adaptation at the posttranscriptional (i.e. protein) level. Most existing studies just measured the concentration changes of mRNA or proteins in response to some stressor.

Although the individual pathways and the chains of protein-protein and protein-DNA interactions that regulate these concentration changes have been studied in detail, a systematic, genome-wide analysis at multiple gene expression levels is still lacking. Given the established complexity of regulatory networks, it is likely that many important interactions have been missed in previous small-scale studies. In particular, our understanding of cross-talk between different stress response pathways is insufficient. If methods are developed that are capable of comprehensively elucidating regulatory pathways, these methods could also be applied to less well studied response pathways in higher organisms.

Thus, the main objectives of this project were to:

1. create a library of genetically diverse fission yeast strains suitable for QTL studies and to genetically characterise it at high resolution
2. study the ability of these strains to respond to oxidative stress
3. devise new experimental and computational approaches facilitating a network-based understanding of changes at the molecular level
4. thereby improve our understanding of the molecular mechanisms by which natural genetic variation impacts on a cell's ability to respond to stress.

Oxidative stress response

Reactive oxygen species (ROS) are generated as metabolic by-products of aerobically growing cells and after exposure to environmental agents such as ultraviolet (UV) and ionising radiation. An excess of ROS leads to oxidative stress by directly or indirectly damaging DNA, proteins and lipids. ROS are implicated in aging and apoptosis as well as in numerous complex diseases (Finkel & Holbrook 2000). On the other hand, evidence is accumulating that ROS also provide vital signalling functions for diverse cellular processes (Rhee, 2006). Cells therefore need to precisely tune ROS homeostasis and oxidative stress defence mechanisms to maintain healthy ROS levels and accordingly have evolved sophisticated ways to sense and respond to ROS (Temple et al. 2005). In the laboratory, various oxidants such as hydrogen peroxide (H2O2) are used to trigger and analyse responses to oxidative stress.

Fission yeast (Schizosaccharomyces pombe) is a popular model organism for studying oxidative stress response pathways, most of which show remarkable conservation in multicellular eukaryotes (Ikner & Shiozaki 2005). At least three signalling pathways are involved in directing the transcriptional response to oxidative stress in fission yeast.

1. A mitogen-activated protein kinase (MAPK) cascade, similar to the mammalian JNK and p38 pathways (Torres 2003), activates the Spc1/Sty1 MAPK (Buck et al. 2001). This pathway is activated in response to multiple environmental stresses, and mutants defective in the pathway are hypersensitive to ROS and several other stresses (Quinn et al. 2002). Hundreds of genes are known to respond to this pathway (Wilhelm & Bähler 2006).
2. Pap1 is an AP-1-like transcription factor similar to mammalian Jun; it is required for survival during oxidative stress by activating genes functioning in oxidant protection after stress-induced nuclear accumulation (Toone et al. 1998). Pap1 and Sty1-Atf1 seem to have both overlapping and specialised roles in oxidative stress, with Pap1 dominating the response to low ROS levels and Sty1-Atf1 dominating the response to high ROS levels (Madrid et al. 2004).
3. Finally, a multistep phosphorelay system seems to be specialised for oxidative stress signalling in S. pombe (Ikner & Shiozaki 2005). The two-component response regulator Prr1 functions in ROS defence, probably as a direct transcriptional regulator for some oxidative stress response genes independently of the Sty1 and Pap1 pathways (Buck et al. 2001).

Although fission yeast is an excellent model for eukaryotic cells, no strain library suitable for QTL studies existed before this project. Thus, an objective of this project was to establish and genetically characterise such a strain library.

QTL studies

A QTL study consists of genotyping and phenotyping a panel of genetically diverse individuals (or strains). Subsequently, statistical methods are employed (mostly correlation analysis) to determine the significance of correlation between the genotypic pattern at each locus and the phenotype at hand. A strong correlation indicates that the locus may contain a gene responsible for controlling the given phenotype. In higher model organisms recombinant inbred strains are used, whereas inbreeding is not necessary in the case of haploid yeasts.
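The correlation scan described above can be illustrated with a minimal sketch. It assumes haploid segregants whose genotype at each marker is coded 0 or 1 according to the parental allele; the function name `qtl_scan`, the coding and the simulated data are illustrative assumptions and do not represent the consortium's mapping method.

```python
import numpy as np

def qtl_scan(genotypes, phenotype):
    """Pearson correlation between each marker and a quantitative trait.

    genotypes: (n_strains, n_markers) array, 0/1 = parental allele (assumed coding)
    phenotype: (n_strains,) array of trait values
    Returns one correlation coefficient per marker.
    """
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    g_centered = g - g.mean(axis=0)
    y_centered = y - y.mean()
    cov = g_centered.T @ y_centered
    denom = np.sqrt((g_centered ** 2).sum(axis=0) * (y_centered ** 2).sum())
    # Markers without variation across strains cannot be tested; report 0 for them
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.where(denom > 0, cov / denom, 0.0)
    return r

# Simulated toy data: 100 strains, 50 markers, one marker with a true effect
rng = np.random.default_rng(0)
n_strains, n_markers = 100, 50
genotypes = rng.integers(0, 2, size=(n_strains, n_markers))
causal = 17  # hypothetical causal marker index
phenotype = 3.0 * genotypes[:, causal] + rng.normal(0.0, 1.0, n_strains)

r = qtl_scan(genotypes, phenotype)
best = int(np.argmax(np.abs(r)))
print("strongest marker:", best)
```

In practice the correlation at each marker would be converted into a significance value and corrected for the number of markers tested.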

Classical QTL studies used physiological phenotypes as quantitative traits. Typical examples of such traits are body weight, heart rate, growth rate, or disease susceptibility. Since the expression of individual genes may also be affected by genetic polymorphisms, one can use gene expression as a trait for QTL studies, thereby identifying loci responsible for controlling the expression of some gene. Such expression QTL (eQTL) studies mostly use microarrays to measure mRNA concentrations genome-wide. A number of unsolved problems associated with QTL studies exist:

1. Multiple hypothesis testing: The large number of genetic markers necessitates correction for multiple hypothesis testing, but such correction may push true-positive loci below the threshold of significance.
2. Blind spots: Linkage can only be determined for loci that are polymorphic across the population. Thus, regulators within genome regions that are non-polymorphic in the parental strains cannot be studied.
3. Fine mapping: Due to the spacing of genetic markers and/or linkage disequilibrium, several genes can reside at a significant locus. Typically, no more than one of these genes is responsible for the observed phenotype. Identification of the true causative gene requires additional data, since all genes at a locus are indistinguishable based on the eQTL data alone.
4. Lack of functional explanation: A genetic link does not explain the molecular/biochemical cause for the observed association.
5. Complex traits and pleiotropy: A complex trait corresponds to the case that a target gene (phenotype) may have many regulators and, consequently, may associate with many eQTLs. Pleiotropy denotes the case that one locus could be associated with many phenotypes (i.e. several target genes or phenotypes). Present methods can cope with these situations only if the associations are strong. However, most often the data do not provide enough statistical power to detect such correlations and QTL data alone cannot explain complex traits or pleiotropy.
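The first problem in the list above can be made concrete with a small sketch comparing two standard corrections, Bonferroni and Benjamini-Hochberg, on a set of made-up marker p-values. The report does not state which correction the project used; the p-values and function names here are illustrative assumptions.

```python
import numpy as np

def bonferroni(pvalues, alpha=0.05):
    """Reject a marker only if its p-value is below alpha divided by the number of tests."""
    p = np.asarray(pvalues, dtype=float)
    return p <= alpha / p.size

def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = int(np.max(np.nonzero(passed)[0]))  # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject

# Made-up p-values for ten markers
pvals = np.array([0.0001, 0.004, 0.012, 0.03, 0.2, 0.5, 0.7, 0.9, 0.95, 0.99])
n_bonferroni = int(bonferroni(pvals).sum())
n_bh = int(benjamini_hochberg(pvals).sum())
print("Bonferroni:", n_bonferroni, "Benjamini-Hochberg:", n_bh)
```

With these values the marker with p = 0.012 is nominally significant but is lost under the stricter Bonferroni threshold, which is the kind of loss of true positives that item 1 describes.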

Several bioinformatic approaches have been proposed recently to address the above problems. Some of these address the issue of fine mapping by predicting which genes within a given locus are the true regulators of expression of the target phenotype. Lee et al. (2006) used a Bayesian scoring model combined with a module finding approach to infer modules regulating the expression of a downstream target. Schadt et al. (2005) identified possible relationships between traits and used a likelihood model based on conditional correlations to spot the relationship that is best supported by the model.

The problem of blind spots

Almost all eQTL studies started with two (recombinant inbred) parental strains. However, QTL can only be detected for loci containing polymorphisms between the two parental strains. This approach is known to create many 'blind spots': regions of the genome that are devoid of polymorphisms will never yield a significant QTL. To overcome this problem in mice, it has recently been suggested to use panels of inbred strains (mouse diversity panel, MDP) for QTL and eQTL studies (McClurg et al. 2007). In this approach one correlates the genotypes and phenotypes of a range of largely independent inbred strains as opposed to assessing the F2 generation of just two inbred strains. Although this approach deals with much larger genetic and phenotypic variability, it has much lower statistical power, mainly because of the much smaller number of genetically diverse individuals. In this project we employed an approach that increases the genetic and phenotypic diversity of the assessed strains without significantly reducing the number of genetically different strains. By creating segregants from three parental strains and by devising corresponding analysis methods we could substantially increase the genetic resolution and biological relevance of the study.

Proteomic QTL

Concentrations of mRNA measured with microarrays were the first molecular phenotypes that were used for genomic QTL studies. A natural extension of this idea is to also use other molecular phenotypes such as protein concentrations or protein phosphorylation levels as traits for QTL studies (Foss et al. 2007). Such a 'proteomic QTL study' complements the eQTL approach well, because many important changes in response to stress and other signals occur at the posttranscriptional level (Beyer et al. 2004, Brockmann et al. 2007). Coupling modern proteomic methods to the QTL approach promises to identify many new important regulators controlling posttranscriptional modifications. However, proteomic QTL studies have had limited success for a number of reasons. The major obstacles are related to experimental limitations. In order to gain sufficient statistical power one needs to measure protein concentrations or phosphorylation levels across a panel of 60 to 100 strains with high reproducibility and sensitivity.

The most up-to-date proteomic methods are based on the analysis of complex peptide mixtures, generated by proteolysis of protein samples, by liquid chromatography coupled to mass spectrometry (LC-MS) (Aebersold & Mann 2003). However, these methods are non-targeted, i.e. in each measurement they quasi-randomly sample a fraction of the proteome. Each repeat analysis required for comparing a proteome in different states will sample only a subset of the yeast proteins, and not necessarily the same subset in each repeat, thus precluding the generation of complete datasets. An additional limitation to a comprehensive proteomic analysis is the difficulty in detecting low-abundance proteins. Hence, it is necessary to employ new proteomics technologies that are characterised by high reproducibility and sensitivity.
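Why quasi-random sampling precludes complete datasets can be illustrated with a deliberately simplified simulation. It assumes that every protein is detected independently in each run with the same probability `p_detect`; this is an assumption made only for illustration (real detection depends strongly on protein abundance), and the function name and numbers are hypothetical. Under this assumption the fraction of proteins observed in all of n runs shrinks as `p_detect` to the power n.

```python
import numpy as np

def complete_fraction(n_proteins, n_runs, p_detect, seed=0):
    """Simulate the fraction of proteins detected in every one of n_runs runs."""
    rng = np.random.default_rng(seed)
    # detected[i, j] is True if protein i was sampled in run j
    detected = rng.random((n_proteins, n_runs)) < p_detect
    return float(detected.all(axis=1).mean())

for n_runs in (1, 10, 60):
    simulated = complete_fraction(5000, n_runs, p_detect=0.9)
    expected = 0.9 ** n_runs  # analytical value under the independence assumption
    print(f"{n_runs:3d} runs: simulated {simulated:.3f}, expected {expected:.3f}")
```

Even with a 90 % chance of detection per run, very few proteins would be quantified across a panel of 60 strains in this toy model, which is why reproducible, targeted measurements matter for QTL mapping.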

Protein interaction networks

Protein interactions in budding yeast have been studied in great detail. The situation is different for fission yeast, for which a large-scale network integrating the available evidence for protein-protein interactions is lacking. Currently, the only exception is the STRING database, which predicts interactions for virtually all fully sequenced species largely based on transfer from orthologous genes in other species. However, these interactions are not necessarily physical protein-protein interactions; they are more likely to represent functional associations of the respective genes. About 70 % of all fission yeast genes have homologs in budding yeast (Inparanoid, version two). Many more proteins have conserved interaction domains. A recent comparative experimental study measured protein interactions of orthologous proteins from budding and fission yeasts using a highly redundant tandem affinity tagging approach (Shevchenko et al. 2008). This study confirmed that most interactions could be predicted based on sequence homology or at least based on common interaction domains in the respective proteins from the two species. It is therefore possible to recover most of the S. pombe interactions by predicting interactions based on sequence homology of the putative partners. Thus, another goal of this project was to create a fission yeast protein interactome by transferring interaction information from other species in order to facilitate the integrated, network-based analysis of molecular and physiological (i.e. growth) QTL.
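The idea of transferring interactions between species via homology can be sketched as follows. This is a minimal illustration of orthology-based transfer only, not the machine-learning approach developed in the project; the gene identifiers, the ortholog table and the function name `transfer_interactions` are hypothetical.

```python
from itertools import product

def transfer_interactions(source_interactions, ortholog_map):
    """Predict target-species interactions from source-species pairs.

    source_interactions: iterable of (gene_a, gene_b) pairs in the source species
    ortholog_map: dict mapping a source gene to a list of target-species orthologs
    Returns a set of unordered target-species gene pairs.
    """
    predicted = set()
    for a, b in source_interactions:
        targets_a = ortholog_map.get(a, ())
        targets_b = ortholog_map.get(b, ())
        # Every combination of orthologs of the two partners is a candidate pair
        for ta, tb in product(targets_a, targets_b):
            if ta != tb:
                predicted.add(frozenset((ta, tb)))
    return predicted

# Hypothetical identifiers: "sc*" = budding yeast, "sp*" = fission yeast
source = [("scA", "scB"), ("scB", "scC"), ("scC", "scD")]
orthologs = {"scA": ["spA"], "scB": ["spB1", "spB2"], "scC": ["spC"]}

predicted = transfer_interactions(source, orthologs)
for pair in sorted(tuple(sorted(p)) for p in predicted):
    print(pair)
```

Pairs involving a gene without a known ortholog (here the hypothetical "scD") are not transferred, so coverage of the predicted network is limited by the fraction of genes with homologs in the source species.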

References

1. Aebersold, R., Mann, M. Nature 422: 198-207 (2003)
2. Beyer, A., Hollunder, J., Nasheuer, H.P., Wilhelm, T. Mol Cell Proteomics 3: 1083-1092 (2004)
3. Beyer, A., Workman, C., Radke, D., Moeller, U., Wilhelm, T., Ideker, T. PLoS Comput Biol 2: e70 (2006)
4. Beyer, A., Bandyopadhyay, S., Ideker, T. Nat Rev Genet 8: 699-710 (2007)
5. Brem, R.B., Yvert, G., Clinton, R., Kruglyak, L. Science 296: 752-755 (2002)
6. Brem, R.B., Storey, J.D., Whittle, J., Kruglyak, L. Nature 436: 701-703 (2005)
7. Brockmann, R., Beyer, A., Heinisch, J.J., Wilhelm, T. PLoS Comput Biol 3: e57 (2007)
8. Buck, V., Quinn, J., Soto Pino, T., Martin, H., Saldanha, J., Makino, K., Morgan, B.A., Millar, J.B. Mol Biol Cell 12: 407-419 (2001)
9. Corthals, G.L., Wasinger, V.C., Hochstrasser, D.F., Sanchez, J.C. Electrophoresis 21: 1104-1115 (2000)
10. Desiere, F., Deutsch, E.W., King, N.L., Nesvizhskii, A.I., Mallick, P., Eng, J., Chen, S., Eddes, J., Loevenich, S.N., Aebersold, R. Nucleic Acids Res 34: D655-D658 (2006)
11. Domon, B., Broder, S. J Proteome Res 3: 253-260 (2004)
12. Domon, B., Aebersold, R. Mol Cell Proteomics 5: 1921-1926 (2006)
13. Finkel, T., Holbrook, N.J. Nature 408: 239-247 (2000)
14. Foss, E.J., Radulovic, D., Shaffer, S.A., Ruderfer, D.M., Bedalov, A., Goodlett, D.R., Kruglyak, L. Nat Genet 39: 1369-1375 (2007)
15. Gaits, F., Degols, G., Shiozaki, K., Russell, P. Genes Dev 12: 1464-1473 (1998)
16. Hu, Z., Killion, P.J., Iyer, V.R. Nat Genet 39: 683-687 (2007)
17. Ikner, A., Shiozaki, K. Mutat Res 569: 13-27 (2005)
18. King, N.L., Deutsch, E.W., Ranish, J.A., Nesvizhskii, A.I., Eddes, J.S., et al. Genome Biol 7: R106 (2006)
19. Ideker, T., Ozier, O., Schwikowski, B., Siegel, A.F. Bioinformatics 1: S233-240 (2002)
20. Lee, I., Date, S.V., Adai, A.T., Marcotte, E.M. Science 306: 1555-1558 (2004)
21. Lee, S.I., Peer, D., Dudley, A.M., Church, G.M., Koller, D. Proc Natl Acad Sci USA 103: 14062-14067 (2006)
22. Madrid, M., Soto, T., Franco, A., Paredes, V., Vicente, J., Hidalgo, E., Gacto, M., Cansado, J. J Biol Chem 279: 41594-41602 (2004)
23. McClurg, P., Janes, J., Wu, C., Delano, D.L., Walker, J.R., Batalov, S., Takahashi, J.S., Shimomura, K., Kohsaka, A., Bass, J., Wiltshire, T., Su, A.I. Genetics 176: 675-683 (2007)
24. Myers, C.L., Barrett, D.R., Hibbs, M.A., Huttenhower, C., Troyanskaya, O.G. BMC Genomics 7: 187 (2006)
25. Reguly, T., Breitkreutz, A., Boucher, L., Breitkreutz, B.J., Hon, G.C., Myers, et al. J Biol 5: 11 (2006)
26. Rhee, S.G. Science 312: 1882-1883 (2006)
27. Sanchez-Piris, M., Posas, F., Alemany, V., Winge, I., Hidalgo, E., Bachs, O., Aligue, R. J Biol Chem 277: 17722-17727 (2002)
28. Schadt, E.E., Lamb, J., Yang, X., Zhu, J., Edwards, S., Guhathakurta, D., Sieberts, S.K., Monks, S., Reitman, M., Zhang, C., et al. Nat Genet 37: 710-717 (2005)
29. Shevchenko, A., Roguev, A., Schaft, D., Buchanan, L., Habermann, B., Sakalar, C., Thomas, H., Krogan, N.J., Shevchenko, A., Stewart, A.F. Genome Biol 9: R167 (2008)
30. Smith, D.A., Toone, W.M., Chen, D., Bähler, J., Jones, N., Morgan, B.A., Quinn, J. J Biol Chem 277: 33411-33421 (2002)
31. Stahl-Zeng, J., Lange, V., Ossola, R., Aebersold, R., Domon, B. Mol Cell Proteomics (2007)
32. Suthram, S., Shlomi, T., Ruppin, E., Sharan, R., Ideker, T. BMC Bioinformatics 7: 360 (2006)
33. Temple, M.D., Perrone, G.G., Dawes, I.W. Trends Cell Biol 15: 319-326 (2005)
34. Toone, W.M., Kuge, S., Samuels, M., Morgan, B.A., Toda, T., Jones, N. Genes Dev 12: 1453-1463 (1998)
35. Toone, W.M., Morgan, B.A., Jones, N. Oncogene 20: 2336-2346 (2001)
36. Torres, M. Front Biosci 8: d369-391 (2003)
37. Tu, Z., Wang, L., Arbeitman, M.N., Chen, T., Sun, F. Bioinformatics 22: e489-e496 (2006)
38. von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S., Bork, P. Nature 417: 399-403 (2002)
39. Watson, A., Mata, J., Bähler, J., Carr, A., Humphrey, T. Mol Biol Cell 15: 851-860 (2004)
40. Wilhelm, B.T., Bähler, J. The genomics of stress response in fission yeast. In The Mycota, Vol XIII, Fungal Genomics, Brown AJP (ed) pp 97-111. Berlin, Springer (2006)
41. Wood, V. et al. Nature 415: 871-880 (2002)
42. Workman, C.T., Mak, H.C., McCuine, S., Tagne, J.B., Agarwal, M., Ozier, O., Begley, T.J., Samson, L.D., Ideker, T. Science 312: 1054-1059 (2006)
43. Yeang, C.H., Mak, H.C., McCuine, S., Workman, C., Jaakkola, T., Ideker, T. Genome Biol 6: R62 (2005)

Project results:

An increasingly effective way of studying the interactions between genome and environment is through genome-wide association studies, which are used for investigating genetic association with complex traits. At the heart of PHENOXIGEN is a library of genetically diverse fission yeast strains. The ability of these strains to respond to oxidative stress has been investigated using continuous growth assays and by measuring a wide range of molecular properties, such as RNA and protein concentrations. These data have been the basis of a large integrative QTL analysis, allowing the identification of genetic factors explaining phenotypic variability between the individual strains. The consortium has established numerous innovative technologies enabling these measurements at an unprecedented scale, beyond the initially planned goals. The group of J. Bähler (UCL) has established RNA sequencing using deep sequencing technology, making it possible to simultaneously genotype a large number of segregant strains and measure RNA concentrations at much higher precision than with microarrays. Another example is the development and advancement of targeted proteomics in the laboratory of R. Aebersold (ETH), which has enabled the measurement of many more proteins than initially planned. The group of C. Workman (DTU) has tested and established a robot-assisted automatic phenotyping method allowing precise, time-resolved measurements of growth phenotypes. Here, PHENOXIGEN also goes beyond the initial plans, which did not consider time-resolved phenotyping. Finally, the group of A. Beyer (TUD) has developed new computational methods for the analysis of RNA-seq data, for the prediction of protein interactions and for the identification of genetic loci that affect the respective phenotypes (i.e. QTL mapping). These computational methods have been applied to relatively small budding yeast (S. cerevisiae) datasets, revealing novel insights into the impact of natural genetic variation on post-transcriptional regulation.

PHENOXIGEN has achieved the following results:

1. A fission yeast strain library suitable for QTL mapping has been created and fully genotyped. This is the first strain library of this type for fission yeast. This library has been created by crossing three independent wild strains ('parental strains'), which have been fully sequenced. The PHENOXIGEN fission yeast strain library will be an important resource for the research community.
2. Within the project 173 strains have been subjected to continuous-time growth assays under normal and oxidative stress conditions. We have developed new algorithms for the analysis of the resulting growth curves for extracting a range of quantitative characteristics from the data (such as slope of the fastest growth or maximum culture density).
3. Moreover, 130 strains have been analysed by RNA sequencing under normal and oxidative stress conditions. It was initially suggested to use DNA microarrays for this task. We decided, however, to use RNA sequencing, which provides much more detailed quantitative information about transcript levels of coding and non-coding genes, including mitochondrial genes. Non-coding genes in particular are not included on standard DNA microarrays, and mitochondrial genes could not have been genotyped if we had used microarrays.
4. Proteomics measurements for 59 strains, again under normal and stress conditions, have been performed. The samples are identical to those used for RNA sequencing, and on average 2,500 proteins could be identified in these experiments. Importantly, 800 proteins could be identified in more than 80 % of the samples, which is crucial for QTL mapping.
5. The above traits have been mapped to the fission yeast genome (QTL mapping) for identifying genomic regions affecting these traits. Thereby it becomes possible to identify naturally occurring genetic variations impacting on those traits with a specific focus on oxidative stress response.
6. We have conducted a dense time course experiment, measuring protein and RNA expression changes in response to oxidative stress in one parental strain (11 time points, in triplicates). The analysis of these experiments revealed intricate details about the dynamics of response to this stress at the transcriptional and post-transcriptional level. The insights are crucial for the interpretation of the QTL experiments above. These time course experiments were not initially planned.
7. The initial problem with mating the different wild isolates has triggered a 'spin-off project' in which we are investigating speciation through reproductive isolation in fission yeast. The Bähler laboratory has conducted many additional mating experiments using a wide range of strains, through which we have gained many new insights into the underlying genetic causes of the mating inefficiency.
8. Two networks of fission yeast protein-protein interactions have been created in close collaboration between the Beyer and the Bähler laboratories. These efforts have led to two joint publications in scientific journals. The Beyer group has devised new machine-learning techniques for the prediction of physical protein associations based on known functional links between the encoding genes. These methods exploit the structure of the underlying networks.
9. A new QTL-mapping method has been developed, which accounts for potential epistatic interactions between several genetic loci impacting on the same trait (such as growth under oxidative stress or expression of some gene). Detailed tests using measured data (i.e. not simulated data) have shown that this new method outperforms existing approaches. A new method for identifying which loci (if any) interact epistatically, based on the above mapping, has also been developed and tested.
10. We have developed new bioinformatic methods for extracting genotype information from RNA-seq data, which allowed us to perform two tasks at the same time using just one assay: genotyping and transcriptome-phenotyping.
11. We have developed a method for mapping RNA-seq reads to the genome of each individual yeast strain, which improved the quantification of transcript levels. If reference (i.e. not strain-specific) genomes were used, transcript levels were often ill-quantified, because many reads could not be correctly mapped. We have conducted detailed analysis of this problem, which is of immediate relevance for RNA-sequencing in general including especially human studies, since every human has a genome deviating from the reference genome.
12. Through the much more precise RNA measurements we could gain new insights into the fraction of eQTL directly affecting neighbouring genes (so-called local eQTL) versus those linking to genes elsewhere in the genome (so-called trans-eQTL). Our analysis suggests that previous estimates of local eQTL have been erroneously inflated due to incorrect measurements when using microarrays.
13. Analysis of the RNA-based QTL has revealed a region on fission yeast's chromosome III affecting an extraordinarily large number of genes (713 genes). Such a region is called an eQTL hotspot. To our knowledge, no eQTL hotspot affecting so many genes has been reported in the literature before. Through detailed analysis of this region we could rule out that the observation is due to some artefact, confirm a strong impact on the yeast's growth (thus, the causal mutation has physiological relevance) and drill down to the most likely molecular mechanism. Validation experiments for confirming our hypothesis are currently ongoing.
14. We produced four collaborative publications in peer-reviewed journals. Several additional joint manuscripts were in preparation by the time of the project completion.
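As an illustration of the growth-curve characteristics mentioned in result 2 (slope of the fastest growth and maximum culture density), the following minimal sketch extracts both from a time series of optical density readings. It is not the consortium's algorithm: computing the slope on log-transformed densities, the absence of smoothing, the function name `growth_features` and the simulated logistic curve are all assumptions made for illustration.

```python
import numpy as np

def growth_features(time_h, od):
    """Extract simple quantitative characteristics from one growth curve.

    time_h: measurement times in hours (strictly increasing)
    od: optical density readings (positive values)
    """
    t = np.asarray(time_h, dtype=float)
    y = np.asarray(od, dtype=float)
    # Slope of log(OD) between consecutive time points (assumed rate definition)
    slopes = np.diff(np.log(y)) / np.diff(t)
    i = int(np.argmax(slopes))
    return {
        "max_growth_rate": float(slopes[i]),
        "time_of_max_rate": float((t[i] + t[i + 1]) / 2),
        "max_density": float(y.max()),
    }

# Simulated noise-free logistic growth curve (hypothetical parameters)
t = np.linspace(0.0, 24.0, 49)            # one reading every 30 minutes
capacity, start, rate = 1.5, 0.05, 0.4
od = capacity / (1.0 + (capacity / start - 1.0) * np.exp(-rate * t))

features = growth_features(t, od)
print(features)
```

Measured curves are noisy, so a practical analysis would typically smooth the data or fit a growth model before taking slopes; the resulting per-strain values can then serve as traits for QTL mapping.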

Potential impact:

Technological and biological progress

The project has generated a number of experimental and computational tools valuable for the broader scientific community. This includes among others the QTL strain library for fission yeast, computational tools for analysing QTL from a 'three-background strain population', new proteomics methods, computational methods for the network mapping of genetic association data, for the analysis of RNA-sequencing and proteomics-based QTL and for integrating proteomics and RNA-QTL data. The complementary nature of the experimental data generated in the project has been exploited with a unique computational framework for the consistent data integration.

Because we decided to quantify RNA using the latest deep sequencing technology rather than DNA microarrays, we have also created numerous new computational tools required for the analysis of these data. We have truly broken new ground, and our experience with RNA-seq based QTL studies will be highly relevant for future studies in higher organisms including humans.

The biological insights relate to important questions such as how natural genetic variability affects the cellular stress response and which mechanisms determine a cellular phenotype based on genotype and environmental factors. Importantly, the project has provided mechanistic and systems-level insights into how exactly the various inputs are being processed. The answers to these questions will be important for understanding how natural genetic variability may be causative for complex diseases in general. For example, it will be possible to mechanistically explain how combinations of single-nucleotide polymorphisms (SNPs) create a genotype that is particularly vulnerable to a certain type of disease.

PHENOXIGEN has explored numerous new directions:

1. it is the first QTL study conducted in fission yeast
2. it is one of the first eQTL studies measuring RNA concentrations with deep sequencing (RNA-seq)
3. it is the most comprehensive proteomics QTL project so far
4. it explored new ways for analysing QTL data in conjunction with molecular interaction networks and
5. it presents a novel way for the mechanistic analysis of how natural genetic variation affects cellular stress response.

The project brought together experts from different areas related to systems biology and thereby established new European collaborations reaching far beyond the end of the funding period.

An important aim of the project was to enable a system-wide perspective at each level of the regulatory processes under study. All work packages were designed accordingly, so that all necessary data would be available for every aspect of the model. Existing data were used as much as possible in order to focus the experimental resources on the most pressing needs. In that respect PHENOXIGEN is a successful example of a truly integrated project.

Europeans are already the world leaders in some of the areas covered by this project. For example, Europe is leading in the fields of fission yeast biology and mass spectrometry. European groups had the leading role in the fission yeast sequencing project, and partner five (J. Bähler) continues to take a leading role in the S. pombe community by curating and hosting the model organism database ('http://www.genedb.org/genedb/pombe/index.jsp'). Likewise, European researchers are at the forefront of large-scale proteomics. For example, Matthias Mann (Munich) and Ruedi Aebersold (Zurich) are respected leaders in the field. The PHENOXIGEN project will further advance the lead in these research areas. However, the main innovations in the area of QTL/eQTL analysis have been developed in North America. PHENOXIGEN has led to several advancements compared to existing approaches for QTL analysis. Hence, PHENOXIGEN has brought Europe closer to the forefront of this field as well.

Finally, PHENOXIGEN has particularly strengthened the European expertise connecting systems biology with genetics and network biology.

Relevance to health and diseases

Eukaryotic stress response in general, and oxidative stress response in particular, is of high relevance to human health. For example, the study of the DNA damage response pathway in budding yeast and fission yeast has led to important insights into how the DNA damage response is organised in higher eukaryotes. In addition, the project addresses the important question of how stress responses relate to the individual genotype. How much does genetic variability affect stress response? How do different polymorphisms act together to change a cell's ability to properly cope with stress? And what are the underlying mechanisms linking genotype to phenotype? These questions of vital importance are addressed by the project, and the answers will help to better understand disease susceptibility.

Complex diseases are caused by the interaction of several mutations and environmental factors. Importantly, the individual SNPs alone usually do not cause pathological phenotypes. Hence, the combinatorial interaction between the SNPs is critical for understanding the relationship between genotype, environment and disease.

The project contributes to our understanding of such complex genetic interactions by revealing the molecular mechanisms that create this combinatorial phenotype. PHENOXIGEN uncovers regulatory networks relevant for eukaryotic stress response. The methods developed for analysing complex genetic interactions and for uncovering the causes of natural variability in stress response will also be instrumental for studying complex diseases in mammals.

Availability of resources

PHENOXIGEN has advanced our experimental and computational capabilities. It is an important goal of the project that the experiments generate all data necessary for the modelling and that the modelling uses all experimental data generated during the project. The data integration and modelling apply computational methods from all areas of bioinformatics, such as sequence analysis (e.g. ortholog detection, motif discovery), structural biology (e.g. protein binding prediction), advanced biostatistics (microarray analysis, QTL detection) and network biology (module detection, pathway simulation). Many of these methods have been further developed during the project and the models have been experimentally tested to ensure that they actually reflect biological reality. The inter-dependent theoretical and experimental approaches are the hallmark of systems biology and will be a highlight and ultimate achievement of the proposed research. All the data, resources and computational methods developed in the project have already been or will be made available after publication of the results in scientific journals.

Resources, publications, etc. will also be linked on the project's homepage at 'http://www.phenoxigen.eu'.

Dissemination activities

PHENOXIGEN has taken (and is continuing to take) a range of measures for disseminating its results. First of all, the project's internet homepage was created at the very beginning of the project (at 'http://www.phenoxigen.eu'). This website is being used for listing publications and news on the project and it will be used for linking to data generated during the project (after the respective papers have been published).

Project results have been and will be published in international, peer-reviewed scientific journals. For example, four joint articles involving at least two PHENOXIGEN partners have already been published, which also demonstrates the collaborative success of the project.

Next, we have taken many measures to link directly to peers outside the project. Already at the kick-off meeting we invited international experts from Europe and North America, which stimulated the discussion, provided very important input for the project and brought the project to the attention of other key figures in the field. Furthermore, A. Beyer organised a special session on eQTL at ISMB 2009 in Stockholm. The yearly Intelligent Systems for Molecular Biology (ISMB) conference is the largest meeting of computational biologists in the world. Thus, this was a great opportunity to disseminate project results and improve the visibility of this Seventh Framework Programme (FP7) activity. Finally, we have organised an international scientific conference in Dresden with several hundred participants and 21 international speakers (BIOTEC forum: Systems biology and cellular regulation, 3 to 5 May 2011). This meeting (which was co-organised with the FP7 systems biology project 'SyBoSS', project ID 242129) gave the four PHENOXIGEN principal investigators (PIs) the opportunity to present their latest research to a broader audience. The joint coordination and funding of two FP7 projects represented a very efficient use of resources and increased the visibility of both projects.

List of websites:

'http://www.phenoxigen.eu'.