Machine learning for quantitative modelling of structured phenotypes

Final Report Summary - MLPHENOM (Machine learning for quantitative modelling of structured phenotypes)

MLPHENOM aims to address open methodological needs in unravelling how phenotypes vary as a function of genetic differences between individuals. Previously, most phenotypes of interest were simple and could be expressed as a single quantitative value; modern high-throughput phenotyping now generates increasingly complex phenotype data. Our goal in this project is to develop the statistical and computational methods needed to fully exploit these complex phenotypes by accounting for the structure within the data. Through improved statistical modeling, we will derive approaches for better interpretation and analysis of genotype-to-phenotype relationships. In particular, we will address three major components of phenotype structure: time structure, arising from repeated measurements of the same phenotype over time; image structure, where digital images capture phenotypic differences; and network structure, where direct and indirect effects can be disentangled by accounting for all variables in a single model.

Over the course of this project, we have tackled these aims by proposing a new coherent statistical framework for the joint genetic analysis of high-dimensional phenotypic measurements. These multivariate models exploit the rich correlations between individual phenotypic measurements and thereby flexibly address diverse aspects of phenotype structure. Key results of MLPHENOM include new ways of analyzing phenotype networks that span hundreds of individual measurements (Rakitsch et al. 2013). Moreover, we have developed methods that detect environment-related substructure within the phenome, even if the environmental factors themselves have not been measured (Fusi et al. 2013). These models greatly increase the power to detect genotype-to-phenotype associations in settings where such structure occurs. Complementary to these association methods, we have developed network models that allow individual molecular events between genotype and phenotype to be ordered. In collaboration with researchers from EMBL, we have used these approaches to derive molecular intervention points in a genome-wide screen in yeast, resulting in an improved mechanistic and causal interpretation (Gagneur et al. 2013).

The methods developed in MLPHENOM have laid the foundation for the analysis of future genetic studies, in which increasingly large numbers of complex phenotypes will be gathered. We are actively pursuing applications of these methods to biomedical research directions in human genetics, where we expect widespread translational use.