Content archived on 2024-06-18

Multi-phenotype Analysis of Rare Variants – devELopment of an analysis method and software with implementation to large-scale data to unravel pleiotropic genetic effects behind cardiometabolic traits

Final Report Summary - MARVEL (Multi-phenotype Analysis of Rare Variants – devELopment of an analysis method and software with implementation to large-scale data to unravel pleiotropic genetic effects behind cardiometabolic traits)

The overall objective of the MARVEL project was to dissect the genetic architecture underlying the variability of cardiometabolic phenotypes and the risk of related disease outcomes, in order to better prevent and treat metabolic diseases. The plan was to achieve this by combining knowledge and expertise across multiple disciplines (medicine, genetics, bioinformatics and statistics) and, more specifically, by developing a multi-phenotype analysis method (Objective 1) and tool (Objective 2) for rare variants, and by conducting a large-scale meta-analysis of cardiometabolic traits (Objective 3).

Specifically, the first two objectives of the MARVEL project were to develop a method and a related software tool for the joint analysis of correlated phenotypes, such as blood pressure and body-mass index, in the search for rare genetic variants affecting them. Traditionally, large-scale genetic association analyses have examined one phenotype at a time, although analysing correlated phenotypes jointly has been shown to improve the power to find genetic variants associated with them. Joint analysis can also tell us more about the biology behind the associations, including pleiotropic effects, where one genetic variant affects more than one phenotype. With the example phenotypes above, this would mean that the same genetic variant is involved in the regulation of both blood pressure and body weight, and such a variant would be easier to detect with multi-phenotype methodology.

Multi-phenotype methods have previously been developed for the analysis of genetic variants that are common in the population, conventionally those whose minor allele frequency exceeds 5%. Our approach was to develop a method for variants that are rare in the population, since methods designed for common variants are underpowered to detect rare variants unless sample sizes are very large, in the order of tens or even hundreds of thousands of individuals. Therefore, for the first Objective, we extended a so-called burden test for single-phenotype rare variant analysis to the analysis of multiple phenotypes. After the mathematical formulation of the proposed method, we implemented it in the C++ programming language as a user-friendly software tool that reads standard genetic data formats. These two steps resulted in our novel method and tool for Multi-phenotype Analysis of Rare Variants, MARV.
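To illustrate the idea of extending a burden test to several phenotypes, the following is a minimal Python sketch and not the MARV implementation, which is written in C++. It assumes one common formulation: each individual's burden is the proportion of rare variants in a gene at which a minor allele is carried, and this burden is then regressed on all phenotypes at once. The function names, the 1% rarity threshold and the use of ordinary least squares are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def burden_scores(genotypes: Sequence[Sequence[int]],
                  mafs: Sequence[float],
                  maf_threshold: float = 0.01) -> List[float]:
    """Per-individual proportion of rare variants carrying a minor allele.

    genotypes[i][j] is the minor allele count (0, 1 or 2) of individual i
    at variant j of one gene; mafs[j] is that variant's minor allele
    frequency. The threshold is an illustrative assumption.
    """
    rare = [j for j, freq in enumerate(mafs) if freq < maf_threshold]
    if not rare:
        raise ValueError("no rare variants in this gene")
    return [sum(1 for j in rare if row[j] > 0) / len(rare) for row in genotypes]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def regress_burden_on_phenotypes(
        burden: Sequence[float],
        phenotypes: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Least-squares fit of burden ~ intercept + all phenotypes jointly.

    Returns (coefficients, R squared); coefficients[0] is the intercept.
    """
    n = len(burden)
    design = [[1.0] + [float(v) for v in row] for row in phenotypes]
    p = len(design[0])
    xtx = [[sum(design[i][a] * design[i][b] for i in range(n))
            for b in range(p)] for a in range(p)]
    xty = [sum(design[i][a] * burden[i] for i in range(n)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[a] * design[i][a] for a in range(p)) for i in range(n)]
    mean = sum(burden) / n
    tss = sum((y - mean) ** 2 for y in burden)
    if tss == 0:
        raise ValueError("burden has no variation")
    rss = sum((y - f) ** 2 for y, f in zip(burden, fitted))
    return beta, 1.0 - rss / tss
```

In a real analysis, a joint test of all phenotype coefficients (for example a likelihood ratio or F test) would give one gene-level p-value; the sketch stops at the fitted coefficients and R squared.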

An essential part of Objectives 1 and 2 was to test the properties of MARV under various scenarios, including varying numbers of phenotypes, correlation structures between the phenotypes, sample sizes and genetic effects. This step involved simulating large-scale genetic data whose structure reflects that of real genetic data in today’s Europeans, taking population history into account. For this, we used the methodology implemented in the ForSim software. We then simulated phenotypic data with the statistical software R to match the scenarios that would allow us to evaluate the performance of our newly developed method and tool. This step was computationally intensive and had to be expanded to include additional comparisons with other methods emerging at the same time, which implement different statistical methodology. We successfully demonstrated that our software tool covers a wider range of data types, formats and data sizes, and allows any phenotype, binary or quantitative, to be included in the analysis.
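The phenotype simulation step described above can be sketched as follows. This is a hypothetical Python illustration, not the project's R code: it assumes quantitative phenotypes generated as correlated standard normal noise (via a Cholesky factor of a chosen correlation matrix) plus a per-phenotype effect of a gene-level burden score. The function names, parameters and the additive model are assumptions.

```python
import math
import random
from typing import List, Optional, Sequence


def cholesky(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                diag = matrix[i][i] - partial
                if diag <= 0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(diag)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def simulate_phenotypes(burden: Sequence[float],
                        correlation: Sequence[Sequence[float]],
                        effects: Sequence[float],
                        seed: Optional[int] = None) -> List[List[float]]:
    """Simulate one row of correlated phenotypes per individual.

    Phenotype k of individual i = effects[k] * burden[i] + noise, where the
    noise vector has the requested correlation matrix.
    """
    rng = random.Random(seed)
    factor = cholesky(correlation)
    n_traits = len(effects)
    rows = []
    for b in burden:
        z = [rng.gauss(0.0, 1.0) for _ in range(n_traits)]
        # Multiply the independent draws by the Cholesky factor to
        # induce the desired correlation between phenotypes.
        noise = [sum(factor[a][c] * z[c] for c in range(a + 1))
                 for a in range(n_traits)]
        rows.append([effects[a] * b + noise[a] for a in range(n_traits)])
    return rows
```

Varying `correlation`, `effects` and the number of simulated individuals gives a simple way to explore scenarios of the kind listed above (number of phenotypes, correlation structure, sample size and genetic effect).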

Two publications describing the method and the software have resulted from the work on the first two Objectives (under review). The method shows improved power compared with a traditional single-phenotype rare variant association test, and with the other published competing method, in almost all of the tested scenarios. Our extensive simulation studies also indicate that the software tool remains computationally efficient with larger numbers of phenotypes. We therefore expect researchers in the field of statistical genetics to benefit from the method and tool, which should enable novel discoveries in complex trait genomics. To illustrate the method and tool in the software and method papers, we analysed correlated cardiometabolic traits. These analyses, although performed in just one cohort of fewer than 5000 individuals, have already allowed us to detect novel loci associated with cardiometabolic traits.

The third Objective of our project was to find novel causal rare variants associated with cardiometabolic traits by undertaking a large-scale meta-analysis using the developed method and software. For this, we focussed specifically on polyunsaturated fatty acids (PUFAs), such as omega-3, measured by nuclear magnetic resonance (NMR) spectroscopy in two Northern Finnish birth cohorts, with a total sample size of about 8000 individuals. Most genetic studies on blood lipids have so far focussed on lipids typically used in clinical practice, such as triglycerides and high- and low-density lipoprotein cholesterol. However, little is known about the genetic architecture of PUFAs, although they have been associated with several disease outcomes of major public health importance, including Alzheimer’s disease, breast cancer, depression and type 2 diabetes. Recently, the availability of metabolomics data has enabled the genetic analysis of more refined blood lipids, such as PUFAs. These omics data, however, contain hundreds or even thousands of variables, which makes traditional single-phenotype approaches underpowered because of the required correction for multiple testing: when hundreds of association tests are performed, some associations are very likely to arise by chance, so the significance threshold has to be made more stringent. Our method for analysing multiple phenotypes jointly is well suited to such data, because the phenotypes are tested together and no such correction across phenotypes is required. Therefore, even with a relatively small sample size of 8000 individuals, we were able to detect rare variant associations for the first time at the FADS1 locus on chromosome 11, for which only common variant associations had been reported before.
This observation provided a proof of principle for the role of rare variants at this common variant locus. In combination, the rare and common variants would explain a larger proportion of the variability in PUFA levels than the common variant on its own, and might thus provide important clues for future exploration of the effects of PUFAs on individual health. We also applied other multi-phenotype methods for common variant analysis developed in our team and by our collaborators, SCOPA and METASCOPA, and discovered ten genetic loci associated with PUFAs, of which two are novel (manuscript in preparation). Some of the discovered variants are within druggable genes. Hence, in the long run, these discoveries may lead to new pharmaceutical discoveries that help in the treatment of several of the diseases listed above.
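The multiple-testing argument made above for single-phenotype analysis can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes a Bonferroni correction and independent tests, neither of which is stated in the report, and the figure of 200 phenotypes is a hypothetical example. For correlated phenotypes, Bonferroni is conservative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across independent tests run at alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Hypothetical example: 200 metabolomic phenotypes tested one at a time.
per_test = bonferroni_threshold(0.05, 200)
uncorrected_risk = familywise_error_rate(0.05, 200)
```

With 200 separate tests at a nominal level of 0.05, a false positive somewhere is almost certain unless the per-test threshold is tightened to 0.05 / 200. A single joint test per gene avoids this penalty across phenotypes, although a correction across genes still applies.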

In summary, the MARVEL project has produced a powerful method and software tool, MARV, for novel rare variant discoveries using big data in genomics and metabolomics. The method and tool are aimed at statistical geneticists working on large-scale data. We have applied the method to cardiometabolic phenotypes, but its use is not limited to these: it can be applied in the same way to any high-dimensional, correlated data, such as data on autoimmune diseases, cancers, and cognitive and psychiatric outcomes, as well as to model organisms. Novel collaborations based on this project are already being set up, and will lead to future grant applications and the career development of the Fellow. The discoveries resulting from the use of the method are expected to benefit a wide audience, including the pharmaceutical industry, health care professionals and, ultimately, high-risk individuals and patients, through better prevention and treatment possibilities as novel discoveries enabled by methods such as MARV improve our understanding of the diseases.