Periodic Reporting for period 2 - ISULO (Innovative Statistical modelling for a better Understanding of Longitudinal multivariate responses in relation to Omic datasets)
Reporting period: 2022-03-01 to 2023-02-28
Over the last decades, advances in high-throughput technologies have led to the acquisition of various types of -omic data, including whole-genome sequencing, methylation, transcriptomic, glycomic, proteomic, and metabolomic data. Such data offer the opportunity to gain insights into complex biological mechanisms by making it possible to study relationships between features from various molecular levels and their involvement in biological processes. The first approaches analyzed each dataset separately, providing a snapshot of the molecular processes involved. In recent years, integrative approaches that combine complementary information from each data type have been proposed. While they have led to significant results, they have also highlighted that integrating several biological layers of information is still challenging. Indeed, statistical models need to deal efficiently with high dimensionality while accounting for dependence between and within -omic datasets. In addition, there has been growing interest in identifying biological mechanisms involved in the evolution of traits over time, for example by identifying genes that are active during plant growth, or biological pathways involved in fetal development. The associated data consist of repeated measurements over time on the same individuals and are called longitudinal data. The dependence among measurements on the same individual needs to be accounted for in the statistical models.
The joint analysis of traits evolving over time in relation to high-dimensional -omic data from various biological levels is a recent and active research area that offers new opportunities for an enhanced understanding of dynamic biological processes. However, it also raises new statistical challenges related to, among others, high dimensionality and the dependence structure between variables. Innovative statistical methods need to be developed to address these challenges. The ISULO project’s main objective was to propose innovative methodologies to simultaneously analyze longitudinal data and high-dimensional -omic data. The research methodology was subdivided into two major parts: the first focused on the analysis of one longitudinal outcome in relation to one -omic dataset, and the second on the integration of multiple high-dimensional -omic datasets for explaining non-longitudinal and/or longitudinal outcomes.
In both parts, the proposed methods aim to handle high-dimensional data for which prior knowledge about relationships between predictors and/or outcomes may be available.
The ISULO project has provided innovative statistical approaches addressing biological questions that arise in both medicine and agronomy. The results have highlighted the advantage of integrating known structures between covariates, between responses, or between covariates and responses into the statistical models. They have also pointed out the difficulties and limitations that may be encountered when jointly modelling multivariate outcomes with multiple -omic datasets. The interdisciplinary nature of the project has allowed exploration of various biological and statistical areas and close collaboration with researchers in statistics, genetics, epidemiology, molecular biology, and oncology. The wide applicability of the proposed approaches, as well as the expertise acquired in the analysis of data generated by cutting-edge technologies, has led to new collaborations with national and international researchers working in different domains. By addressing challenging methodological questions, this action has also generated new interdisciplinary questions that will be investigated in future projects.
The proposed approaches address biological and/or methodological questions that commonly arise from longitudinal and -omic data and for which no efficient methods are available. The results on simulated and real datasets have shown better performance compared to existing approaches. They have also shown that integrating the dependence structure between and within predictors and outcomes leads to a better identification of discriminatory covariates, and thus to a better understanding of complex biological mechanisms. This brings new opportunities for identifying diagnostic, prognostic, or therapeutic biomarker candidates in medicine, and for identifying the genotypes best adapted to climate change in agronomy.
As most of the proposed approaches have been developed to address biological questions encountered in medicine and agronomy, the plan is to apply them widely to various real datasets from these two domains.