Periodic Reporting for period 2 - MLFPM2018 (Machine Learning Frontiers in Precision Medicine)
Reporting period: 2021-01-01 to 2024-03-31
1. C. Cervia-Hasler et al. Persistent complement dysregulation with signs of thromboinflammation in active Long Covid. Science 2024; 383: eadg7942.
2. J. Bordes et al. Automatically annotated motion tracking identifies a distinct social behavioral profile following chronic social defeat stress. Nature Communications 2023; 14: 4319.
3. A. Hawkins-Hooker et al. Getting personal with epigenetics: towards individual-specific epigenomic imputation with machine learning. Nature Communications 2023; 14(1).
4. G. Visonà et al. Multimodal learning in clinical proteomics: enhancing antimicrobial resistance prediction models with chemical information. Bioinformatics 2023; 39(12): btad717.
Notably, the work by Visonà et al. resulted from the MLFPM Retreat that took place at the University of Tartu in June 2022. There, the ESRs conducted a hackathon on antimicrobial resistance, in which a team of four of them applied multimodal learning in clinical proteomics and improved antimicrobial resistance prediction from mass spectra by sharing information across drugs.
In addition, MLFPM work has been presented as posters and oral presentations at international workshops and conferences (ESR3 at ISCB42 in June 2021, ESR7 at the 2022 ASHG meeting, ESR9 at ISMB/ECCB 2021, ESR10 at NeurIPS 2021 and the 8th MCAA meeting, ESR12 at EMGM2021, and ESR14 at the 2021 and 2022 AMIA symposia). Notably, ESR9 won the ‘Best Paper Award’ at CMSB 2023, and ESR12 was selected for the ‘Best Young Scientist bioinformatics long talk’ at EMGM2021.
In total, MLFPM results have been published in 42 papers in peer-reviewed journals, among them Science, Nature Communications, Bioinformatics, PLOS ONE and Briefings in Bioinformatics.
In the course of MLFPM, we have organised three very successful summer schools (one in-person event and two virtual ones), as well as three symposia/conferences (one virtual, two in person); https://mlfpm.eu/events/. At all of these events, we were able to feature renowned experts in the field and advertise the achievements of the network. The events and the recordings of the talks have met with great interest from the scientific community.
Finally, the ESRs have also been very active in the communication of their work to the public. For example, they participated in the ETH open day “Scientifica 2021”, the European science fair “Science is Wonderful! 2021”, the “my PhD in 180 seconds” competition, Skype a Scientist, La Noche Europea de los Investigadores 2021 and Copenhagen's Culture Night.
* How to make disease risk predictions on hundreds of thousands if not millions of individuals?
* How to find the subsets of most relevant features and take potential non-linear interactions into account?
* How to make any statement about the statistical significance of the findings when mining ultra-high-dimensional spaces with millions of candidate feature combinations?
* How to deal with massively missing data, if half of all data values are unknown?
* How to find causal relationships between features in ultra-high-dimensional cases?
* How to interpret or visualise the results when mining for non-linear higher-dimensional feature combinations?
* How to account for confounding in machine learning on population-scale data?
The mission of the “Machine Learning Frontiers in Precision Medicine” ETN is to explore these and similar questions. We pursue two central aspects that are fundamental to reaching the goal of Precision Medicine:
1. Enabling population-scale genetics through machine learning, i.e. developing machine learning methods that can detect disease-associated patient features in datasets with – ultimately – millions of features for millions of patients. These methods will increase our understanding of the factors involved in diseases and our ability to predict patient phenotypes, in particular disease risks, from their features. (Research Goal 1)
2. Enabling improved medical decision support through health record mining, i.e. developing machine learning methods that can analyse, model and predict the health trajectory, disease progression and therapy potential based on long-term, high-dimensional longitudinal health data for each individual. (Research Goal 2)
Each individual research project has been dedicated to one of the research goals. In particular, ESRs 5, 7, 8, and 14 have been developing machine learning techniques for population-level phenotype prediction, ESRs 1, 3, 9, and 12 have been working on disease network discovery, and ESRs 2, 4, 6, 10, 11, and 13 have been working on topics related to health record mining using machine learning.
By applying their developed techniques to SARS-CoV-2, several ESRs have been able to contribute to a better understanding of the disease.