CORDIS - EU research results

Machine Learning Frontiers in Precision Medicine

Periodic Reporting for period 2 - MLFPM2018 (Machine Learning Frontiers in Precision Medicine)

Reporting period: 2021-01-01 to 2024-03-31

Healthcare is entering the digital era: more and more patient data, from the molecular level of genome sequences to the level of image phenotypes and health history, are available in digital form. Exploring this big health data promises to reveal new insights into disease mechanisms and therapy outcomes. Ultimately, the goal is to exploit these insights for Precision Medicine, which aims to offer personalized preventive care and therapy selection for each patient. A technology with transformational potential in analysing this health data is Machine Learning, which strives to discover new knowledge in the form of statistical dependencies in large datasets. Mining health data is, however, not a simple direct application of established machine learning techniques. On the contrary, the emerging population scale and ultra-high dimensionality of health data create the need to develop Machine Learning algorithms that can successfully operate at this scale. Overcoming these frontiers in Machine Learning is key to making the vision of Precision Medicine a reality. To meet this challenge, Europe urgently needs a new generation of scientists with knowledge in both machine learning and health data analysis, who are extremely rare at a global scale. The goal of the “Machine Learning Frontiers in Precision Medicine” ETN is to close this gap by bringing together leading European research institutes in Machine Learning and Statistical Genetics, from both the private and public sector, to train 14 early-stage researchers (ESRs). These scientists will help to shape the future of this important topic and increase Europe’s competitiveness in this domain, which will have significant academic and industrial impact and has the potential to shape the healthcare and high-tech sectors in Europe in the 21st century.
The biggest scientific achievements of the second reporting period are the following peer-reviewed publications:
1. C. Cervia-Hasler et al. Persistent complement dysregulation with signs of thromboinflammation in active Long Covid. Science 2024. 383: eadg7942.
2. J. Bordes et al. Automatically annotated motion tracking identifies a distinct social behavioral profile following chronic social defeat stress. Nature Communications 2023; 14: 4319.
3. A. Hawkins-Hooker et al. Getting personal with epigenetics: towards individual-specific epigenomic imputation with machine learning. Nature Communications 2023, 14(1)
4. G. Visonà et al. Multimodal learning in clinical proteomics: enhancing antimicrobial resistance prediction models with chemical information. Bioinformatics 2023, 39(12):btad717.
Notably, the work by Visonà et al. is the result of the MLFPM Retreat that took place at the University of Tartu in June 2022. There, the ESRs conducted a hackathon on antimicrobial resistance, in which a team of four of them applied multimodal learning in clinical proteomics and improved antimicrobial resistance prediction from mass spectra by sharing information across drugs.

In addition, MLFPM work has been presented in the form of posters and oral presentations at international workshops and conferences (ESR3 at the ISCB42 in June 2021, ESR7 at the 2022 ASHG meeting, ESR9 at ISMB/ECCB 2021, ESR10 at NeurIPS 2021 and the 8th MCAA meeting, ESR12 at EMGM2021, and ESR14 at the 2021 and 2022 AMIA symposia). Notably, ESR9 won the ‘Best Paper Award’ at CMSB 2023 and ESR12 was selected for the ‘Best Young Scientist bioinformatics long talk’ at EMGM2021.

In total, MLFPM results have been published in 42 papers in peer-reviewed journals, among them Science, Nature Communications, Bioinformatics, PLOS ONE and Briefings in Bioinformatics.

In the course of MLFPM, we have organised three very successful summer schools (one in-person event and two virtual ones), as well as three symposia/conferences (one virtual, two in person); see https://mlfpm.eu/events/. At all of these events, we were able to feature renowned experts in the field and advertise the achievements of the network. The events and the recordings of the talks have been met with great interest from the scientific community.

Finally, the ESRs have also been very active in the communication of their work to the public. For example, they participated in the ETH open day “Scientifica 2021”, the European science fair “Science is Wonderful! 2021”, the “my PhD in 180 seconds” competition, Skype a Scientist, La Noche Europea de los Investigadores 2021 and Copenhagen's Culture Night.
Being able to link human genetic variation with phenotypic traits at population scale presents enormous opportunities for improving healthcare and the understanding of disease mechanisms. It may lay the foundation for Precision Medicine, the vision of tailoring medical treatment to the molecular properties and health history of each patient. Machine Learning, the discipline of finding statistical dependencies in large datasets, promises to be key to turning this vision into a reality. Still, the population-scale size and ultra-high dimensionality of emerging health data raise questions that often cannot be answered directly by current machine learning approaches and instead necessitate further method development:
* How to make disease risk predictions on hundreds of thousands if not millions of individuals?
* How to find the subsets of most relevant features and take potential non-linear interactions into account?
* How to make any statement about the statistical significance of the findings when mining ultra-high-dimensional spaces with millions of candidate feature combinations?
* How to deal with massively missing data, if half of all data values are unknown?
* How to find causal relationships between features in ultra-high-dimensional cases?
* How to interpret or visualise the results when mining for non-linear higher-dimensional feature combinations?
* How to account for confounding in machine learning on population-scale data?

It is the mission of the “Machine Learning Frontiers in Precision Medicine” ETN to explore these and similar questions. We pursue two central aspects that are fundamental to reaching the goal of Precision Medicine:
1. Enabling population-scale genetics through machine learning, i.e. developing machine learning methods that can detect disease-associated patient features in datasets with – ultimately – millions of features for millions of patients. These methods will increase our understanding of the factors involved in diseases and our ability to predict patient phenotypes, in particular disease risks, from their features. (Research Goal 1)
2. Enabling improved medical decision support through health record mining, i.e. developing machine learning methods that can analyse, model and predict the health trajectory, disease progression and therapy potential based on long-term, high-dimensional longitudinal health data for each individual. (Research Goal 2)

Each individual research project has been dedicated to one of the research goals. In particular, ESRs 5, 7, 8, and 14 have been developing machine learning techniques for population-level phenotype prediction, ESRs 1, 3, 9, and 12 have been working on disease network discovery, and ESRs 2, 4, 6, 10, 11, and 13 have been working on topics related to health record mining using machine learning.

By applying their developed techniques to SARS-CoV-2, several ESRs have been able to contribute to a better understanding of the disease.