Machine Learning Frontiers in Precision Medicine

Periodic Reporting for period 1 - MLFPM2018 (Machine Learning Frontiers in Precision Medicine)

Reporting period: 2019-01-01 to 2020-12-31

Healthcare is entering the digital era: more and more patient data, from the molecular level of genome sequences to the level of image phenotypes and health history, are available in digital form. Exploring this big health data promises to reveal new insights into disease mechanisms and therapy outcomes. Ultimately, the goal is to exploit these insights for Precision Medicine, which hopes to offer personalized preventive care and therapy selection for each patient. A technology with transformational potential in analysing this health data is Machine Learning. Machine Learning strives to discover new knowledge in the form of statistical dependencies in large datasets. Mining health data is, however, not a simple direct application of established machine learning techniques. On the contrary, the emerging population scale and ultra-high dimensionality of health data create the need to develop Machine Learning algorithms that can successfully operate at this scale. Overcoming these frontiers in Machine Learning is key to making the vision of Precision Medicine a reality. To meet this challenge, Europe urgently needs a new generation of scientists with knowledge in both machine learning and health data analysis, who are extremely rare at a global scale. The goal of the “Machine Learning Frontiers in Precision Medicine” ETN is to close this gap by bringing together leading European research institutes in Machine Learning and Statistical Genetics, from both the private and the public sector, to train 14 early stage researchers. These scientists will help to shape the future of this important topic and increase Europe’s competitiveness in this domain, which will have significant academic and industrial impact and has the potential to shape the healthcare and high-tech sectors in Europe in the 21st century.
The biggest scientific achievements of the first reporting period are three peer-reviewed publications:
1. Duroux D et al. (2020). Network Aggregation to Enhance Results Derived from Multiple Analytics. In Artificial Intelligence Applications and Innovations. AIAI 2020. IFIP Advances in Information and Communication Technology, vol 583. Springer, Cham. http://doi.org/10.1007/978-3-030-49161-1_12
2. Chorev M et al. (2020). The Case of Missed Cancers: Applying AI as a Radiologist’s Safety Net. In: Martel A.L. et al. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. MICCAI 2020. Lecture Notes in Computer Science, vol 12266. Springer, Cham. http://doi.org/10.5281/zenodo.4076798
3. Muzio G et al. (2020). Biological network analysis with deep learning. Briefings in Bioinformatics 2020, bbaa257. https://doi.org/10.1093/bib/bbaa257

as well as 6 oral presentations at international workshops and conferences (ESR 1 at IGES 2019 and AIAI 2020, ESR 4 at PyCon Argentina, ESR 5 at ISCB Student Council, ESR 7 at European Mathematical Genetics Meeting 2020 and ISCB41 Krakow 2020).

In the first reporting period (January 2019 – December 2020), 12 of the 17 planned scientific milestones and 5 of the 6 scientific deliverables were completed. Five milestones and one deliverable were unfortunately delayed, mainly due to late recruitment of the ESR in charge. All postponed tasks will be completed by September 2021 at the very latest.

In 2019 and 2020, we organized two very successful summer schools. The first summer school, “Machine Learning Frontiers in Precision Medicine”, took place on September 9-13, 2019 in Basel, Switzerland, and was very well received by the community, with around 100 participants (https://mlfpm.eu/1-summer-school/). Among other renowned experts, the external speakers included Lena Maier-Hein (German Cancer Research Center (DKFZ)), Lili Milani (Estonian Genome Center) and Nikolaus Rajewsky (Max Delbrück Center Berlin). The second summer school was organized as a virtual event on September 21-23, 2020 (https://mlfpm.eu/2nd-summer-school/). This new format, which included a free, public livestream on YouTube as well as permanent access to the recordings of all talks, allowed us to open the event to the whole world, with up to 200 live attendees at times and continued interest in the recordings. We were able to secure exceptionally renowned experts as external speakers for this event, including Ewan Birney (Deputy Director General of EMBL, Director of EMBL-EBI), Mihaela van der Schaar (University of Cambridge & UCLA) and Aviv Regev (Broad Institute & Genentech).
Being able to link human genetic variation with phenotypic traits at population scale presents enormous opportunities for improving healthcare and the understanding of disease mechanisms. It may lay the foundation for Precision Medicine, the vision of tailoring medical treatment to the molecular properties and health history of each patient. Machine Learning, the discipline of finding statistical dependencies in large datasets, promises to be key to turning this vision into a reality. Still, the population-scale size and ultra-high dimensionality of emerging health data bring about questions that often cannot be directly answered by current machine learning approaches, but necessitate further method development:
* How to make disease risk predictions on hundreds of thousands if not millions of individuals?
* How to find the subsets of most relevant features and take potential non-linear interactions into account?
* How to make any statement about statistical significance of the findings when mining ultra-high-dimensional spaces with millions of candidate feature combinations?
* How to deal with massively missing data, if half of all data values are unknown?
* How to find causal relationships between features in ultra-high-dimensional cases?
* How to interpret or visualise the results when mining for non-linear higher-dimensional feature combinations?
* How to account for confounding in machine learning on population-scale data?
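
To give a rough sense of scale for the statistical significance question above, the following is a minimal sketch, not a method from the project. It assumes a plain Bonferroni correction (our choice for illustration) and a hypothetical helper name, `bonferroni_threshold`, and simply counts how many feature combinations of a given order exist and what per-test threshold would result.

```python
from math import comb


def bonferroni_threshold(n_features, order, alpha=0.05):
    """Hypothetical helper: count candidate feature combinations of the
    given order and return the Bonferroni-corrected per-test threshold.

    Assumption: every combination is tested once and a family-wise error
    rate of `alpha` is controlled by dividing alpha by the number of tests.
    """
    n_tests = comb(n_features, order)  # number of unordered combinations
    return n_tests, alpha / n_tests


# One million features, all pairwise combinations.
n_tests, threshold = bonferroni_threshold(1_000_000, 2)
print(f"{n_tests:,} candidate pairs, per-test threshold {threshold:.2e}")
```

With one million features, pairs alone already yield roughly 5 × 10^11 tests, so each individual test would need a p-value near 10^-13 under this simple correction. Higher-order combinations grow far faster.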

It is the mission of the “Machine Learning Frontiers in Precision Medicine” ETN to explore these and similar questions. We pursue two central aspects that are fundamental to reaching the goal of Precision Medicine:
1. Enabling population-scale genetics through machine learning, i.e. developing machine learning methods that can detect disease-associated patient features in datasets with – ultimately – millions of features for millions of patients. These methods will increase our understanding of the factors involved in diseases and our ability to predict patient phenotypes, in particular disease risks, from their features. (Research Goal 1)
2. Enabling improved medical decision support through health record mining, i.e. developing machine learning methods that can analyse, model and predict the health trajectory, disease progression and therapy potential based on long-term, high-dimensional longitudinal health data for each individual. (Research Goal 2)

Each individual research project is dedicated to one of the research goals. In particular, ESRs 5, 7, 8, and 14 are developing machine learning techniques for population-level phenotype prediction. ESRs 1, 3, 9, and 12 are working on disease network discovery, and ESRs 2, 4, 6, 10, 11, and 13 are working on topics related to health record mining using machine learning.