Periodic Reporting for period 1 - MLFPM2018 (Machine Learning Frontiers in Precision Medicine)
Reporting period: 2019-01-01 to 2020-12-31
1. Duroux D et al. (2020). Network Aggregation to Enhance Results Derived from Multiple Analytics. In Artificial Intelligence Applications and Innovations. AIAI 2020. IFIP Advances in Information and Communication Technology, vol 583. Springer, Cham. http://doi.org/10.1007/978-3-030-49161-1_12
2. Chorev M et al. (2020). The Case of Missed Cancers: Applying AI as a Radiologist’s Safety Net. In: Martel A.L. et al. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. MICCAI 2020. Lecture Notes in Computer Science, vol 12266. Springer, Cham. http://doi.org/10.5281/zenodo.4076798
3. Muzio G et al. (2020). Biological network analysis with deep learning. Briefings in Bioinformatics 2020, bbaa257. https://doi.org/10.1093/bib/bbaa257
as well as 6 oral presentations at international workshops and conferences (ESR 1 at IGES 2019 and AIAI 2020, ESR 4 at PyCon Argentina, ESR 5 at ISCB Student Council, ESR 7 at the European Mathematical Genetics Meeting 2020 and ISCB41 Krakow 2020).
In the first reporting period (January 2019 – December 2020), 12 of the 17 planned scientific milestones and 5 of the 6 scientific deliverables were completed. Five milestones and one deliverable were delayed, mainly due to late recruitment of the ESR in charge. All postponed tasks will be completed by September 2021 at the latest.
In 2019 and 2020, we organized two very successful summer schools. The first summer school, “Machine Learning Frontiers in Precision Medicine”, took place on September 9-13, 2019 in Basel, Switzerland, and was very well received by the community, with around 100 participants (https://mlfpm.eu/1-summer-school/). Among other renowned experts, the external speakers included Lena Maier-Hein (German Cancer Research Center (DKFZ)), Lili Milani (Estonian Genome Center) and Nikolaus Rajewsky (Max Delbrück Center Berlin). The second summer school was organized as a virtual event on September 21-23, 2020 (https://mlfpm.eu/2nd-summer-school/). This new format, which included a free public livestream on YouTube as well as permanent access to the recordings of all talks, made it possible to open the event to the whole world, with up to 200 live attendees at times and continued interest in the recordings. We were able to secure exceptionally renowned experts as external speakers for this event, including Ewan Birney (Deputy Director General of EMBL, Director of EMBL-EBI), Mihaela van der Schaar (University of Cambridge & UCLA) and Aviv Regev (Broad Institute & Genentech).
* How to make disease risk predictions on hundreds of thousands if not millions of individuals?
* How to find the subsets of most relevant features and take potential non-linear interactions into account?
* How to make any statement about the statistical significance of the findings when mining ultra-high-dimensional spaces with millions of candidate feature combinations?
* How to deal with massively missing data, if half of all data values are unknown?
* How to find causal relationships between features in ultra-high-dimensional cases?
* How to interpret or visualise the results when mining for non-linear higher-dimensional feature combinations?
* How to account for confounding in machine learning on population-scale data?
It is the mission of the “Machine Learning Frontiers in Precision Medicine” ETN to explore these and similar questions. We pursue two central aspects that are fundamental to reaching the goal of Precision Medicine:
1. Enabling population-scale genetics through machine learning, i.e. developing machine learning methods that can detect disease-associated patient features in datasets with – ultimately – millions of features for millions of patients. These methods will increase our understanding of the factors involved in diseases and our ability to predict patient phenotypes, in particular disease risks, from their features. (Research Goal 1)
2. Enabling improved medical decision support through health record mining, i.e. developing machine learning methods that can analyse, model and predict the health trajectory, disease progression and therapy potential based on long-term, high-dimensional longitudinal health data for each individual. (Research Goal 2)
Each individual research project is dedicated to one of the research goals. In particular, ESRs 5, 7, 8, and 14 are developing machine learning techniques for population-level phenotype prediction. ESRs 1, 3, 9, and 12 are working on disease network discovery, and ESRs 2, 4, 6, 10, 11, and 13 are working on topics related to health record mining using machine learning.