Machine Learning for Personalized Medicine

Final Report Summary - MLPM2012 (Machine Learning for Personalized Medicine)

The idea of Personalized Medicine is to tailor drugs and therapies to individual patients according to their genetic characteristics. For example, patients with the same type and stage of cancer respond differently to the same treatment: some respond well, some show a weak response, and others even suffer serious side effects. How can these differences be explained? One answer is that each patient has a different genetic constitution, and genes affect how drugs are processed in the body. One treatment does not fit all. It is worth looking into the genome and tailoring treatments to individual needs in order to avoid harmful side effects and ineffective therapies. The scientific question behind this is: how can one identify genetic regions that correspond to the success or failure of a certain therapy? As our DNA consists of billions of letters, this is like looking for a needle in a haystack. This is where Machine Learning comes into play. It uses computational tools to analyze this huge amount of data with the goal of recognizing patterns and statistical dependencies. The idea is to create efficient algorithms that analyze human DNA and search for patterns that correlate with the development of a disease or with a certain drug response. For machine learning in personalized medicine, scientists from very different fields have to work hand in hand and benefit from each other's expertise. Geneticists gather huge amounts of genetic data and explore biological processes, whereas computer scientists analyze the data and find patterns in the deep jungle of the human genome.

This is the setting in which the ITN “Machine Learning for Personalized Medicine” operates. Our network consists of 12 full partners from 7 countries (10 academic institutions and 2 companies), and it is completed by 2 associated partners from renowned international companies, Roche and IBM. We recruited 14 enthusiastic young scientists from 7 different countries, with fields of expertise ranging from Biomedicine to Software Engineering. Our mission was to train them in all relevant scientific fields, to prepare them for a successful international career, and to promote networking in order to find joint projects and fields of cooperation. The main scientific objectives were: (1) to develop machine learning techniques for finding molecular properties of patients that support disease diagnosis, prognosis and theranostics ("biomarker discovery"), (2) to advance clinical data integration and knowledge management by machine learning, (3) to develop machine learning techniques for exploring the biological causes and mechanisms of diseases, and (4) to predict clinical phenotypes based on interactions between genetic, epigenetic and environmental factors.

The scientific work of the network was defined in the form of two research goals: (A) Biomarker Discovery and (B) Exploration of the Molecular Basis of Disease and Therapy. Every fellow had to complete certain tasks in the course of the project. The researchers did great work, and all milestones were completed. In particular, they developed new machine learning approaches for biomarker discovery that account for the large search space in -omics data analysis and cope with the inherent computational challenges and multiple testing problems [1,2], or that account for different measurement scales between samples [3]. Our research also contributed to the exploration of the role of signaling pathways in diverse types of cancer [4] and to the discovery of the molecular mechanisms that explain the anti-cancer effect of the compound silvestrol [5]. All research results were published in renowned journals (e.g. Nature, Bioinformatics, Oncotarget) and at prestigious peer-reviewed conferences (e.g. KDD, ICML, NIPS). Moreover, most of the projects resulted in software packages that were made available as free open-source software. Researchers and medical practitioners are invited to use and modify these packages for their own purposes.
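The multiple testing problem mentioned above can be made concrete with a small sketch. It is not the method of [1,2], which rely on significant pattern mining and permutation testing; it shows only the classical Bonferroni correction, assuming one p-value per candidate marker. The function name and the example p-values are hypothetical and serve illustration only.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the indices of tests that remain significant after
    Bonferroni correction (illustrative helper, not from the project)."""
    n_tests = len(p_values)
    if n_tests == 0:
        return []
    # Each test must pass a threshold that shrinks with the number of tests.
    threshold = alpha / n_tests
    return [i for i, p in enumerate(p_values) if p <= threshold]


# Hypothetical p-values for five candidate markers (made up for illustration).
example_p_values = [0.001, 0.04, 0.03, 0.2, 0.009]
significant = bonferroni_significant(example_p_values)
```

With five tests the per-test threshold drops from 0.05 to 0.01, so markers that look significant in isolation (p = 0.04, p = 0.03) no longer pass. With millions of candidate genetic regions the threshold becomes far stricter, which is why approaches that cope with the size of the search space are needed.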

A central part of the network was the training of the fellows. We therefore arranged three summer schools with over 70 participants each (2013 in Tübingen, Germany; 2014 in Paris, France; 2015 in Manchester, UK). They were open to external participants and consisted of a combination of scientific courses, complementary skills courses, invited talks and small by the ITN fellows. We also held two ITN retreats (a “Mini-Hackathon” at ETH Zürich in 2015 and a team working event at CIPF, Valencia, in 2016) and a final ITN meeting in Munich in 2016. All these events gave the fellows a platform to present their excellent results and helped to deepen the network.

The interdisciplinarity of the network was exploited by exposing all fellows to three areas: machine learning, genetics and industry. Every researcher completed at least two secondments at partner nodes working in different disciplines. This not only enhanced the scientific exchange between the research fields and between academia and industry, but also established a network of contacts and possible cooperations.

We reached out to the public and the scientific community through many different channels. Our website was launched in July 2013. It was created and managed by the ESRs (early-stage researchers). It was not only a source of information for our members, but also aimed to reach a wide audience of scientists in related fields and the interested public. In addition, we used Twitter (@mlpm_itn; ca. 200 followers) and Facebook (ca. 1500 followers) to spread our news to a wide audience. There we also published our promotional video.

Our closing conference took place as a satellite meeting of the European Human Genetics Conference 2016 in Barcelona, Spain. The meeting attracted over 60 scientists and provided a great platform to disseminate our research and advertise our network to the scientific community. It was followed by a workshop at the ESHG conference that attracted more than 200 listeners, many of them medical practitioners and geneticists. Selected talks and lectures from MLPM events are freely available to all interested people on YouTube.

All ITN fellows regularly attended and participated in the main international conferences in their research areas. Moreover, each of them gave at least three outreach talks for different audiences. In this way we reached students, the general public, industry, doctors and scientists from our own and other fields. The work of the ITN was also promoted at the 2016 AAAS Annual Meeting, at the coordinator K. Borgwardt’s Krupp symposium, and in several newspaper articles.

[1] Llinares-López, Grimm, Bodenham, et al. Genome-wide detection of intervals of genetic heterogeneity associated with complex traits. Bioinformatics 2015; 31(12):i240-i249.
[2] Llinares-López, Sugiyama, Papaxanthos & Borgwardt. Fast and Memory-Efficient Significant Pattern Mining via Permutation Testing. In Proc. of the 21th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD '15). ACM, New York, 2015, 725-734.
[3] Jiao & Vert. The Kendall and Mallows Kernels for Permutations. In Proc. of the 32nd Int. Conf. on Machine Learning (ICML-15). 2015, 1935-1944.
[4] Hidalgo, Cubuk, Amadoz, et al. High throughput estimation of functional cell activities reveals disease mechanisms and predicts relevant clinical outcomes. Oncotarget 2016; 8(3):5160-5178.
[5] Wolfe, Singh, Zhong, et al. RNA G-quadruplexes cause eIF4A-dependent oncogene translation in cancer. Nature 2014; 513:65–70.

Coordinator: Prof. Dr. Karsten Borgwardt (ETH Zürich)
Twitter: @mlpm_itn