Periodic Reporting for period 2 - SOUND (Statistical multi-Omics UNDerstanding of Patient Samples)

Reporting period: 2017-03-01 to 2018-08-31

Summary of the context and overall objectives of the project

Our objective has been to enable European and international researchers to increase statistically informed use of personal multi-omic data for biomedical research. The bioinformatic and statistical analysis of the large, heterogeneous and complex data sets produced in such research is often the biggest bottleneck in genomic medicine projects. Currently and for the foreseeable future, studies are being performed that apply multiple types of ‘omic technologies to hundreds or thousands of patient-derived samples, with the three-stage goal of better understanding disease biology, discovering new interventions, and personalizing the choice of treatment options. The approach taken by the SOUND consortium has been to assemble a team of biostatisticians, bioinformaticians, software developers and physician-scientists. This has led to the development of a unique set of new statistical methods and tools for data exploration that tackle urgent current scientific research questions and that have proven their worth under real-life conditions. SOUND has substantiated how future-oriented open programming environments enable scientific domain experts and software users who are actively working in biomedical research to become software developers and to rapidly write functional prototypes and ‘good enough for the task’ software. A substantial body of software (incl. online user interfaces) and knowledge (in the form of reports, workflows and peer-reviewed scientific publications) was created by the SOUND consortium and has already begun to serve as a resource for medical researchers and clinical decision makers in personalised medicine.
Moreover, the scientific approaches used here have been successful and will serve as a basis and reference for future scientific projects.

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

SOUND was closely aligned with the international Bioconductor project and contributed specific application modules (called ‘packages’ in the R programming language) as well as more general infrastructure to the project. The consortium produced several tools and resources that support a collaborative international developer and user community in academia and industry. One example is Renjin, an implementation of R that runs on the Java Virtual Machine (JVM) and offers completely new possibilities of integrating R code with enterprise software infrastructure. Several tools were developed for use in clinical settings, both in research and in care provision, including VarBench (for comparison of somatic variant callers for targeted deep sequencing data); VarMiss (an R package that complements VarBench by implementing a statistical framework to estimate the number of single nucleotide variants missed by variant calling methods); DrugScreenExplorer (for efficient and reproducible exploration of high-throughput drug perturbation assay data); TCGABrowser (a web tool to help scientists explore the effects of cancer mutations and conduct comparative Kaplan-Meier survival analyses); NetICS (a graph diffusion-based method that prioritizes cancer genes by integrating multiple molecular data types on a directed functional interaction network); SOUNDBoard (for rapid and facile development of interactive reports, providing a server-based mechanism for displaying reports at levels of summary and visual sophistication appropriate for use cases in medical research); and OUTRIDER (for detection of expression outliers and automatic correction for confounders in transcriptome data using an autoencoder).
These tools demonstrated their utility in tests by clinical collaborators from inside and outside the SOUND consortium. They are freely available online as open source software, via Bioconductor or GitHub. Further activities by the SOUND consortium comprised a research exchange program, three annual summer schools (the so-called CSAMA courses), three annual meetings of the Bioconductor project, which were the biggest in Europe, and several further conferences and training activities. These have successfully contributed to the aim of developing outstanding software tools and making them, together with the necessary associated knowledge, available to a community of European and international researchers. The exploitation of the SOUND consortium’s output has been boosted by testing the tools directly at clinical research partner institutions within and outside the consortium. Moreover, their availability and impact have been maximized by the fact that they are open source, well-documented and freely available. Thus, they are already serving a community of experts, clinical decision makers, medical researchers, and software developers. SOUND results will also be valuable as a basis for new scientific projects. The training activities of the SOUND consortium were effective instruments for disseminating the research results and solutions produced by the consortium to the community. Project results were disseminated at more than 50 conferences and workshops, and researchers funded through the SOUND consortium contributed outstanding lectures, courses and talks at research and education institutions worldwide. The project has yielded a substantial list of publications (see project website). The research, knowledge dissemination and training activities performed in the SOUND consortium were also a major driver for the creation of the textbook Modern Statistics for Modern Biology by Susan Holmes and Wolfgang Huber, which is published by Cambridge University Press.
Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

Bioconductor remains, as it was at the outset of the SOUND project, one of the most widely used software systems for the analysis of (gen)omic data, with over 26,000 full-text mentions in PubMed Central. A major theme for the SOUND project was to exploit the enormous potential in increasing the reach of Bioconductor to translational and clinical research, and even into clinical care. SOUND made impressive progress in this direction, with beneficial implications for patients across Europe. Almost every European citizen will at one point in their life encounter a medical situation in which personalized medicine, i.e. rational, biology-based diagnosis and treatment choice, could have an existential effect on them, if available. The speed at which we implement this transition and make such diagnoses and choices available will affect thousands of lives. There is an associated market for new products and services, e.g. clinical laboratory characterization of samples upon diagnosis, or continuous monitoring of patients at risk for preventative medicine, that is worth billions of Euros. There is also a large potential for savings of costs and suffering in the health system, as treatments that are useless for a particular individual can be avoided and resources can be allocated more precisely. Given the scale of SOUND, and that of the above-mentioned challenges, it is clear that SOUND could only make a small, but hopefully effective, contribution.
Thus, we aimed to provide well-chosen, critical building blocks that, we hope, will now be built upon by clinician-scientist-led research groups, by computational scientists with high-powered expertise in areas such as the mining of big data, mathematical modelling and rigorous statistical analysis, and by small and medium enterprises (SMEs) that can realize business opportunities but do not have the research resources to develop mathematically and computationally complex tools themselves.