Community Research and Development Information Service - CORDIS


SOUND Report Summary

Project ID: 633974
Funded under: H2020-EU.3.1.6.

Periodic Reporting for period 1 - SOUND (Statistical multi-Omics UNDerstanding of Patient Samples)

Reporting period: 2015-09-01 to 2017-02-28

Summary of the context and overall objectives of the project

The application of ‘omic profiling technologies to patient-derived samples promises to generate a better understanding of the biology underlying many currently difficult-to-treat diseases, to help us discover new therapeutic interventions, and to enable us to personalize the choice of the best treatment options for each patient. The data are large in scale and complex in nature, with multiple types of omic assays being applied to hundreds of thousands of patient samples. The objective of the SOUND project is to enable European and international researchers to dramatically increase the statistically informed use of such data for personalized medicine. It addresses the biggest bottleneck in many genomic medicine projects: bioinformatic and statistical analysis. We are developing the new statistical methods needed to optimally exploit disparate types of noisy, complex, high-dimensional data. We aim to provide tools both for data exploration, i.e. hypothesis generation and discovery, and for drawing robust conclusions and making sound decisions (statistical inference). The latter is particularly pertinent to clinical data. In personalized medicine, decisions need to be made for individuals or small groups of patients, and the level of certainty in the underlying data and inference models needs to be rigorously assessed and clearly presented to clinical decision makers.
A related objective of SOUND is to enable scientific domain experts and software users to become developers. While production-grade software is ultimately written by professional software developers, the current and future rapid pace of progress in bioinformatics is made possible by programming environments that allow scientific domain experts to rapidly write functional prototypes and ‘good enough for the task’ software. There is an urgent need for scientific reproducibility in order to translate claimed results into pharmaceutical products and clinical practice, as errors can be costly or calamitous. Indeed, reproducibility is a current challenge for the biomedical research community, as evidenced by concerns among scientists, in industry and in wider society. Our interdisciplinary team of biostatisticians, bioinformaticians, software developers and physician-scientists will build solutions to address the data analysis bottleneck with technically sound methods.

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

We aligned closely with the international Bioconductor project, and contributed specific application modules (called ‘packages’ in the R language) as well as more generic infrastructure to the project. We are also providing close feedback and hands-on input to the project, which help ensure that Bioconductor provides robust and workable solutions suitable for multi-omics applications.
To facilitate the querying and analysis of a multi-omic drug screening dataset produced in the SOUND project, we created interactive data visualisation and exploration tools based on R, shiny and Bioconductor components. Each of the tools is specialized to the specific biological and medical questions of the project, and thus demonstrates how such tools can be built rapidly, cheaply and effectively by the subject-matter scientists themselves.
To help scientists explore the effects of cancer mutations, we wrote TCGAbrowser, a simple web tool to conduct Kaplan-Meier survival analysis, differential gene expression analysis and pathway analysis on the open-access TCGA cancer genome datasets. TCGAbrowser is generalizable to any cancer type, is easy to use, and enables more precise control of parameters than any other currently available method.
The rDGIdb package provides a wrapper for the Drug Gene Interaction Database (DGIdb), an important resource for molecular tumour boards that aim to match patient-specific mutation profiles to potential treatment options.
We made the tools produced within SOUND available through open-source solutions, publicly available resources and scientific publications, thus supporting a collaborative international community of academic and industry developers and users. SOUND sponsored the week-long summer school Statistical Data Analysis for Genome Scale Biology Course (CSAMA), which offers training in quantitative methods and the computing skills needed by medical researchers to deal effectively with large data. The teaching materials of this course and many others are publicly available on the course website and on the Bioconductor website.
SOUND supported the European Bioconductor Developers’ conference, a major training event for European bioinformaticians interested in writing and disseminating their algorithms as widely useful software packages, and in taking part in the Bioconductor community.
The workshop Clinical Bioinformatics as a Service was organized by SOUND partners and collaborators and provided external training to European and international researchers. The exchange program coordinated visits of postdocs and PhD fellows to other partner sites.

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

Bioconductor is currently one of the most widely used software systems for analysis of (gen)omic data. For instance, it has over 19,000 full-text mentions in PubMed Central. Currently, it is mostly used in basic science contexts. There is potential to extend its reach to translational and clinical research, and even into clinical care. Moreover, new methodological challenges arise with the increasing prevalence of multi-omic characterisation of patient samples. Compared to extant approaches that focus on a single omic technology at a time, these challenges include mathematical and conceptual aspects as well as ease of use and performance of software; they had not been sufficiently addressed by the Bioconductor project.
The advance over the state of the art provided by SOUND is, in a nutshell, to overcome these challenges and to exploit the above-mentioned potential.
Almost every European citizen will at some point in their life encounter a medical situation in which personalized medicine, i.e., rational, biology-based diagnosis and treatment choice, could have an existential effect on them. The speed at which we implement this transition will affect thousands of lives. There is an associated market for new products and services, e.g. clinical laboratory characterization of samples upon diagnosis, or continuous monitoring of patients at risk for preventative medicine, that is worth billions of euros. There is also large potential for reducing costs and suffering in the health system, as treatments that are useless for a particular individual can be avoided and resources can be allocated more precisely.
SOUND is not single-handedly providing all of this, but we aim to provide well-chosen, critical building blocks that can then be built upon by 1) clinician-scientist-led research groups with exciting clinical questions and access to relevant patient samples, but moderate computational expertise and instruments; 2) computational scientists with high-powered expertise in areas such as the mining of big data, mathematical modelling and rigorous statistical analysis, but limited access to multi-omic patient data and limited biomedical expertise; and 3) small and medium enterprises (SMEs) that can realize business opportunities in this multi-billion euro market, but do not have the research resources to develop mathematically and computationally complex tools themselves.
