Personalized bioinformatics for global cancer susceptibility identification and clinical management

Periodic Reporting for period 2 - PanCanRisk (Personalized bioinformatics for global cancer susceptibility identification and clinical management)

Reporting period: 2017-01-01 to 2018-06-30

Cancer is the most common cause of death in economically developed countries, and nearly 3 million people are diagnosed with cancer in the EU every year. A significant proportion of the worldwide burden of cancer could be prevented through early detection and treatment. Several global initiatives to analyse cancer genomes have been launched in recent years, including the International Cancer Genome Consortium. These studies have given us a comprehensive view of the mutational landscape of cancer, opening up new opportunities for immediate translational action in the clinic. This knowledge has many implications for early cancer diagnosis and for the selection of optimal treatments. PanCanRisk developed and applied bioinformatics and statistical methods to fully exploit the growing collections of cancer datasets, to translate the diagnostic power of sequencing into day-to-day clinical practice, and to discover germline variants that affect the risk of developing cancer. Our objectives are: i) to develop a computational platform for cancer risk tests using next generation sequencing, ii) to identify known and novel cancer risk variants, iii) to identify regulatory risk variants and characterize their function, iv) to develop sequencing panels of cancer predisposition genes for clinical diagnostics, and v) to develop and evaluate biomarkers of cancer susceptibility and clinical course that facilitate cancer risk prediction.
WP1 developed a distributed NGS diagnostics platform (the eDiVA pipeline) that facilitates the analysis of large-scale cancer cohorts as well as diagnostics for single cases in the clinic. We developed novel methods for copy number variant detection (ClinCNV), identification of false positive variant calls (ABB), and prioritization of causal risk variants (eDiVA modules Score and Prioritize), as well as a comprehensive disease knowledge database. eDiVA and all integrated methods support the analysis of germline risk variants as well as somatic cancer driver mutations, and have been used to analyse and annotate the cancer datasets used in WP2 to WP5.
WP2 developed a novel rare variant association study (RVAS) test, called BATI, and a comprehensive platform for RVAS analysis (REWAS). We used REWAS and BATI to perform RVAS tests on germline data from multiple cancer types included in the ICGC and TCGA cohorts. Two novel risk genes for breast and colon cancer identified by REWAS were replicated in WP4.
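To illustrate the general idea behind a rare variant association test, the following is a minimal sketch of a gene-level burden test: rare variants within a gene are collapsed into a per-sample carrier status, and carrier counts in cases and controls are compared with a one-sided Fisher's exact test. This is a generic textbook technique and is not the BATI or REWAS method, whose details are not described here. The function names, the input layout and the variant identifiers are all hypothetical.

```python
from math import comb


def fisher_one_sided(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher's exact test (enrichment of carriers among cases).

    Returns P(X >= case_carriers) under the hypergeometric null, where X is
    the number of carriers that fall into the case group by chance.
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denominator = comb(total, case_total)
    upper = min(carriers, case_total)
    numerator = 0
    for k in range(case_carriers, upper + 1):
        # k carriers among the cases, the remaining cases are non-carriers
        numerator += comb(carriers, k) * comb(total - carriers, case_total - k)
    return numerator / denominator


def burden_test(samples, gene_variants):
    """Collapse rare variants of one gene into carrier status and test it.

    samples: list of (is_case, carried_variants) tuples, where
             carried_variants is a set of variant identifiers (hypothetical layout).
    gene_variants: set of qualifying rare variant identifiers in the gene.
    """
    case_total = sum(1 for is_case, _ in samples if is_case)
    control_total = len(samples) - case_total
    case_carriers = sum(
        1 for is_case, carried in samples if is_case and carried & gene_variants
    )
    control_carriers = sum(
        1 for is_case, carried in samples if not is_case and carried & gene_variants
    )
    return fisher_one_sided(case_carriers, case_total, control_carriers, control_total)


# Toy data only: 4 cases and 4 controls, invented variant identifiers.
toy_samples = [
    (True, {"v1"}), (True, {"v2"}), (True, {"v1", "v3"}), (True, set()),
    (False, set()), (False, {"v9"}), (False, set()), (False, set()),
]
toy_p_value = burden_test(toy_samples, {"v1", "v2", "v3"})
print(f"one-sided p-value: {toy_p_value:.4f}")
```

A real analysis would additionally need variant quality filtering, allele frequency thresholds, covariate adjustment (for example for population structure) and multiple-testing correction across genes, none of which this sketch attempts.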
WP3 developed and applied computational methods for the identification of regulatory variants, with a specific focus on germline variants associated with cancer risk. We developed methods for i) the identification and functional annotation of regulatory variants, ii) the detection of effects of regulatory variants, such as gene expression changes, and iii) the integration of epigenetic and genetic features to identify regulatory links in the context of cancer. We applied our methods to data from the PCAWG consortium, which provides WGS data for a cohort of 2834 cancer cases. We were a major contributor to the analysis of the gene expression data in this international effort, identifying new regulatory variants through expression quantitative trait loci mapping of both germline and somatic variants.
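As a rough illustration of what expression quantitative trait loci (eQTL) mapping tests for a single variant-gene pair, the sketch below regresses a gene's expression level on the allele dosage (0, 1 or 2 copies of the alternative allele) and reports the least-squares slope and the Pearson correlation. It is a simplified, assumption-laden example and not the method used in WP3 or by PCAWG; the function name and the toy values are hypothetical.

```python
from statistics import mean


def eqtl_effect(dosages, expression):
    """Least-squares slope and Pearson r of expression on allele dosage.

    dosages: allele counts per sample (0, 1 or 2).
    expression: expression value of one gene for the same samples, same order.
    """
    if len(dosages) != len(expression):
        raise ValueError("dosages and expression must have the same length")
    mean_x, mean_y = mean(dosages), mean(expression)
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample set")
    slope = sxy / sxx  # expression change per additional allele copy
    r = sxy / (sxx * syy) ** 0.5 if syy > 0 else 0.0
    return slope, r


# Toy data only: six samples with invented dosages and expression values.
toy_slope, toy_r = eqtl_effect([0, 1, 2, 0, 1, 2], [1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
print(f"slope per allele: {toy_slope:.2f}, Pearson r: {toy_r:.2f}")
```

In practice, eQTL studies normalise expression data, adjust for covariates such as ancestry and hidden confounders, and correct for the very large number of variant-gene pairs tested; the sketch omits all of these steps.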
In WP4, we gathered ~6,000 cancer cases and controls for the evaluation of the cancer risk variant candidates identified in WP2 and WP3. We analysed an evaluation cohort of 1000 breast cancer and 1000 colon cancer cases, compared with 1000 controls. We replicated 2 candidate risk genes for colon and breast cancer identified in WP2.
In WP5, we implemented a CRISPR-Cas based protocol for generating knock-in clones in MCF-10A cells, which allows us to generate clones with candidate risk mutations of interest within 3 weeks without the need for pre-screening. We generated knock-in clones for 2 candidate cancer risk variants, specifically one coding and one regulatory variant with a possible association with breast cancer.
WP6 developed and evaluated a panel of diagnostic markers, including sCDs, for both breast cancer and colon cancer, and developed IVD-grade immunoassays for these markers suitable for sample screening. We developed predictive diagnostic models by screening retrospective samples of both diseases. The diagnostic capacity of the panel was enhanced by combining it with genomic data. Significant diagnostic capability was achieved in colon cancer cases, which could potentially be developed into a diagnostic product for the early detection of this disease following a larger study.
WP7 was dedicated to management and coordination. We developed audio-visual media, communicated results, and organized meetings, teleconferences and one symposium on ethical issues of clinical genome sequencing. We ensured that the PanCanRisk project complied at all times with the tenets of the national and EU regulations regarding ethical issues and data privacy.
Cancer susceptibility variants have not been extensively studied at a genome-wide scale. We aim to advance knowledge of the landscape of susceptibility variants and genes across several cancers. Our goal is to mine the available pan-cancer datasets and to replicate the resulting candidate genes in large independent cohorts of cancer patients. This effort requires new computational methods that are suited both to the analysis of large patient cohorts and to application in clinical settings. We propose to advance the bioinformatics algorithms included in our computational diagnostics platform in multiple ways: a) new statistical methods for the identification of rare susceptibility variants, b) algorithms for the interrogation of regulatory variants, and c) integrative analysis of susceptibility variants with a broad range of molecular and clinical variables. We will furthermore advance the use of innovative genome engineering and sequencing based methods for the experimental validation of computational predictions. The proposal also has a substantial collaborative component to maximally exploit the pan-cancer data.
PanCanRisk has a clear bench-to-bedside component, in which we will engage in the bioinformatics discovery of susceptibility variants and will also validate them in large cohorts to assess their exploitability as risk biomarkers. This will be complemented by the implementation of intuitive computational tools for day-to-day clinical practice. We will also evaluate the commercialization of innovations introduced in the software of the platform. This could be achieved by establishing service models or by direct sale of software to the clinical markets. Complementary to software commercialization, we expect to gain commercially relevant insights from the biomarker analysis.
As intended for this study, serum-based biomarker analysis has resulted in early findings, indicating that known biomarkers for one cancer type are applicable for the diagnosis of other tumor entities. This insight opens up the possibility to develop a common diagnostic panel combining genomic and sCD biomarkers.
In combination, these innovations have the potential to change the way cancer risk is assessed in the clinic, to significantly expand the set of known risk genes and pathways, and to provide a better understanding of their involvement in cancer susceptibility.