CORDIS - EU research results

Finding noncoding cancer drivers

Periodic Reporting for period 5 - NONCODRIVERS (Finding noncoding cancer drivers)

Reporting period: 2022-12-01 to 2023-08-31

Tumors contain thousands of somatic mutations in their genomes. Most of these mutations are not involved in the disease, but a few of them, which we call cancer driver mutations, are directly involved in the development of the tumor. Identifying these cancer driver mutations is key to understanding cancer biology and to progressing towards personalized cancer medicine.

Most cancer research has focused on the coding genome, which comprises less than 2% of the genome sequence. This has allowed us to identify several hundred genes involved in tumorigenesis through mutations that affect their coding sequence. By contrast, only a handful of non-coding driver elements have been identified to date. Our project aims to study the role of somatic mutations in the non-coding genome in cancer development.

To identify cancer driver genes and non-coding elements, we apply methodologies that measure positive selection in the pattern of mutations. For this to succeed, we need to understand how mutations occur in our cells so that we can estimate neutral mutagenesis. Given a correct estimate of neutral mutagenesis, positive selection can be measured as a statistically significant deviation of the pattern of mutations observed across tumors from the pattern expected under neutral mutagenesis.
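As a minimal sketch of this idea (not the project's actual methods, which rely on far richer background models), the following Python example tests whether the number of mutations observed in one genomic element exceeds the number expected under neutrality. The function names, the per-site probabilities, and the Poisson assumption are all illustrative assumptions, not taken from the project.

```python
import math

def expected_mutations(site_probs, n_tumors):
    """Expected mutation count in an element under neutrality.

    site_probs: hypothetical per-site, per-tumor neutral mutation
    probabilities (an assumption made for this sketch).
    """
    return n_tumors * sum(site_probs)

def poisson_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)   # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k     # P(X = k) derived from P(X = k - 1)
        cdf += term
    # 1 - cdf loses precision for extremely small tails; fine for a sketch.
    return max(0.0, 1.0 - cdf)

# Toy numbers: an element of 1,000 sites, each with a neutral mutation
# probability of 1e-5 per tumor, observed across a cohort of 200 tumors.
site_probs = [1e-5] * 1000
n_tumors = 200
expected = expected_mutations(site_probs, n_tumors)

# Suppose 9 mutations were observed in this element across the cohort.
p_value = poisson_tail(9, expected)
print(f"expected={expected:.2f}, p-value={p_value:.2e}")
```

A small p-value indicates more mutations than the neutral model predicts, which is one possible signal of positive selection. In practice many elements are tested at once, so a multiple-testing correction would also be needed.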

The estimation of neutral mutagenesis is, however, not a trivial problem. Mutations occur randomly in the genomes of our cells, with every genome site having a different probability of mutating in each tissue and individual. This probability depends on the mutational processes that are active in those cells (e.g. UV light, tobacco mutagenesis), the DNA repair machinery, and the interaction between mutagenic processes, the DNA sequence, and features of chromatin structure.
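To illustrate how site-specific mutation probabilities might be assembled from such factors, here is a toy Python sketch. It assumes, purely for illustration, that each site's relative mutability is the product of a rate tied to its trinucleotide sequence context and a local scaling factor standing in for effects such as DNA repair activity or chromatin structure. The function name, the rate values, and the multiplicative model are hypothetical.

```python
def site_probabilities(sequence, context_rates, local_scale=None, default_rate=1.0):
    """Relative mutation probability of each interior site of a sequence.

    sequence: DNA string; the first and last bases lack a full context
        and are skipped.
    context_rates: hypothetical relative rates keyed by trinucleotide.
    local_scale: optional per-site factors (one per interior site) standing
        in for local effects such as repair efficiency.
    """
    n_sites = len(sequence) - 2
    if n_sites < 1:
        raise ValueError("sequence must have at least 3 bases")
    if local_scale is None:
        local_scale = [1.0] * n_sites
    if len(local_scale) != n_sites:
        raise ValueError("local_scale needs one value per interior site")

    weights = []
    for i in range(1, len(sequence) - 1):
        context = sequence[i - 1:i + 2]          # base plus both neighbours
        rate = context_rates.get(context, default_rate)
        weights.append(rate * local_scale[i - 1])

    total = sum(weights)
    if total <= 0:
        raise ValueError("all site weights are zero")
    return [w / total for w in weights]          # normalise to sum to 1

# Toy example: two contexts are four times more mutable than the rest.
rates = {"CGT": 4.0, "TCG": 4.0}
probs = site_probabilities("ACGTCGA", rates)

# Same sequence, but the fourth interior site sits in a region assumed
# to accumulate twice as many mutations.
scaled = site_probabilities("ACGTCGA", rates, local_scale=[1, 1, 1, 2, 1])
```

Probabilities of this kind could serve as the neutral expectation against which observed mutation counts are compared. Real background models are estimated from data per tissue and individual rather than fixed by hand.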

During this project we have gained a better understanding of the mutational processes that take place in tumors and normal cells. We have improved our methods to identify positive selection, and we have systematically applied them to ever-growing datasets of cancer genomes to generate a catalog of cancer driver genes and mutations along the genome.

The project also aimed to identify possible therapeutic interventions informed by the driver mutations detected. To this end, we have developed the Cancer Genome Interpreter (CGI), which includes a database and an algorithm to identify therapeutic opportunities for patients based on their tumor mutations.
We have developed novel computational methodologies able to identify genomic elements, in coding and non-coding regions of the genome, with cancer driver mutations. These methods exploit the principles of Darwinian evolution of tumors. Briefly, genomic elements with cancer driver mutations during tumorigenesis present mutational profiles that deviate in several respects from the expected distribution of mutations under neutrality. We call these deviations signals of positive selection, and we have developed several methods to identify them in the observed mutational patterns of genomic elements across cohorts of tumors.

We have also collected data from thousands of tumor whole-genomes from different sources.

As part of the analysis of mutations in non-coding regions, we have discovered that the rate at which mutations accumulate in different regions of the genome is highly variable at the local level. We have advanced significantly in understanding the reasons for this variability in terms of the accumulation of DNA damage and the activity of DNA repair along the genome. These are important basic biology results, and they are also important for accurately identifying cancer driver mutations.

The project is mostly a basic research project, and most of its findings represent advances in our knowledge of how mutations occur and which ones are cancer drivers. These advances have a clear potential to impact society in the longer term. The most direct impact of the project on society is the development of the Cancer Genome Interpreter (CGI), a tool used in research centers and hospitals to interpret the variants identified in the tumors of individual patients in the clinical setting. CGI uses the results of IntOGen (a pipeline combining methods to identify cancer driver genes) and BoostDM (a collection of machine learning-based models that detect driver mutations in cancer genes) to annotate and interpret mutations that would otherwise be categorized as mutations of uncertain significance. The CGI-clinics project aims to further develop CGI for clinical use. This is a real impact on society that is already taking place.

All the tools developed or improved in the course of the project are available online:

We published YouTube videos about Cancer Genome Interpreter and BoostDM to further disseminate them:
We will improve our understanding of the local variability of DNA damage, DNA repair and mutation rates along the genome.

We will improve the computational methodologies to identify cancer driver mutations by incorporating the knowledge obtained about the local variability in the accumulation of mutations into the calculation of the background mutation rate.

We will continue the analyses of tumor whole-genomes, using the newly collected tumor whole-genomes and the new methods.

We will functionally validate the most promising novel candidate non-coding cancer driver mutations.