Periodic Reporting for period 5 - NONCODRIVERS (Finding noncoding cancer drivers)
Reporting period: 2022-12-01 to 2023-08-31
Most cancer research has focused on the coding genome, which comprises less than 2% of the genome sequence. This has allowed the identification of several hundred genes involved in tumorigenesis through mutations that affect their coding sequence. In contrast, only a handful of non-coding driver elements have been identified to date. Our project aims to study the role of somatic mutations in the non-coding genome in cancer development.
To identify cancer driver genes and non-coding elements, we apply methodologies that measure positive selection in the pattern of mutations. For this to succeed, it is key to understand how mutations occur in our cells so that neutral mutagenesis can be estimated. Given a correct estimate of neutral mutagenesis, positive selection can be measured as a statistically significant deviation of the pattern of mutations observed across tumors from that expected under neutral mutagenesis.
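As an illustration of the idea only, and not the project's actual method, the deviation from neutral mutagenesis can be sketched as a one-sided Poisson test on the number of mutations observed in a genomic element. The function name `poisson_upper_tail` and the numbers in the usage lines are hypothetical; the sketch assumes that the expected count under neutrality is already known and that mutation counts are Poisson-distributed.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected).

    A small value suggests more mutations than neutral mutagenesis alone
    would produce, which is compatible with positive selection.
    """
    if observed <= 0:
        return 1.0
    # Accumulate P(X = 0) + ... + P(X = observed - 1), then take the complement.
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for i in range(1, observed):
        term *= expected / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Hypothetical element: 3.2 mutations expected under neutrality, 12 observed.
p_value = poisson_upper_tail(observed=12, expected=3.2)
print(f"p-value for an excess of mutations: {p_value:.2e}")
```

Because the tail is computed as a complement, very small p-values lose precision; a statistics library would be preferable in practice, and testing many elements requires correction for multiple comparisons.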
The estimation of neutral mutagenesis is, however, not a trivial problem. Mutations occur randomly in the genomes of our cells, with every genome site having a different probability of mutating in each tissue and individual. This probability depends on the mutational processes that are active in those cells (e.g. UV light, tobacco mutagenesis), the DNA repair machinery, and the interaction between mutagenic processes, the DNA sequence, and features of the chromatin structure.
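A deliberately simplified sketch of site-specific mutation probabilities is given below. It assumes that the probability of a site mutating depends only on its trinucleotide context (the base and its two neighbors), which ignores DNA repair and chromatin features. The function `expected_mutations`, the `context_rates` table and all rate values are hypothetical and chosen for illustration.

```python
from collections import Counter


def expected_mutations(sequence: str,
                       context_rates: dict[str, float],
                       n_samples: int) -> float:
    """Expected number of neutral mutations in `sequence` across `n_samples` tumors.

    `context_rates` maps a trinucleotide (e.g. "TCG") to the assumed per-site,
    per-sample mutation probability of its central base. Contexts missing from
    the table contribute zero.
    """
    # Count the trinucleotide context of every site that has both neighbors.
    contexts = Counter(sequence[i - 1:i + 2] for i in range(1, len(sequence) - 1))
    per_sample = sum(count * context_rates.get(ctx, 0.0)
                     for ctx, count in contexts.items())
    return n_samples * per_sample


# Hypothetical rates: a mutation-prone context and a less mutable one.
rates = {"TCG": 2e-6, "ACA": 5e-7}
element = "TTCGACATCGA"
print(expected_mutations(element, rates, n_samples=1000))
```

The resulting expectation could serve as the neutral baseline against which an observed mutation count is compared.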
During this project we have gained a better understanding of the mutational processes that take place in tumors and normal cells. We have improved our methods to identify positive selection, and we have systematically applied them to ever-growing datasets of cancer genomes to generate a catalog of cancer driver genes and mutations along the genome.
The project also aimed to identify possible therapeutic interventions informed by the driver mutations detected. To this end we have developed Cancer Genome Interpreter (CGI), which includes a database and an algorithm to identify therapeutic opportunities for patients based on their tumor mutations (https://www.cancergenomeinterpreter.org).
We have also collected data from thousands of tumor whole-genomes from different sources.
As part of the analysis of mutations in non-coding regions, we have discovered that the rate at which mutations accumulate in different regions of the genome is highly variable at the local level. We have advanced significantly in understanding the reasons for this variability in terms of the accumulation of DNA damage and the activity of DNA repair along the genome. These are important basic biology results, and they are also important for accurately identifying cancer driver mutations.
The project is mostly a basic research project, and most of the findings represent advances in our knowledge of how mutations occur and which ones are cancer drivers. These advances have a clear potential to impact society in the longer term. The most direct impact of the project on society is the development of Cancer Genome Interpreter (CGI), a tool used in research centers and hospitals to interpret variants identified in the tumors of individual patients in the clinical setting. CGI uses the results of IntOGen (a pipeline combining methods to identify cancer driver genes) and BoostDM (a collection of machine learning-based models that detect driver mutations in cancer genes) to annotate and interpret mutations that would otherwise be categorized as mutations of uncertain significance. The CGI-clinics project aims to further develop CGI for clinical use. This impact on society is already occurring.
All the tools developed or improved in the course of the project are available online:
CancerGenomeInterpreter: https://www.cancergenomeinterpreter.org
IntOGen: https://www.intogen.org/
BoostDM: https://www.intogen.org/boostdm
We published YouTube videos about CancerGenomeInterpreter and BoostDM to further disseminate them:
https://www.youtube.com/watch?v=6Exe78fgNrk
https://www.youtube.com/watch?v=1Nq_rm_yudk
We will improve the computational methodologies to identify cancer driver mutations by incorporating the knowledge obtained about the local variability in the accumulation of mutations into the calculation of the background mutation rate.
We will continue the analysis of tumor whole genomes using the newly collected data and the new methods.
We will functionally validate the most promising novel candidate non-coding cancer driver mutations.