Periodic Reporting for period 1 - SIOMICS (SIngle-cell multi-OMICs approach to study intra-tumour heterogeneity of soft tissue Sarcomas)

Reporting period: 2017-04-01 to 2019-03-31

Summary of the context and overall objectives of the project

In today’s ageing populations, cancer, a degenerative disease, is one of the leading causes of death worldwide and arguably one of the biggest scientific and health challenges of our era. Our bodies are made of tens of trillions of small specialised building blocks called cells. Most cancers arise from a single aberrant cell that has acquired independence and immortality and embarked on a suicidal mission to grow beyond the constraints of its tissue and ultimately invade the host body. Because there are so many cell types from which cancer can arise, there are also many different cancer types (e.g. colon, liver, ovarian, prostate). Sarcomas are cancers arising from the cells of bone, cartilage and soft tissues (e.g. nerve, muscle); they are a mixed bag of 50-100 different cancer types grouped under the same name. Because sarcoma as a whole is uncommon, most of these 50-100 subtypes are very rare.

When we look across cancer genomes, we see recurrent events that distinguish them from the genomes of most “normal” cells in their respective hosts. These events are called somatic driver mutations; they help us understand the cancer’s evolution and are also potential handles for targeted treatments. We now know that the same drivers accumulate across all the “normal” healthy cells in our body, but typically only one or a few cells will lead to cancer. Somatic evolution, and in particular somatic evolution leading to cancer, has long been known but, with the advent of sequencing, we are only starting to characterise it. Ten years ago, the first technologies for sequencing single cells emerged, and the field has grown exponentially ever since.
However, these technologies are still expensive, require complex logistics, and come with technical and computational hurdles. Because sarcomas are rare, long-term (inter)national efforts are required to study them in as much depth as more common cancer types. While these large collection efforts are ongoing, we propose to take a few patients and analyse them in depth using these emerging single-cell technologies. In particular, for this project I set out to study, in one patient, a malignant sarcoma of the soft tissue surrounding our nerves and to perform genome and transcriptome (G&T) sequencing of the same cells.

The complexity of the biological questions, the mathematical analyses and the logistics required efforts on various fronts. I had the chance to participate in several studies in which I developed and benchmarked methods to study cancer evolution at the bulk and single-cell levels; studied the extent of selection in cancer evolution, and of intra-tumour heterogeneity and chromothripsis (a complex somatic change in the DNA that is particularly prevalent in sarcomas), across 37 cancer types, including sarcomas; helped characterise the evolutionary landscape of undifferentiated sarcomas; and analysed single-cell data from sarcoma and leukaemia to better understand the effect of treatment.

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

Deriving accurate phylogenetic tumour trees, the equivalent of a genealogical tree for tumour cells, is an important prerequisite for studying intra-tumour genetic heterogeneity (ITH) and tumour growth, the hallmark signature of ongoing cancer evolution. First, I set out to reconstruct high-quality phylogenetic trees from whole-genome bulk sequencing and to refine them further using single-cell sequencing data, in which the co-occurrence of somatic mutational events helps constrain the possible tree structures.
To identify the best strategy to derive phylogenetic trees, I participated in two international efforts to benchmark existing methods for phylogeny reconstruction and subclonal reconstruction. Copy number aberrations are a common type of genomic change occurring during cancer evolution and a good handle for reconstructing ITH, especially at single-cell resolution. Therefore, to further annotate the tree with single-cell copy number events, I developed my own package to derive copy number profiles from single-cell sequencing data. My method is generic and can be applied to other types of DNA-profiling data, such as methylation data, off-target reads from targeted sequencing, or shallow-coverage whole-genome sequencing.

Second, to model the relationship between genetic and epigenetic/transcriptomic subclones, I first looked at integrating genome, methylation and transcriptomic profiles of undifferentiated sarcomas at the bulk level. Then, from the observation that there is a strong gene-dosage effect, i.e. on average a monotonic relationship between the expression of genes and their number of DNA copies, at both the bulk and single-cell levels, I proposed a strategy to infer copy number profiles from the RNA profiles of single cells and supervised a student in using G&T data to train a machine learning algorithm. I then developed a method to call chromothripsis, a typical catastrophic event in sarcomas, and studied its impact on the driver landscape and on driver expression across cancer types, including sarcomas.

Finally, I planned to study the evolution of cancers, especially in response to treatment. In a clinical setting, multiple strategies can be used to derive time series and study cancer evolution in response to treatment. Here, I first contributed to the literature on the interpretation of variant allele frequencies in terms of tumour evolution and tumour growth parameters.
Second, using G&T data from cancer xenografts before and after treatment, I helped investigate where selective pressure from treatment is active. Finally, I optimised the design for profiling single cells before and after treatment, using G&T on the primary tumour and recurrences of one sarcoma patient.

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

I have disseminated my results on various occasions, to the scientific community in my field and to scientific audiences in other fields, and have engaged with the public through broader media.

The work during this action has not only had a great impact on the scientific field but also has implications for the clinic. Painting the driver landscape of rare cancers and understanding cancer evolution, especially in response to treatment, are crucial steps towards adequate treatment strategies. The methods for doing so are complex and do not always converge on the same answers; I benchmarked them and developed a robust consensus approach to combine their outputs. I participated in a lively debate around the concept of neutral evolution, challenging our understanding of how tumours grow, and helped the field progress further. I have also had the chance to be part of a project that uncovered the ordering of driver events and timed them in real time using molecular clocks, opening new opportunities for cancer prevention and improved treatments in different cancer types. I am fortunate to have supervised a brilliant clinical research fellow, whose work identified FOS/FOSB rearrangements as a specific driver of benign bone tumours, providing a diagnostic tool to inform treatment. I still expect this work to lead to four (co-)first-author publications that will further impact the field.