Periodic Reporting for period 1 - CONVO (Convolutional neural networks to reveal resistant phenotypes behind the complex genotypes of ovarian cancer)
Reporting period: 2022-05-01 to 2024-04-30
The main goal of this project was to analyze the impact of copy number variations (CNVs) on the phenotype of patients with high-grade serous ovarian cancer (HGSOC) using machine learning (ML) approaches on single-cell RNA sequencing (scRNAseq) data. High resolution was crucial, so scRNAseq data were used: they enable detailed characterization of heterogeneous cancer cell populations and allow CNV profiles to be inferred from transcriptomic data.
To begin, given the complexity of deep learning models, the first objective consisted of developing a baseline method, such as a multivariate linear model, to obtain a controlled reference set of relationships between individual CNVs and transcriptomic changes. Once a reference framework was established for understanding how CNVs influence gene expression, the next objective was to detect which combinatorial CNV patterns were relevant for predicting a cell’s complex phenotype using a more complex ML model. This involved developing models that could reconstruct CNV profiles from gene expression data and vice versa, ensuring reliable predictions and interpretations. Variational autoencoder (VAE) models played a key role here, as they effectively captured the complex relationships between CNVs and gene expression, enabling the identification of CNV patterns that were predictive of specific phenotypes. Another critical aspect was interpreting the latent space of the models to link CNV patterns with phenotypic traits of cells, enhancing the understanding of resistance mechanisms. By analyzing the latent representations generated by the VAE models, the project aimed to uncover secondary targets crucial in driving resistance to chemotherapy in HGSOC tumors.
Validation of the associations found in the VAE models was also crucial. For this purpose, organoids derived from patient tumors were analyzed to ensure that the findings were applicable in real-world settings. By integrating data from various sources and validating the results through multiple methods, the project aspired to translate its findings into practical therapeutic strategies.
Through these comprehensive objectives, the project sought to address the critical challenge of chemo-resistance in HGSOC, offering new paths for treatment and improving outcomes for patients affected by this aggressive cancer.
In the initial phase, a baseline method was developed to obtain a controlled reference set of relationships between individual CNVs and transcriptomic changes. This involved creating a multivariate linear model trained using samples from 52 HGSOC patients. The baseline model aimed to identify linear relationships between CNVs and the expression of specific genes. Despite the complexity and high dimensionality of the data, this foundational work was crucial in establishing a reference framework for understanding CNV impacts.
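As an illustration of what such a baseline could look like, the following is a minimal sketch of a multivariate linear model that regresses gene expression on per-cell CNV values. It is not the project's implementation: the function names (fit_linear_baseline, predict), the small ridge penalty, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def fit_linear_baseline(cnv, expr, ridge=1e-3):
    """Fit expr ~ cnv @ coef + intercept by ridge-regularized least squares.

    cnv:  (n_cells, n_segments) array of CNV values (hypothetical encoding).
    expr: (n_cells, n_genes) array of expression values.
    Returns coef with shape (n_segments, n_genes) and intercept with shape (n_genes,).
    """
    cnv_mean = cnv.mean(axis=0)
    expr_mean = expr.mean(axis=0)
    cnv_centered = cnv - cnv_mean
    expr_centered = expr - expr_mean
    # The ridge term keeps the system solvable when CNV segments are collinear.
    gram = cnv_centered.T @ cnv_centered + ridge * np.eye(cnv.shape[1])
    coef = np.linalg.solve(gram, cnv_centered.T @ expr_centered)
    intercept = expr_mean - cnv_mean @ coef
    return coef, intercept


def predict(cnv, coef, intercept):
    """Predict expression from CNV values with the fitted linear map."""
    return cnv @ coef + intercept


# Synthetic data only: the values below are simulated, not project data.
rng = np.random.default_rng(0)
n_cells, n_segments, n_genes = 500, 5, 8
cnv = rng.integers(-2, 3, size=(n_cells, n_segments)).astype(float)
true_coef = rng.normal(size=(n_segments, n_genes))
expr = cnv @ true_coef + 0.5 + 0.1 * rng.normal(size=(n_cells, n_genes))

coef, intercept = fit_linear_baseline(cnv, expr)
max_coef_error = float(np.abs(coef - true_coef).max())
```

Each column of coef can then be read as the estimated linear effect of every CNV segment on one gene, which is the kind of reference relationship a more complex model can later be compared against.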
Subsequently, more advanced models were developed to predict a cell’s complex phenotype. The use of VAE models was integral to this phase. Four VAE models were designed: one reconstructing CNV profiles from expression data, another reconstructing gene expression from CNVs, and two reference models (expression from expression and CNVs from CNVs). These models enabled reliable reconstruction of CNV profiles and gene expression, facilitating the detection of relevant CNV patterns predictive of specific phenotypes. Multiple sanity checks and novel improvements ensured the models' robustness, including implementing a new distribution in the generative model to account for the higher occurrence of neutral CNV values. To further enhance the project's scope, a new collaboration with researchers from Karolinska Institutet was established to use data produced with the DNTR-seq technique, which provides scRNA-seq and scDNA-seq data from the same cells. This collaboration was instrumental in refining the models and validating their performance with real ground truth data.
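The report does not specify which distribution was introduced to handle the excess of neutral CNV values. One possible way to express the idea, shown purely as a hedged sketch, is a hurdle-style likelihood that places a point mass on the neutral value and uses a Gaussian for all other values. The function name neutral_inflated_log_likelihood, the parameters pi_neutral, mu and sigma, and the choice of 0.0 as the neutral value are assumptions for illustration.

```python
import numpy as np


def neutral_inflated_log_likelihood(x, pi_neutral, mu, sigma, neutral=0.0, tol=1e-8):
    """Per-value log-likelihood under a hurdle-style, neutral-inflated model.

    With probability pi_neutral a CNV value is exactly `neutral`; otherwise it
    follows a Gaussian with mean mu and standard deviation sigma. This is an
    assumed stand-in, not the distribution used in the project.
    """
    x = np.asarray(x, dtype=float)
    is_neutral = np.abs(x - neutral) < tol
    # Gaussian log-density, used only for non-neutral values.
    log_gauss = -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * ((x - mu) / sigma) ** 2
    return np.where(
        is_neutral,
        np.log(pi_neutral),                 # point mass on the neutral state
        np.log1p(-pi_neutral) + log_gauss,  # continuous part for gains and losses
    )


# Toy CNV values (made up): three neutral entries, one gain, one loss.
values = np.array([0.0, 0.0, 0.0, 1.0, -1.0])
ll_high_pi = neutral_inflated_log_likelihood(values, pi_neutral=0.6, mu=0.0, sigma=1.0)
ll_low_pi = neutral_inflated_log_likelihood(values, pi_neutral=0.1, mu=0.0, sigma=1.0)
```

In a VAE, a term of this kind would take the place of the usual reconstruction log-likelihood, with pi_neutral, mu and sigma produced by the decoder. On the toy values, where most entries are neutral, the summed log-likelihood is higher for the larger pi_neutral, which is the behavior such a distribution is meant to provide.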
The project also focused on identifying secondary targets of resistance-associated CNVs on the transcriptome. By interpreting the latent space of the VAE models, significant progress was made in linking CNV patterns with phenotypic traits, thus enhancing the understanding of resistance mechanisms in HGSOC tumors.
Validation efforts included an exhaustive analysis of organoids derived from five patients, confirming their genotypic and phenotypic resemblance to the original tumors. This work resulted in a scientific publication and established further collaborative projects. The organoid models were crucial for validating the findings from the VAE models, ensuring their applicability in real-world settings.