CORDIS - EU research results

Characterizing gene regulation in single cells through integration of scRNA-seq and scATAC-seq data with generic multi-modal prior information

Periodic Reporting for period 1 - GReCS (Characterizing gene regulation in single cells through integration of scRNA-seq and scATAC-seq data with generic multi-modal prior information)

Reporting period: 2021-11-15 to 2023-11-14

The overarching goal of this project was to work towards a better understanding of how gene regulation varies across different cell types and states. The advent of single cell technologies has enabled the characterisation of cell states at a more granular level. In particular, scRNA-seq and scATAC-seq data have become more widely used, and large atlases of gene expression and chromatin accessibility at single cell resolution are being assembled. As part of this project, single cell RNA and ATAC datasets were used to infer gene regulatory networks (GRNs), consisting of transcription factors (TFs) and their target genes, as well as enhancers harbouring TF binding sites. Using the data at single cell resolution, these enhancer-GRNs were further analysed for differences between cell states.

As one part of the project, a reference dataset of transcription factor regulation had to be assembled using an extensive list of published datasets. A second part involved developing a method to identify relevant connections in a given network by integrating different modalities and cell type specific input data. Finally, single cell RNA and ATAC data were to be jointly used to infer gene regulatory networks in a new data analysis application. Overall, all objectives were fulfilled, apart from minor changes made in response to new scientific developments since the start of the action.
An extensive reference atlas of cell type specific gene regulatory interactions covering 15 organ systems and more than 500 cell types has been created by processing large integrated cell atlas datasets from 12 publications. This gene regulatory atlas has been prepared as a database, CellRegulon DB, including a web interface and a Python package for programmatic access. It can be used together with single cell or bulk query data to predict which transcription factors, in which cell types and tissues, explain observed differences in gene expression. In addition, the regulon atlas itself was analysed to examine the activity of TFs across cell types and TF co-regulation. Querying the database has been demonstrated using three examples: disease gene signatures (adult- and childhood-onset asthma), a single cell multiome lung cell atlas, and data from TF overexpression experiments aiming to differentiate iPSCs into more specialised cell types.
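
To illustrate the idea of querying a regulon atlas with a gene signature, the following is a minimal sketch that ranks (TF, cell type) regulons by how strongly their target genes are over-represented in a query signature, using a one-sided hypergeometric test. It is not the CellRegulon DB interface: the function names, the dictionary layout of `regulons`, and all gene, TF and cell type labels are hypothetical, and the statistical test is an assumed stand-in for whatever scoring the database actually uses.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n are regulon targets."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def rank_tfs(regulons, signature, background):
    """Rank (TF, cell type) regulons by over-representation of signature genes."""
    background = set(background)
    signature = set(signature) & background
    results = []
    for (tf, cell_type), targets in regulons.items():
        targets = set(targets) & background
        overlap = len(targets & signature)
        p = hypergeom_sf(overlap, len(background), len(targets), len(signature))
        results.append((tf, cell_type, overlap, p))
    # Smallest p-value first: strongest candidate regulators at the top.
    return sorted(results, key=lambda r: r[3])


# Toy data (hypothetical labels): 20 background genes, two regulons.
background = [f"g{i}" for i in range(20)]
regulons = {
    ("TF_A", "cell_type_1"): ["g0", "g1", "g2", "g3", "g4"],
    ("TF_B", "cell_type_2"): ["g10", "g11", "g12", "g13", "g14"],
}
signature = ["g0", "g1", "g2", "g3", "g15"]  # e.g. a disease gene signature

ranked = rank_tfs(regulons, signature, background)
for tf, cell_type, overlap, p in ranked:
    print(f"{tf}\t{cell_type}\toverlap={overlap}\tp={p:.4g}")
```

In a real analysis the p-values would need correction for multiple testing across all regulons, and the background should be restricted to genes that could have been detected in the query experiment.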

As a new way to prioritise connections in a reference network, an approach based on network propagation has been developed. Heterogeneous, cell type specific data are passed through the network and significantly enriched nodes are identified. These could represent active transcription factors that drive gene expression and chromatin accessibility in a given cell type. To broaden the scope of the approach, genomic scores derived from GWAS summary statistics have been tested as another type of input data. Mutations in complex diseases often affect enhancers. Therefore, by mapping summary statistics based scores onto the network and integrating them with cell type specific data, TFs that are likely to be functionally affected by those mutations can be identified in a cell type specific manner. This approach has been developed and applied as part of a third, data analysis based line of work, described next.
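
The general principle of network propagation with a significance test can be sketched as follows. The report does not specify the algorithm, so this example assumes a random walk with restart on an undirected network and an empirical null obtained by shuffling the input scores across nodes; the network, node names, scores and parameter values are all hypothetical.

```python
import random


def propagate(adj, seeds, restart=0.5, n_iter=100):
    """Random walk with restart: p <- restart * p0 + (1 - restart) * W p.

    adj maps each node to its neighbours (every node needs at least one);
    seeds maps nodes to non-negative input scores with a positive total.
    """
    nodes = sorted(adj)
    total = sum(seeds.get(v, 0.0) for v in nodes)
    p0 = {v: seeds.get(v, 0.0) / total for v in nodes}
    p = dict(p0)
    for _ in range(n_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            # Each node spreads its current mass evenly over its neighbours.
            share = (1 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return p


def empirical_pvalues(adj, seeds, n_perm=200, seed=0):
    """Compare propagated scores with a null from shuffled input scores."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    observed = propagate(adj, seeds)
    values = [seeds.get(v, 0.0) for v in nodes]
    exceed = {v: 0 for v in nodes}
    for _ in range(n_perm):
        rng.shuffle(values)
        null = propagate(adj, dict(zip(nodes, values)))
        for v in nodes:
            if null[v] >= observed[v]:
                exceed[v] += 1
    pvals = {v: (exceed[v] + 1) / (n_perm + 1) for v in nodes}
    return observed, pvals


# Toy undirected TF-gene network (hypothetical).
adj = {
    "TF1": ["gene_a", "gene_b", "gene_c"],
    "TF2": ["gene_c", "gene_d", "gene_e"],
    "gene_a": ["TF1"],
    "gene_b": ["TF1"],
    "gene_c": ["TF1", "TF2"],
    "gene_d": ["TF2"],
    "gene_e": ["TF2"],
}
# Cell type specific input scores, e.g. expression or GWAS-derived gene scores.
seeds = {"gene_a": 1.0, "gene_b": 1.0, "gene_c": 0.5}

scores, pvals = empirical_pvalues(adj, seeds)
for node in ("TF1", "TF2"):
    print(f"{node}\tscore={scores[node]:.3f}\tp={pvals[node]:.3f}")
```

In this toy network TF1 receives more propagated signal than TF2 because all three scored genes are its direct neighbours. A null model that preserves node degree would be a more conservative choice than the simple shuffle used here.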

A new dataset of first trimester human skeletal development has been extensively analysed with a focus on gene regulation along cell differentiation trajectories and across anatomical locations. The dataset consists of more than 300k cell nuclei profiled with both RNA and ATAC sequencing and spans various time points between 5 and 11 post-conception weeks (pcw) across 5 locations. Leveraging both modalities, enhancer-GRNs have been predicted for developmental trajectories including osteogenesis and chondrogenesis. Changes in TF activity were analysed and the effects of TF perturbations were predicted, in particular for mutations known to cause craniosynostosis, a genetic condition in which the bone plates of the skull fuse prematurely. In addition, GWAS summary statistics for hip osteoarthritis have been integrated with single cell data in a newly developed approach. Interestingly, enrichments of TFs involved in bone formation were observed across osteogenic cell types, pointing to a role in hip shape formation, which, when altered, may lead to disease later in life.

The results generated as part of this project were planned to be published in one or more scientific papers. Changes in the project plan due to new scientific developments and an increased scope of the work led to delays in the dissemination of the results. However, a manuscript covering the application of GRN inference to skeletal development is currently under revision, as is a second manuscript combining GWAS summary statistics with single cell data. Finally, a manuscript about CellRegulon DB, a database of cell type specific reference GRNs, will be submitted soon. The project results have been, and will continue to be, disseminated at scientific meetings and through planned outreach activities.
Single cell research is a rapidly developing field, in which the Human Cell Atlas (HCA) consortium is spearheading efforts to build gene expression reference maps encompassing all human cell types. While substantial progress has been made in cell type classification, the regulatory interactions of transcription factors and genes across cell types remain less well explored. This project contributes to closing this gap by complementing gene expression based classification of cell types with an atlas of their predicted gene regulatory states, applying state-of-the-art computational methods to an extensive list of input datasets. This reference will aid researchers in various fields in the interpretation of human expression data, for example by providing predictions of disease-relevant TFs and cell types that can be further explored and validated.

In the analysis of a dataset of early human skeletal development that will be part of the HCA, gene regulatory network analysis using RNA and ATAC data has been extensively applied and has led to various new results. Single nuclei profiling and the analysis of a large number of samples of the developing skeleton allowed us to describe differentiation trajectories of bone cells, a task that is otherwise difficult because of the matrix-rich environment of those cells. Further, first trimester human skeletal development of the calvaria has been analysed for the first time at single cell resolution. Predictions of enhancer-GRNs have been made for numerous cell types throughout the atlas, making it a multimodal reference that researchers can use for further exploration. In addition, several new cell states have been characterised together with their driving TFs, including the role of different TFs in diseases such as craniosynostosis and in bone formation processes that may lead to osteoarthritis. A new computational approach proposed in this context can be used to identify TFs that play cell type specific roles in traits for which GWAS summary statistics exist.

Overall, this work partly extends, and will partly be included in, the Human Cell Atlas, which is predicted to transform our understanding of biology and have a wide-reaching impact on future healthcare. The open availability of cell type resolved reference maps will aid researchers worldwide in their mission to advance basic science, as well as companies in translating research into new drugs and medical innovations.
graphical abstract