Periodic Reporting for period 4 - HAP-PHEN (From haplotype to phenotype: a systems integration of allelic variation, chromatin state and 3D genome data)
Reporting period: 2020-03-01 to 2020-08-31
One of the challenges of human genome sequencing is that it essentially generates a list of genetic variants that are unlinked. For every chromosome, we inherit one copy from our father and one from our mother. A genetic variant can lie on either the paternal or the maternal copy of a chromosome. A functional genetic variant will affect expression from the same chromosome copy on which it lies, also referred to as an allele. When we can link non-coding genetic variants to genetic variants that are expressed, we can determine their effect on gene expression more directly. Regions in which genetic variants can be linked to the same parental chromosome are called haplotypes. The overarching aim of this project was to develop novel technologies to resolve haplotypes in order to identify genetic variants that affect gene expression.
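To make the idea of linking variants on a haplotype to expression concrete, the following is a minimal sketch and not the project's actual method. It assumes that RNA-seq reads overlapping phased heterozygous variants in one gene have already been counted per haplotype, and it tests whether expression deviates from the 50/50 ratio expected when no variant on either haplotype affects expression. The function names, the input layout and the read counts are hypothetical, and summing reads across variants is a simplification that ignores reads covering more than one variant.

```python
from math import comb


def two_sided_binomial_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value under a 50/50 null.

    Sums the probability of every outcome that is no more likely
    than the observed one. Integer arithmetic keeps this exact.
    """
    weights = [comb(n, i) for i in range(n + 1)]
    extreme = sum(w for w in weights if w <= weights[k])
    return extreme / 2**n


def haplotype_imbalance(phased_counts):
    """Pool read counts over phased variants and test for allelic imbalance.

    phased_counts: list of (reads_on_haplotype_a, reads_on_haplotype_b),
    one tuple per expressed heterozygous variant (hypothetical layout).
    Returns (fraction of reads from haplotype A, two-sided p-value).
    """
    reads_a = sum(a for a, _ in phased_counts)
    reads_b = sum(b for _, b in phased_counts)
    total = reads_a + reads_b
    if total == 0:
        raise ValueError("no reads overlap the phased variants")
    return reads_a / total, two_sided_binomial_p(reads_a, total)


# Made-up counts for three phased variants in one gene (illustration only).
example_counts = [(30, 12), (25, 9), (18, 10)]
fraction_a, p_value = haplotype_imbalance(example_counts)
print(f"haplotype A fraction: {fraction_a:.2f}, p-value: {p_value:.2e}")
```

A consistent excess of reads from one haplotype suggests that a variant somewhere on that haplotype, possibly a non-coding one, influences expression of the gene. It does not by itself identify which variant is responsible.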
Understanding the effect of non-coding genetic variants on gene regulation is particularly important in complex human genetics, which studies traits that are influenced by multiple genetic loci. Improving our understanding of complex genetic traits will enable better prediction of disease risk. Genetic risk assessment is complicated by the fact that every individual harbors millions of genetic variants, of which only a subset affects phenotypic traits (e.g. height, blood pressure or cardiovascular disease). Precisely because the vast majority of non-coding genetic variants are not functional, assigning function to genetic variants is far from trivial. We have used a combination of multiple genomics methods to assign function to non-coding genetic variants.
A better understanding of human genetics, for both coding and non-coding sequences, can lead to improved genetic risk profiles. These profiles can be used to encourage people to make lifestyle choices that promote healthy living and aging by preventing the onset of disease.
In addition to our human genetics analyses, our expertise in the analysis of the 3D genome enabled us to study cohesin biology in more detail. This has resulted in a number of high-profile papers that study the role of cohesin and cohesin-interacting proteins in the organization of the genome. For instance, we showed that loss of the cohesin regulator WAPL results in longer loops (Haarhuis et al. 2017, Cell). Furthermore, we showed that mutations in the architectural protein CTCF resulted in a loss of all CTCF-anchored chromatin loops (Li et al. 2020, Nature). Finally, we have shown that dynamic cohesin is crucial to the regulation of cell-type-specific genes (Liu et al. 2020, bioRxiv).
We now plan to investigate further how genetic variants located at a distance from gene promoters contribute to the regulation of those genes. Our work has shown that cohesin and the 3D genome play a crucial role in this regulation. We aim to combine human genetics and 3D genome analysis to better understand gene regulation in general and distal gene regulation in particular.
We are expanding our data analysis pipeline to include a statistical framework for identifying functional non-coding variants. Our method is the first to identify functional non-coding variants in single individuals at high throughput. At the moment, it is still necessary to analyze large cohorts of individuals to identify putative functional genetic variants. However, once we can identify functional non-coding genetic variants in individuals, this should open up possibilities to identify (non-coding) driver mutations in rare genetic diseases caused by non-coding mutations as well, which by definition cannot be studied in large cohorts.