In the last decade, tremendous progress in high-throughput sequencing has led to a rapid decline in sequencing costs. It is now possible to sequence the genomes of hundreds of thousands of individuals and build a deep catalog of the bulk of human genetic variation. However, sequence information alone is not enough: we also need to understand the function of the sequence differences between individuals. Assigning function to genetic variation is known as functional annotation. For variants that occur in protein-coding sequences, predicting the effect is relatively straightforward. However, the vast majority of genetic variants (~97%) lie outside protein-coding regions (i.e. in non-coding regions), where the power to predict the effect of a variant drops effectively to zero. Yet genome-wide association studies have already shown that the majority of loci significantly associated with human traits and diseases are found in these non-coding, presumably regulatory regions.
One of the challenges with human genome sequencing is that it essentially generates a list of genetic variants that are unlinked: it does not tell us which variants sit together on the same copy of a chromosome. For every chromosome, we all inherit one copy from our father and one from our mother, and a genetic variant can lie on either the paternal or the maternal copy. A functional regulatory variant affects gene expression from the chromosome copy on which it lies, that is, from the same allele. When we can link non-coding genetic variants to variants in expressed sequences, we can determine their effect on gene expression more directly. Regions in which genetic variants can be linked to the same parental chromosome are called haplotypes. The overarching aim of this project was to develop novel technologies that resolve haplotypes in order to identify genetic variants that affect gene expression.
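To make this idea concrete, the following is a minimal sketch, not the method developed in this project, of how a resolved haplotype allows reads from an expressed variant to be assigned to the chromosome copy carrying a given non-coding variant. The function names (`binom_two_sided`, `allelic_imbalance`), the site names, the read counts and the choice of an exact binomial test are all illustrative assumptions.

```python
from math import comb


def binom_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under success probability p.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so that floating-point ties are included.
    return min(1.0, sum(pr for pr in probs if pr <= observed * (1 + 1e-9)))


def allelic_imbalance(haplotypes, noncoding_site, coding_site, reads_by_allele):
    """Link a non-coding variant to allele-specific expression.

    haplotypes: pair of dicts (hypothetical format), one per chromosome
        copy, mapping site name -> allele carried on that copy.
    reads_by_allele: RNA read counts per allele at the coding site.
    """
    hap_a, hap_b = haplotypes
    # Both sites must be heterozygous, otherwise the two chromosome
    # copies cannot be told apart at that site.
    for site in (noncoding_site, coding_site):
        if hap_a[site] == hap_b[site]:
            raise ValueError(f"site {site!r} is not heterozygous")
    # Phase information tells us which expressed allele travels with
    # which non-coding allele on the same chromosome copy.
    reads_a = reads_by_allele[hap_a[coding_site]]
    reads_b = reads_by_allele[hap_b[coding_site]]
    total = reads_a + reads_b
    if total == 0:
        raise ValueError("no reads cover the coding site")
    return {
        "noncoding_allele_hap_a": hap_a[noncoding_site],
        "fraction_hap_a": reads_a / total,
        # Null hypothesis: both copies are expressed equally (p = 0.5).
        "p_value": binom_two_sided(reads_a, total),
    }


# Toy, made-up data: one individual, one phased haplotype pair.
haplotypes = (
    {"enhancer_snp": "G", "exon_snp": "C"},  # chromosome copy A
    {"enhancer_snp": "A", "exon_snp": "T"},  # chromosome copy B
)
result = allelic_imbalance(
    haplotypes, "enhancer_snp", "exon_snp", {"C": 80, "T": 20}
)
```

In this toy case most reads come from the chromosome copy that carries `G` at the hypothetical `enhancer_snp` site. Without the haplotype, the same read counts could not be attributed to either non-coding allele. A realistic analysis would need many sites and individuals and would have to account for technical biases.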
Understanding the effect of non-coding genetic variants on gene regulation is particularly important in complex human genetics, which studies traits that are influenced by multiple genetic loci. Improving our understanding of complex genetic traits will enable better prediction of disease risk. Genetic risk assessment is complicated by the fact that every individual harbors millions of genetic variants, of which only a subset affects phenotypic traits (e.g. height, blood pressure or cardiovascular disease). Precisely because the vast majority of non-coding genetic variants are not functional, assigning function to genetic variants is far from trivial. We have used a combination of genomics methods to assign function to non-coding genetic variants.
A better understanding of human genetics, for both coding and non-coding sequences, can lead to improved genetic risk profiles. These profiles can be used to encourage people to make lifestyle choices that promote healthy living and aging by preventing the onset of disease.