
Evolutionary and functional analysis of polymorphic inversions in the human genome

Final Report Summary - INVFEST (Evolutionary and functional analysis of polymorphic inversions in the human genome)

In the last decade, the extraordinary degree of structural variation (SV) discovered in genomes, especially in humans, has generated renewed interest in these kinds of variants. However, most of this work has focused on copy number variants (CNVs), and despite initial expectations about the potential effects of these changes, only a small proportion of the genetic risk of common and complex diseases has been identified. Inversions were one of the first types of SV to be studied, and they have long been associated with phenotypic differences in diverse organisms. Nevertheless, their balanced nature, together with the frequent presence of highly identical inverted repeats (IRs) at their breakpoints, has made inversions especially challenging to study. As a result, little was known about their population distribution and functional consequences in humans. In this project we have carried out the largest effort to date towards the complete characterization of human inversions.

First, by combining all available information with bioinformatic and experimental methods, we have created the most reliable catalogue of inversions in the human genome, comprising a few hundred non-redundant inversions. Specifically, we have developed new algorithms to identify inversion breakpoints from paired-end mapping data as accurately as possible and to merge the inversions predicted by different published studies. In addition, we have extensively analyzed the power of paired-end mapping strategies to detect different types of inversions. Through sequence analysis and the optimization of PCR-based validation techniques, including regular PCR, inverse PCR, haplotype-fusion PCR and the recent droplet digital PCR, we have also studied around 200 inversions in detail. This has allowed us to determine that a large fraction of predicted inversions are false positives and to identify the main causes of these errors. Moreover, we have considerably increased the number of experimentally confirmed polymorphic inversions in humans, to more than 100. All this information is now easily accessible through the user-friendly, web-based InvFEST database, which aims to serve as a central repository for human inversion polymorphisms.
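The report does not describe the merging algorithm itself. As a rough illustration of the general idea only, the sketch below assumes that each study reports an inversion as a chromosome plus two intervals within which its breakpoints are expected to lie, and treats two predictions as the same inversion when both breakpoint intervals overlap. Overlapping predictions are clustered transitively with a union-find structure. The class and function names (`InversionPrediction`, `MergedInversion`, `merge_predictions`) and the overlap rule are hypothetical and are not the INVFEST method.

```python
from dataclasses import dataclass

Interval = tuple[int, int]  # (start, end) in base pairs, inclusive


@dataclass(frozen=True)
class InversionPrediction:
    study: str    # hypothetical identifier of the source study
    chrom: str
    bp1: Interval  # region expected to contain the first breakpoint
    bp2: Interval  # region expected to contain the second breakpoint


@dataclass(frozen=True)
class MergedInversion:
    chrom: str
    bp1: Interval
    bp2: Interval
    studies: tuple[str, ...]  # studies supporting this non-redundant inversion


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def same_inversion(p: InversionPrediction, q: InversionPrediction) -> bool:
    # Assumed rule: same chromosome and both breakpoint regions overlap.
    return p.chrom == q.chrom and overlaps(p.bp1, q.bp1) and overlaps(p.bp2, q.bp2)


def consolidate(intervals: list[Interval]) -> Interval:
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    if start <= end:
        return (start, end)  # narrowest region shared by every prediction
    # Transitive clustering can group intervals that do not all overlap.
    return (min(s for s, _ in intervals), max(e for _, e in intervals))


def merge_predictions(preds: list[InversionPrediction]) -> list[MergedInversion]:
    parent = list(range(len(preds)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Pairwise comparison: O(n^2), acceptable for a few thousand predictions.
    for i in range(len(preds)):
        for j in range(i + 1, len(preds)):
            if same_inversion(preds[i], preds[j]):
                parent[find(i)] = find(j)

    clusters: dict[int, list[InversionPrediction]] = {}
    for i, p in enumerate(preds):
        clusters.setdefault(find(i), []).append(p)

    merged = [
        MergedInversion(
            chrom=members[0].chrom,
            bp1=consolidate([m.bp1 for m in members]),
            bp2=consolidate([m.bp2 for m in members]),
            studies=tuple(sorted({m.study for m in members})),
        )
        for members in clusters.values()
    ]
    return sorted(merged, key=lambda m: (m.chrom, m.bp1[0]))
```

Intersecting the breakpoint intervals keeps the narrowest region consistent with all supporting studies, which is in the spirit of locating breakpoints as accurately as possible; a real pipeline would also need to weigh the reliability of each study and the reported orientation of read pairs.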

Second, we have developed a new high-throughput assay for inversion genotyping and used it to determine the distribution of 45 inversions in 550 individuals from 7 populations of the 1000 Genomes Project, which represents the most complete population genetics study of human inversions so far. The inversion frequency spectrum showed considerable variation (minor allele frequency, MAF = 0.5-49.7%), with a bias towards intermediate frequencies, and in several cases there were significant differences among populations (Fst = 0.01-0.49). Overall, inversion distribution patterns are not consistent with a neutral scenario and suggest events of negative, positive or balancing selection. Importantly, one of the main discoveries of the project is that IR-mediated inversions show an unexpectedly high degree of recurrence, with most of them found on different haplotypes in humans and also showing different orientations in chimpanzees and gorillas. This contrasts with inversions that have simple breakpoints, which appear to have a single origin and can therefore be tagged by SNPs. Thus, our results illustrate the dynamic nature of the genome and emphasize the need for direct genotyping of inversions to assess their impact on phenotypic traits and human evolution.
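To make the summary statistics above more concrete, here is a minimal sketch of how MAF, between-population differentiation and SNP tagging could be computed from diploid genotypes coded as the number of inverted alleles (0, 1 or 2). It assumes a simple unweighted Nei-style estimator, Fst = (H_T - H_S) / H_T, and uses the squared correlation of genotype dosages as a proxy for linkage disequilibrium r². The report does not state which estimators were actually used, and population-genetic analyses often apply sample-size-corrected estimators such as Weir and Cockerham's. All function and population names here are hypothetical.

```python
from typing import Mapping, Sequence


def inverted_allele_frequency(genotypes: Sequence[int]) -> float:
    """Frequency of the inverted allele; each genotype is 0, 1 or 2 inverted copies."""
    if not genotypes:
        raise ValueError("no genotypes")
    return sum(genotypes) / (2 * len(genotypes))


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    p = inverted_allele_frequency(genotypes)
    return min(p, 1.0 - p)


def fst(pop_genotypes: Mapping[str, Sequence[int]]) -> float:
    """Unweighted Nei-style Fst = (H_T - H_S) / H_T across populations."""
    freqs = [inverted_allele_frequency(g) for g in pop_genotypes.values()]
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # monomorphic overall: nothing to differentiate
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population value
    return (h_t - h_s) / h_t


def genotype_r2(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation of genotype dosages (proxy for LD r^2 with unphased data)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length genotype vectors")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic site cannot tag anything
    return sxy * sxy / (sxx * syy)
```

For example, two hypothetical populations with inverted-allele frequencies of 0.25 and 0.75 give H_T = 0.5, H_S = 0.375 and Fst = 0.25. A recurrent inversion that arose on several haplotypes would tend to show low r² with any single nearby SNP, which is why such inversions need to be genotyped directly.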

Finally, the use of an accurate inversion set has made it possible to assess the effect of inversions on different functional elements of the genome. Although inversions tend to avoid gene coding sequences, and most are intergenic or intronic, we have identified several cases in which an inversion affects genes in various ways. One example is an East Asian-specific inversion that disrupts a zinc-finger protein gene and creates a fusion transcript. By combining the inversion genotypes with available expression data from lymphoblastoid cell lines of the same individuals, we have also found that in most cases inversions do not alter the expression of nearby genes, but a handful produce regulatory changes both in cis and in trans. Our integrative analysis represents a key step forward in defining the evolutionary and functional impact of these variants, and opens the possibility of determining their role in the genetic basis of complex traits and disease susceptibility in the near future.
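A basic version of the genotype-expression comparison described above can be sketched as an ordinary least-squares regression of a gene's expression level on inversion genotype (0, 1 or 2 inverted alleles). This is a simplified illustration under assumed inputs, not the project's actual pipeline: it returns the slope and its t-statistic but omits the covariates, expression normalization and multiple-testing correction that a real expression QTL analysis would need. `inversion_expression_association` and `AssociationResult` are hypothetical names.

```python
import math
from typing import NamedTuple, Sequence


class AssociationResult(NamedTuple):
    slope: float        # change in expression per inverted allele
    intercept: float    # expected expression with zero inverted alleles
    t_statistic: float  # slope divided by its standard error


def inversion_expression_association(
    genotypes: Sequence[int], expression: Sequence[float]
) -> AssociationResult:
    n = len(genotypes)
    if n != len(expression):
        raise ValueError("genotypes and expression must have the same length")
    if n < 3:
        raise ValueError("need at least three individuals")
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)
    if s_gg == 0:
        raise ValueError("all individuals share the same genotype")
    s_ge = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = s_ge / s_gg
    intercept = mean_e - slope * mean_g
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    rss = sum((e - (intercept + slope * g)) ** 2 for g, e in zip(genotypes, expression))
    sigma2 = rss / (n - 2)
    if sigma2 == 0:
        # Perfect fit: the t-statistic is unbounded unless the slope is zero.
        t = math.inf if slope != 0 else 0.0
    else:
        t = slope / math.sqrt(sigma2 / s_gg)
    return AssociationResult(slope, intercept, t)
```

An additive (per-allele) model is a common first choice for this kind of test; a p-value would follow from comparing the t-statistic with a Student's t distribution on n - 2 degrees of freedom.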