Gene expression level as a keystone to understanding gene duplication: evolutionary constraints, opportunities, and disease

Periodic Reporting for period 1 - DOUBLE EXPRESS (Gene expression level as a keystone to understanding gene duplication: evolutionary constraints, opportunities, and disease)

Reporting period: 2019-01-01 to 2020-06-30

Duplicate copies of genes are common in the genomes of living things, from bacteria to humans. Duplicate genes are important in disease and are a major source of evolutionary novelty, and for many years we thought we understood them. We thought that a duplicate was an 'extra' copy that could be modified or lost with no consequence. We thought that any gene, no matter its importance, had an equal chance of being duplicated, and that the duplicate copy was (at least initially) unimportant owing to its redundancy. We thought that a duplicate is a duplicate is a duplicate.

In recent years evidence has accumulated that challenges this view. Rather than being the result of an unbiased process, the genes that tend to duplicate are quickly evolving, non-essential genes, irrespective of current duplication status. Conversely, when the duplication is not of an individual gene but of an entire genome (whole genome duplication; WGD), the pattern is reversed. After WGD many duplicates are lost, reverting to the original copy number. The genes that tend to be retained are the slowly evolving, important genes. Furthermore, rather than being redundant, both copies of a WGD duplicate pair are required, and disruption to them often results in disease. This striking difference between the products of the two mechanisms of duplication requires an explanation.

In this project we are exploring and testing the hypothesis that these contrasting relationships can be explained by the different ways in which the evolutionary constraints imposed by the demands of gene expression (how strongly a gene is turned on) are resolved. We are testing our idea that the opposing constraints on gene-by-gene duplication and on WGD channel different sets of genes into remarkably different evolutionary trajectories. We also propose a common mechanism of pathogenicity for many duplication events that is independent of the biochemical function of the genes involved, namely competition for cellular resources.
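As a purely illustrative sketch of why the two kinds of duplication might interact differently with expression demands, the toy calculation below compares each gene's share of total expression before and after duplication. Everything in it is an assumption made for illustration and is not taken from the project: the gene names, the expression values, the function names, and in particular the naive rule that a duplicated gene produces exactly twice its original output.

```python
# Toy illustration only. Gene names, expression values and the
# "duplication doubles output" rule are assumptions, not project data.

def expression_shares(levels):
    """Return each gene's fraction of the total expression output."""
    total = sum(levels.values())
    return {gene: level / total for gene, level in levels.items()}

def duplicate_single_gene(levels, gene):
    """Model a gene-by-gene duplication: only one gene's output doubles."""
    duplicated = dict(levels)
    duplicated[gene] = 2 * levels[gene]
    return duplicated

def duplicate_whole_genome(levels):
    """Model a whole genome duplication: every gene's output doubles."""
    return {gene: 2 * level for gene, level in levels.items()}

# Hypothetical expression levels in arbitrary units.
baseline = {"geneA": 100.0, "geneB": 10.0, "geneC": 1.0}

before = expression_shares(baseline)
after_single = expression_shares(duplicate_single_gene(baseline, "geneA"))
after_wgd = expression_shares(duplicate_whole_genome(baseline))

for gene in baseline:
    print(gene, round(before[gene], 3),
          round(after_single[gene], 3), round(after_wgd[gene], 3))
```

Under these assumptions, duplicating the single highly expressed gene increases its share of the total and shrinks the share left for every other gene, whereas duplicating all genes together leaves the relative shares unchanged. Real cells are far more complex, so this is only a way to picture the contrast, not a model of the project's analyses.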

With abundant high-quality genomics data now available, this is an opportune time to address these questions. This project is important because it aims to deepen our understanding of the genome and to uncover links between evolutionary patterns and human genetic disease. These are fundamental insights that can be applied across all areas of genomics.
So far in this project we have completed the following:

1. Analysis of modern genomes to reconstruct the ancestral vertebrate genome structure. This reconstruction reveals many important aspects of vertebrate genome evolution, including the number and timing of whole genome duplication events, chromosome fusion events, and the relationships between modern chromosomes. It enables reliable inference of the relationships between genes, which has previously been a significant challenge, and opens the possibility of further analysis.
2. Resolution of a controversy regarding the animal tree of life. Knowing the correct tree of life is necessary to infer the evolution of important animal features such as the nervous system and the immune system.
3. Quantification of the impact of de novo gene evolution on the origin of genetic novelty.

Other work in progress includes analysis of the evolution of gene expression in WGD duplicates; investigation of the genomic and evolutionary patterns of WGD duplicates and single-gene duplicates; and investigation of the importance of intrinsic structural disorder in the evolution of duplicate genes.
Our reconstruction of the ancestral vertebrate genome structure is based on an algorithm that we developed and that goes beyond the state of the art (see figure). It also allows us to build a new gold-standard database of WGD duplicates, which will be useful for researchers worldwide.

Our work on the animal tree of life both resolves a controversial aspect of the tree and provides an important advance in the methods of tree inference.

We further expect to develop a new paradigm for understanding the relationship between gene duplication and gene expression, and to show how this can lead to a deeper understanding of human genetic disease.