The speed of molecular evolution: rate shifts, gene function and natural selection in primate history

Final Report Summary - PRIMATE HETEROTACHY (The speed of molecular evolution: rate shifts, gene function and natural selection in primate history)

The goal of this project is to develop an automated pipeline to discover genes that have undergone strong directional selection in the primates. These genes define the selective processes acting on primate genomes that gave rise to ourselves and our nearest relatives.

The Primates, the Order we belong to, spans around 250 species that have diverged since approximately the K/T boundary 65 million years ago. In recent years, the genomes of representatives of nine major, diverse groups of this Order have been sequenced and annotated to various levels of detail. These data are publicly available and thus give researchers access to the genetic results of a grand natural experiment where species have evolved from a single common ancestor to occupy a wide variety of niches and morphological dimensions.

By comparing these genomes we can infer the selective processes that acted on primate genomes to give rise to ourselves and our nearest relatives. For each gene in each of the nine available genomes, we first determined whether it is shared with other species (i.e. homologous) and, if so, we then looked for the signature of natural selection by comparing the gene in one species to its homologues in the others. In cases where genes show strong directional natural selection (i.e. a shift away from the ancestral state towards a novel adaptive optimum), we determined the function of the gene by consulting protein-function databases. Finally, we tested whether some categories of gene function were over-represented among these genes, and also whether the tissues in which these genes are expressed were over-represented.
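The over-representation step can be illustrated with a one-sided hypergeometric test, a common choice for this kind of enrichment analysis. The report does not state which test was actually used, so this is only a minimal sketch under that assumption; the function name and the counts in the usage example are hypothetical and are not project results.

```python
from math import comb


def enrichment_p_value(k, K, n, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: total number of genes tested (the background set)
    K: background genes annotated with the functional category
    n: genes showing evidence of selection
    k: selected genes annotated with the functional category
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of observing k, k+1, ..., upper annotated genes.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Toy counts, invented for illustration only:
# 10 genes in total, 4 in the category, 5 under selection, 3 of those in the category.
p = enrichment_p_value(k=3, K=4, n=5, N=10)
print(f"p = {p:.4f}")
```

A small p-value would suggest that the category occurs among the selected genes more often than expected by chance. When many categories are tested, the resulting p-values would normally be corrected for multiple testing.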

We identified approximately 6000 genes with orthologues in all nine species. Of those 6000 genes, 2346 show some evidence of natural selection altering their rate of evolution in the primates. These genes are involved in forebrain development, lifespan, programmed cell death (apoptosis), eye development, and facets of male-male competition. One tantalizing finding is that genes involved in spermatogenesis (the production of sperm) have undergone strong selection in the Great Apes.

In addition to these preliminary results, this project has also spawned seven scientific publications, five novel open source software applications, new collaborations for speeding up similar analyses in the future, and a release of the primate “Phylome” (i.e. a set of genomes where homology - or the absence thereof - is assigned with reference to a backbone genome, in this case the human genome).

Work Progress and Achievements
To draw the kinds of inferences described in the project summary, an immense amount of data needs to be analysed and, in keeping with good scientific practice, the analyses need to be reproducible. In addition, exploring the data often requires re-running analyses with slightly different parameters. A vital component of this project was therefore the development of a portable, fully automated, parallelised analysis “workflow”, such that individual steps, or the whole analysis, can be re-run and shared with other researchers. A large fraction of researcher time was devoted to this development process. The final design of this workflow is highly modular, portable, scalable and easily parameterised, so that it can be run on different data sets and different computational architectures. It is available for download.

The development of this workflow also spawned a new collaboration with Dr. Kenjiro Taura of the Faculty of Engineering of the University of Tokyo, who is using it as a test bed for an architecture that allows analyses like the ones described here to be run on an even greater number of computer architectures, including the low-cost “Condor” architectures that are currently being adopted at many universities to replace the more costly “MPI” ones.
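The idea of a modular, parameterised workflow whose steps can be re-run individually can be sketched as follows. This is not the project's actual code: the step names, parameters and placeholder logic are all hypothetical, and a thread pool stands in for whatever parallel back end the real workflow uses.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical steps with placeholder logic; the real workflow's steps are not
# described in detail in this report.
def find_homologues(gene, params):
    """Placeholder: pretend each gene has one homologue per species."""
    homologues = [f"{gene}_sp{i}" for i in range(params["n_species"])]
    return {"gene": gene, "homologues": homologues}


def screen_for_selection(record, params):
    """Placeholder: flag genes that have enough homologues to be tested."""
    result = dict(record)
    result["testable"] = len(record["homologues"]) >= params["min_homologues"]
    return result


# Ordered registry of steps; each step consumes the previous step's output.
STEPS = [("homology", find_homologues), ("selection", screen_for_selection)]


def run_workflow(items, params, start_at="homology", workers=4):
    """Run every step from `start_at` onwards, processing items in parallel."""
    names = [name for name, _ in STEPS]
    first = names.index(start_at)
    data = list(items)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _, step in STEPS[first:]:
            data = list(pool.map(lambda item: step(item, params), data))
    return data


# Full run, then a re-run of only the last step with a changed parameter.
params = {"n_species": 9, "min_homologues": 9}
records = run_workflow(["geneA", "geneB"], params)
stricter = run_workflow(records, {**params, "min_homologues": 10}, start_at="selection")
```

Keeping each step as a plain function that takes its input and a parameter dictionary is what makes it cheap to repeat a single stage with different settings, as in the last line of the example.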