
Exploring and exploiting the potential of extinct genome sequencing

Periodic Reporting for period 4 - Extinction Genomics (Exploring and exploiting the potential of extinct genome sequencing)

Reporting period: 2020-10-01 to 2021-03-31

Palaeogenomics is the nascent discipline concerned with the sequencing and analysis of genome-scale information from historic, ancient and even extinct samples. Once considered inconceivable because of DNA damage, contamination and the technical limitations of PCR-based Sanger sequencing, it has rapidly become a reality since the advent of second-generation sequencing. Indeed, popular perception has shifted from whether extinct species' genomes can be sequenced to when this will happen, and even to when the first extinct animals will be regenerated. Unfortunately, this view is naïve and does not account for the financial and technical challenges facing such attempts. In this project we explored the limits of genome reconstruction from extinct or otherwise historic and ancient material. We did so by developing new laboratory and bioinformatic tools aimed at reducing the cost, while simultaneously increasing the quality, of genome reconstruction from poor-quality material. In doing so, our team built a scientifically grounded framework against which the possibilities and limitations of extinct genome reconstruction can now be assessed. We also generated and published genomic data from a range of extinct, near-extinct and control avian and mammalian species to showcase the potential of reconstructed genomes across at least three research streams: de-extinction, evolutionary genomics and conservation genomics. We conclude that, thanks to a combination of technical developments that now allow genomes to be sequenced and assembled to chromosome-level completeness, the ability to generate high-quality sequences from extinct species, and the availability of new computational tools, it is possible to computationally reconstruct large fragments of extinct species' genomes.
However, because extinct species have diverged evolutionarily from their extant relatives, we will ultimately only be able to truly reconstruct species from which intact, biologically viable cells can be recovered. This limits de-extinction to a handful of exceptional species.

We largely completed the aims of our five main Work Packages (WPs), which set the stage for considerable future research by our team and others. Specifically:

WP1: Sample collection. All core samples were collected, and we were able to significantly expand the range of systems studied. We generated genomic data from the following core species: black, white, Javan, Sumatran and Indian rhinoceroses, as well as the now-extinct woolly rhinoceros, Merck's rhinoceros and Elasmotherium; the crested ibis; the great auk; lions and other big cats; the sabre-toothed cats Smilodon and Homotherium; the Christmas Island rat; multiple lineages of wolves and related canids; aurochs; the critically threatened Santa Catarina guinea pig; saola; Seychelles magpie robins; and koalas.

WP2: Methodology. We made significant progress in DNA extraction and sequencing methods, leading to the publication of three papers describing new library construction and sequencing methods and one paper describing a new computational method. These in turn have led to several spin-off papers in collaboration with external groups who have applied our methods. Considerable progress was also made on genome reconstruction, through a new collaboration with colleagues in the USA and Spain.

WP3: Evolutionary Genomic analyses. This WP involved the study of many of the ancient genomes generated (sabre-toothed cats, great auk, Christmas Island rats, wolves and other canids, Seychelles magpie robins and all rhino species). We also published three papers on the theory underlying the methods.

WP4: Functional assays. Having sequenced the great auk genome, we initiated experiments with our collaborators based on the data generated, and are now considering the implications of the results.

WP5: Population Genomic analyses. These analyses have begun on almost all of our datasets and have resulted in many published papers, as well as manuscripts currently in review, on our study species.

Overall, we have so far disseminated our research through 39 peer-reviewed publications, as well as through numerous national and international conferences, guest lectures and invited talks.

We have made considerable progress beyond the state of the art. This includes new laboratory methods for manipulating the DNA within ancient samples, which fall into two main categories: (a) improved methods for constructing sequencing libraries from poor-quality DNA, and (b) more economical methods for sequencing that DNA. On the computational side, we developed a framework demonstrating how in silico reconstructed genomes can improve the recovery of extinct species' genomes (Vieira et al. Ecology and Evolution 2020). Given the ever-expanding reference datasets (including those we contributed to, e.g. Gopalakrishnan et al. Genome Biology 2017, Gopalakrishnan et al. Current Biology 2018, Feng et al. Nature 2020, Rhie et al. Nature 2021), this framework will be a powerful tool for future studies.

In understanding how extinction relates to genomes, we have also made substantial progress. This work has not only refined our understanding of how species and populations are related to one another (e.g. de Manuel et al. PNAS 2020, Liu et al. in review) but, importantly, has shown exactly how populations' genomes are shaped as they enter bottlenecks. This has been demonstrated particularly clearly using temporal datasets spanning different time periods for lions (de Manuel et al. PNAS 2020), rhinos (Sanchez et al. in review), koalas (Sandoval Velasco et al. in prep), the crested ibis (Feng et al. Current Biology 2018) and wolves (Ramos-Madrigal et al. Current Biology 2020). Additionally, by comparing the genomes of extant and extinct species (using, for example, big cats [Barnett et al. Current Biology 2020, Westbury et al. 2021], rhinos [Liu et al. in review, Sanchez et al. in review] and the great auk [Margaryan et al. in prep] as models), we were able to explore how features such as genetic diversity, runs of homozygosity and levels of inbreeding change as species become increasingly threatened. Together, such information will help guide future conservation efforts.
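Two of the genomic features mentioned above have simple, widely used definitions: genome-wide heterozygosity (the fraction of sites at which an individual carries two different alleles) and the ROH-based inbreeding coefficient F_ROH (the total length of runs of homozygosity above a minimum length, divided by the length of the genome considered). The sketch below illustrates both on toy inputs. It is not the project's pipeline; the function names, the 0/1/2 genotype encoding, the 1 Mb ROH threshold and the example numbers are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def heterozygosity(genotypes: Sequence[int]) -> float:
    """Fraction of called sites that are heterozygous.

    Hypothetical encoding: each genotype is the number of alternate
    alleles (0, 1 or 2), and -1 marks a missing call, which is excluded.
    """
    called = [g for g in genotypes if g >= 0]
    if not called:
        raise ValueError("no called sites")
    hets = sum(1 for g in called if g == 1)
    return hets / len(called)


def f_roh(
    rohs: List[Tuple[int, int]],
    genome_length: int,
    min_length: int = 1_000_000,
) -> float:
    """F_ROH: summed length of ROH >= min_length divided by genome length.

    Each ROH is a half-open interval (start, end) in base pairs and the
    intervals are assumed not to overlap. The 1 Mb default threshold is
    an illustrative assumption; studies choose their own cut-offs.
    """
    total = sum(end - start for start, end in rohs if end - start >= min_length)
    return total / genome_length


if __name__ == "__main__":
    # Toy genotypes: 7 called sites, 2 of them heterozygous.
    toy_genotypes = [0, 1, 2, 1, 0, -1, 0, 0]
    print(round(heterozygosity(toy_genotypes), 4))  # 0.2857

    # Toy ROHs on a 100 Mb genome: the 0.5 Mb run falls below the threshold.
    toy_rohs = [(0, 2_000_000), (5_000_000, 5_500_000), (10_000_000, 13_000_000)]
    print(f_roh(toy_rohs, genome_length=100_000_000))  # 0.05
```

A declining heterozygosity paired with a rising F_ROH across time-series samples is the kind of signal expected as a population passes through a bottleneck. For low-coverage ancient data, such statistics are often estimated from genotype likelihoods rather than from hard genotype calls as in this simplified sketch.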
Evolutionary insights from the Homotherium genome. Image courtesy of Dr Binia de Cahsan