
Exploring and exploiting the potential of extinct genome sequencing

Periodic Reporting for period 3 - Extinction Genomics (Exploring and exploiting the potential of extinct genome sequencing)

Reporting period: 2019-04-01 to 2020-09-30

Palaeogenomics is the nascent discipline concerned with sequencing and analysis of genome-scale information from historic, ancient, and even extinct samples. Once inconceivable because of DNA damage, contamination, and the technical limitations of PCR-based Sanger sequencing, it has rapidly become a reality since the dawn of the second-generation sequencing revolution. Indeed, popular perception has shifted so far that the question is no longer whether extinct species' genomes can be sequenced, but when it will happen, and even when the first extinct animals will be regenerated. Unfortunately, this view is naïve: it does not account for the financial and technical challenges that such attempts face. In this project we are exploring exactly what limits genome reconstruction from extinct or otherwise historic/ancient material. We are doing so by developing new laboratory and bioinformatic tools that decrease the cost, while increasing the quality, of genome reconstruction from poor-quality materials. In doing so, our team is building a scientifically grounded framework against which the possibilities and limitations of extinct genome reconstruction can be assessed. We are also generating genomic data from a range of extinct and near-extinct avian and mammalian species, to showcase the potential of reconstructed genomes for research questions spanning at least three streams of research: De-extinction, Evolutionary Genomics, and Conservation Genomics.
We have made headway on our five main Work Packages (WPs), which largely set the stage for our subsequent analyses. Progress so far includes the following:

WP1: Sample collection. All core samples have been collected and, thanks to the above-mentioned interest, we have significantly expanded the range of systems to be studied. We are therefore currently generating genomic data from the following core species: Black, White, Javan, Sumatran, Woolly and Indian rhinoceroses; crested ibis; great auk; lions, other big cats and sabre-toothed cats; Christmas Island rat; wolves and related canids; giant elk and aurochs. We are also expanding to analyse population-level datasets of the critically threatened Santa Catarina guinea pig, saola and Seychelles magpie robin.

WP2: Methodology. We have made significant progress on DNA extraction and sequencing methods, leading so far to the publication of three papers detailing new library construction (Carøe et al. Methods Ecol Evol 2017) and sequencing (Mak et al. GigaScience 2017) methods. These in turn have led to several spinoff papers in collaboration with external groups who have adopted our methods (Grealy et al. MPE 2017, Gelabert et al. PNAS 2017). Considerable progress has also been made on genome reconstruction through a new collaboration with colleagues in the USA and Spain, and we are optimistic that we will succeed in this goal.

WP3: Evolutionary Genomic analyses. Work under this WP has begun on some of the ancient genomes generated (sabre-toothed cats, great auk, Christmas Island rats, wolves/canids). We have also published three papers on the theory underlying these methods (Richmond et al. Open Quaternary 2016, Sinding et al. Open Quaternary 2017, Diez-Del-Molino et al. TREE 2018).

WP4: Functional assays. Having sequenced the great auk genome, we are preparing the first functional assays with our collaborators in the UK and Canada.

WP5: Population Genomic analyses. These analyses have commenced on almost all of our datasets and have resulted in several papers on dog, wolf and crested ibis genomes (Gopalakrishnan et al. BMC Genomics 2017, Liu et al. Mol Biol Evol 2017, Sinding et al. PLoS Genetics 2018, Feng et al. Current Biology 2018).

At this point, the principal progress beyond the state of the art relates to the development of new laboratory methods for manipulating the DNA within ancient samples. This falls into two main categories: (a) improved methods for constructing sequencing libraries from poor-quality DNA, and (b) more economical methods for sequencing the DNA.

For the former, we have developed an efficient library build method that keeps all reagents in a single tube and removes the purification steps that use spin columns (or the like), so that libraries can be built with very little loss of template DNA. This simplifies library build while allowing more complex data (libraries containing more unique molecules) to be generated. This has been published in Carøe et al. 2017 (Methods in Ecology and Evolution). The second achievement was the development of laboratory methods that allow ancient DNA to be sequenced on the newly released BGISeq platform. We designed a library preparation method based on the technique described above and validated it for the BGISeq by comparison with Illumina sequencing of historic and ancient samples. The data were shown to be of equal quality but more economical to generate, showcasing the potential of this platform for palaeogenomics. This was published in Mak et al. 2017 (GigaScience).
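Why removing purification steps matters can be illustrated with a small calculation, sketched here under simplifying assumptions that are not taken from the project. Template losses compound multiplicatively over successive clean-up steps, and a library with fewer surviving molecules yields fewer unique reads at a given sequencing depth. The sketch assumes each clean-up independently recovers the same fraction of molecules and that reads are sampled from the library as a Poisson process (the Lander-Waterman approximation). The function names, the per-step recovery of 0.8, the step counts and the molecule and read numbers are all hypothetical illustrations, not measured values or the published protocols.

```python
import math


def retained_fraction(per_step_recovery: float, n_steps: int) -> float:
    """Fraction of template molecules surviving n clean-up steps,
    assuming (hypothetically) each step recovers the same fraction."""
    return per_step_recovery ** n_steps


def expected_unique_reads(library_size: float, reads_sequenced: float) -> float:
    """Expected number of distinct molecules observed when sequencing
    `reads_sequenced` reads from `library_size` molecules, under a Poisson
    (Lander-Waterman) sampling model: each molecule is seen at least once
    with probability 1 - exp(-reads/library_size)."""
    return library_size * (1.0 - math.exp(-reads_sequenced / library_size))


# Hypothetical numbers for illustration only (not project data).
INPUT_MOLECULES = 1e7   # template molecules in the extract
RECOVERY = 0.8          # assumed recovery per spin-column clean-up
READS = 2e7             # sequencing depth

# Hypothetical step counts: a multi-column workflow vs. a single-tube one.
for label, steps in [("multi-column workflow", 4), ("single-tube workflow", 1)]:
    library = INPUT_MOLECULES * retained_fraction(RECOVERY, steps)
    unique = expected_unique_reads(library, READS)
    print(f"{label}: library={library:.3g}, unique reads={unique:.3g}, "
          f"duplication rate={1 - unique / READS:.1%}")
```

Under these assumptions the single-tube case starts with more surviving molecules, so the same sequencing effort recovers more unique reads and a lower duplication rate. This is the sense in which lower template loss translates into more complex data.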

In the upcoming period we plan to apply these tools to other substrates, and to add further laboratory and computational methods that improve both the data recovered from ancient samples and the interpretations based on them. Furthermore, as our genomic datasets are completed, we expect to undertake the planned evolutionary, population and conservation genomic analyses that form the core of this project.