Periodic Reporting for period 4 - ApeGenomeDiversity (Great ape genome variation now and then: current diversity and genomic relics of extinct primates)
Reporting period: 2024-12-01 to 2025-11-30
A central achievement was the generation and analysis of a comprehensive global genome diversity panel for great apes. We expanded datasets to include more than 2,000 individuals across chimpanzees, gorillas, bonobos, and orangutans, enabling a unified framework for population structure, demographic history, and gene flow. Genome-wide analyses of gorillas and bonobos were completed and integrated with chimpanzee data, while ~700 orangutan genomes extended the framework to Asian lineages. These datasets represent the most extensive catalogue of great ape genomic diversity to date and provide a lasting resource for research and conservation.
The project also advanced a broader comparative understanding of primate evolution through analyses of more than 200 species, contributing to a coordinated Science Special Issue (2023). These efforts produced a global catalogue of primate genomic diversity and frameworks to characterize tolerated genetic variation. By leveraging deep evolutionary information, we identified functionally constrained genomic regions and established a foundation for AI-based prediction of deleterious mutations, linking evolutionary genomics and medical applications.
A second major achievement was the integration of historical museum specimens. We developed protocols for sequencing degraded DNA, generated genome-wide data from museum collections, and incorporated these data into analyses of gorilla and bonobo populations. This enabled direct temporal comparisons between historical and contemporary populations, revealing recent demographic changes and shifts in genetic diversity. The resulting dataset is the first catalogue capturing both present-day and historical genomic variation in great apes.
The project further extended genomic analyses into deep time through fossil specimens. By combining genomic and proteomic approaches, we analysed fossil primates, including orangutan-related lineages. The combined approach overcomes key technical barriers to recovering degraded molecular data and marks a major advance in primate palaeogenomics, enabling robust taxonomic identification and evolutionary interpretation.
We also developed methodological innovations to analyse complex evolutionary processes. New frameworks were established to detect ancestral introgression and ghost admixture, revealing previously unrecognized gene flow among great ape lineages and improving understanding of reticulate evolution.
A key enabling technology was the large-scale application of chromosome capture optimized for non-invasive and low-quality samples. This enabled sequencing of more than 2,000 genomes from fecal, museum, and fossil material, broadening the range of biological material accessible to genomic analysis.
Importantly, the project translated these advances into tools with direct societal impact. A high-resolution geolocalization framework infers the geographic origin of individual great apes with high precision and underpins the Atlas of Sources of Illegal Trafficking, supporting forensics, law enforcement, and conservation across Africa.
A second tool is a targeted SNP-amplicon sequencing technology for individual identification, which enables scalable population monitoring and genetic management and is being advanced through an ERC Proof of Concept.
Finally, the project had a transformative impact on the research environment, enabling the formation of a multidisciplinary team and positioning the group as an internationally recognized leader in primate genomics.
In summary, the project delivered an integrative framework for studying primate genomic diversity across spatial and temporal scales, advancing both fundamental knowledge and conservation applications.
1. Generation of a global genome diversity panel from recent great ape species
We generated and analysed a large-scale genome-wide dataset for great apes, including >2,000 genomes across chimpanzees, gorillas, bonobos, and orangutans. This enabled unified analyses of population structure, demographic history, and gene flow. Gorilla and bonobo studies are under revision.
We also curated ~700 orangutan genomes, extending the dataset to Asian great apes, with publication expected in 2026. Analyses across >200 primate species contributed to a Science Special Issue (2023), establishing a global comparative framework for primate genomic diversity.
2. Generation of genome diversity from museum samples
We developed protocols for sequencing degraded museum DNA and generated genomic data from historical gorilla and bonobo specimens. These were integrated with modern samples, enabling temporal comparisons of population structure, diversity, and demographic change, and establishing a framework combining historical and contemporary genomes.
3. Genome sequences from fossil samples
We generated genomic and proteomic data from fossil primates, including orangutan-related specimens. Specialized protocols enabled recovery of degraded molecular data, allowing taxonomic assignment and phylogenetic inference, and extending analyses into deep time. Results are in preparation.
4. Methods for ancestral introgression
We developed methods to detect ancestral introgression and ghost admixture, applied across great ape datasets. These analyses revealed previously undetected gene flow and improved reconstruction of evolutionary history using integrated genomic approaches.
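As a point of reference for how gene flow between lineages is commonly tested, the sketch below computes Patterson's D statistic (the standard ABBA-BABA test) from derived-allele frequencies. It is an illustrative baseline only, not the methods developed in the project: detecting ghost admixture from unsampled lineages generally requires more elaborate approaches. The function name `patterson_d` and the toy frequencies are hypothetical, and significance testing (typically a block jackknife) is omitted.

```python
from typing import Sequence


def patterson_d(
    p1: Sequence[float],
    p2: Sequence[float],
    p3: Sequence[float],
    outgroup: Sequence[float],
) -> float:
    """Patterson's D for the tree (((P1, P2), P3), Outgroup).

    Each argument holds per-site derived-allele frequencies (0.0 to 1.0)
    for one population, in the same site order.
    D > 0 suggests excess allele sharing between P2 and P3,
    D < 0 suggests excess sharing between P1 and P3.
    """
    abba = 0.0
    baba = 0.0
    for f1, f2, f3, fo in zip(p1, p2, p3, outgroup):
        # Expected ABBA pattern: derived allele in P2 and P3, ancestral in P1 and outgroup
        abba += (1 - f1) * f2 * f3 * (1 - fo)
        # Expected BABA pattern: derived allele in P1 and P3, ancestral in P2 and outgroup
        baba += f1 * (1 - f2) * f3 * (1 - fo)
    total = abba + baba
    if total == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / total


# Toy example (assumed data): three ABBA sites and one BABA site
d = patterson_d(
    p1=[0, 1, 0, 0],
    p2=[1, 0, 1, 1],
    p3=[1, 1, 1, 1],
    outgroup=[0, 0, 0, 0],
)
print(d)
```

Under a strict tree model without gene flow, ABBA and BABA sites are expected in equal numbers, so D is expected to be close to zero; a consistent excess of one pattern is what such tests interpret as a signal of introgression.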
Overview of results, exploitation and dissemination
The project establishes a comprehensive primate genomics framework combining large-scale data, methodological innovation, and translational applications. The global dataset (>2,000 genomes) enables comparative analyses across great apes, complemented by contributions to a Science Special Issue (2023).
Integration of museum and fossil data allows temporal and deep-time evolutionary analyses. Methodological advances improve detection of introgression and sequencing of low-quality samples.
Exploitation includes a geolocalization framework to infer the geographic origin of great apes, supporting the Atlas of Sources of Illegal Trafficking with partners (PASA, GRASP), and a targeted SNP-amplicon system for individual identification, underpinning an ERC Proof of Concept.
Dissemination includes high-impact publications and international collaborations.
This work departs from conventional approaches based on human-only datasets or simple sequence conservation metrics by integrating genome-wide variation from a large number of non-human primates. By leveraging deep evolutionary information spanning millions of years, the analyses capture complex selective constraints affecting both coding and regulatory regions. This enables the identification of functionally constrained genomic elements that are not detectable using standard methods, thereby substantially refining the interpretation of genetic variation.
A further advance lies in the integration of these evolutionary signals with machine learning frameworks for variant effect prediction. The primate-based comparative dataset provides a biologically informed basis for training models aimed at identifying deleterious mutations, particularly in cases where empirical or clinical data are limited. This approach improves the interpretation of rare and previously unclassified variants, addressing a key limitation in current human genomics.
These developments were enabled by the scale and diversity of the genomic datasets generated during the project and extend beyond the initially anticipated scope. Overall, the work establishes a new conceptual and methodological framework that connects comparative genomics with predictive modelling, with applications spanning evolutionary biology and human disease genetics.