CORDIS - EU research results

Paleovirology, the evolutionary dynamics of viral cross-species transmissions, and the consequences of virus-host gene exchange

Periodic Reporting for period 2 - PALVIREVOL (Paleovirology, the evolutionary dynamics of viral cross-species transmissions, and the consequences of virus-host gene exchange)

Reporting period: 2023-04-01 to 2024-09-30

The PALVIREVOL project aims to arrive at a deeper understanding of the evolutionary history of viruses and their interactions with eukaryotes, in particular by looking at the patterns of virus-host horizontal gene transfer (HGT), cross-species transmissions, tempo and mode of evolution, and exaptation of endogenous viral elements (EVEs) for use in antiviral immunity. We aim to achieve this by 1) conducting a comprehensive search for viral-derived sequences in eukaryotic host genomes, 2) building a database of EVEs and EVE annotations, 3) developing models that can account for different evolutionary rates in exogenous viruses/EVEs and for their variation over time, 4) performing evolutionary and transcriptomic analyses of the EVEs to select candidates for experimental work, and 5) challenging EVE candidates with several viral families to uncover new EVE-derived immunity (EDI) genes. In terms of impact on society, this work will increase our understanding of the diversity and deep evolutionary history of viruses and of the factors governing viral cross-species transmissions, point to potential unknown host reservoirs of human/animal pathogenic viruses, and lead to the discovery of new antiviral mechanisms that might offer insights into novel therapeutic solutions.

A key aim of the PALVIREVOL project is to obtain a deeper understanding of viral evolution over evolutionary timescales, that is, over periods of millions to hundreds of millions of years. To accomplish this goal, we take advantage of endogenous viral elements (EVEs), which are molecular fossils that can be found integrated into the genomes of modern animals. EVEs form when viral sequences become incorporated into the DNA of a germline cell (reproductive cell) of a host organism. Because the newly formed viral fossil can be passed down to the next generation, it can potentially persist in the gene pool of the host for millions of years.

Given that viruses do not leave fossils in the geological record, EVEs are the only direct evidence of viruses in the deep past. Indeed, the dynamics of EVE integration and persistence mean that they can survive the timescale of speciation and persist in the genomes of descendant species that inherited the virus integration from a common ancestor. As such, we can use patterns of speciation and estimated divergence times of hosts (obtained from the rock fossil record or molecular clock analyses) to reconstruct the evolutionary history of a viral integration in deep time: an integration shared at the same genomic location by several host species must be at least as old as their most recent common ancestor. Therefore, EVEs provide unique insights into viral diversity, virus-host interactions, and the mode and tempo of virus evolution over geological time.
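The dating logic described above can be illustrated with a minimal Python sketch. It assumes that pairwise host divergence times are already available; the host names, the divergence values and the function name `minimum_integration_age` are hypothetical placeholders for illustration, not data or code from the project. The lower bound on the age of a shared integration is taken as the deepest split among the hosts that carry it.

```python
from itertools import combinations

# Hypothetical pairwise host divergence times, in millions of years.
# These values are placeholders for illustration only.
DIVERGENCE_MYA = {
    frozenset({"host_A", "host_B"}): 10.0,
    frozenset({"host_A", "host_C"}): 40.0,
    frozenset({"host_B", "host_C"}): 40.0,
    frozenset({"host_A", "host_D"}): 90.0,
    frozenset({"host_B", "host_D"}): 90.0,
    frozenset({"host_C", "host_D"}): 90.0,
}


def minimum_integration_age(hosts_with_eve, divergence_mya):
    """Return a lower bound (in Myr) on the age of an orthologous EVE.

    If the same integration is present in several host species, it must
    predate their most recent common ancestor, so the bound is the
    largest pairwise divergence time among the carrier species.
    """
    hosts = sorted(set(hosts_with_eve))
    if len(hosts) < 2:
        raise ValueError("need at least two host species sharing the integration")
    return max(divergence_mya[frozenset(pair)] for pair in combinations(hosts, 2))


if __name__ == "__main__":
    # An integration shared by hosts A, B and C is at least as old as
    # the deepest split among those three hosts.
    print(minimum_integration_age(["host_A", "host_B", "host_C"], DIVERGENCE_MYA))
```

This gives only a minimum age: the integration could have occurred at any point along the branch leading to the common ancestor of the carrier species.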

We are particularly interested in leveraging the power and uniqueness of EVE data to answer questions about the gene flow between viruses and hosts, the evolutionary history of viruses in deep time, the dynamics of virus cross-species transmissions and the potential for EVEs to be co-opted to perform antiviral functions. We have organised these questions into two major themes: 1) The evolutionary dynamics of virus cross-species transmissions, and 2) The consequences of virus-host gene exchange.

Previous work on paleovirology and EVEs has revealed a rich genomic fossil record for many virus families that can illuminate their evolutionary history in deep time. Given the growing knowledge of virus diversity and the increasing availability of host genome assemblies, it is likely that we have only scratched the surface of the virus fossil record preserved in host genomes. During this project, we will develop computational tools that enable large-scale searches for EVEs in the growing data sets of host genome assemblies available today, providing novel insights into the diversity of the virus fossil record and informative calibration points for analyses of the timescales of virus evolution. These tools need to make efficient use of the large amounts of publicly available genome data and organise the results into a knowledge base that can be accessed and queried to generate novel biological insights into virus-host interactions. In particular, we are interested in using the knowledge gained from these analyses to study the role of virus-host gene exchange, both in the direction from virus to host (EVEs) and in the direction from host to virus, which includes eukaryotic genes acquired by viruses to manipulate the host immune response or cellular metabolism.

Apart from increasing our knowledge of virus diversity, host associations and virus evolution, one of the central questions that we aim to answer is the potential antiviral function of EVEs that have been co-opted by hosts. Previous work has shown that ancient viral genes captured by host genomes can antagonise circulating viruses and protect the host from viral infection. We aim to use transcriptomic, genomic and evolutionary evidence to shortlist candidate genes and test them for EDI function, hopefully paving the way for the rational discovery of antiviral EVEs in host genomes.

To summarise, this project will illuminate our understanding of virus evolution spanning millions to hundreds of millions of years, and provide key insights into virus-host interactions and the basis of the molecular interface between virus infection and host immunity. The study of endogenous viral elements provides a unique opportunity and the only direct evidence to answer these types of questions of virus evolution in the deep past.
We have carried out comprehensive searches for novel EVEs in vertebrate genomes, developed computational tools to enable such large-scale searches and inference of orthology, developed mathematical models to understand virus-host interactions in complex microbial systems, and worked on building the experimental capacities and workflows to test for endogenous viral element-derived immunity (EDI) function in candidate retroviral genes.

The work performed so far has resulted in the publication of 9 papers and one preprint (currently under review), which have contributed to our understanding of virus-host interactions and expanded the known diversity of viral fossils in vertebrate genomes. During the time leading to this reporting period we have produced the following publications:

1. Barreat and Katzourakis (2024). Nature Microbiology (doi: 10.1038/s41564-024-01825-4): Carried out one of the most comprehensive searches for EVEs in vertebrate genomes, which uncovered endogenous representatives of 4 new viral families and members of the genera Orthonairovirus (related to Crimean-Congo hemorrhagic fever viruses) and Hepacivirus (hepatitis C virus). We also propose a macroevolutionary scenario for the previously unknown origin of glycoprotein immunosuppressive ectodomains in amniote-infecting filoviruses and reptarenaviruses.

2. Barreat and Katzourakis (2024). PLoS Computational Biology (doi: 10.1371/journal.pcbi.1010925): We developed mathematical models to arrive at a deeper understanding of the evolutionary and ecological dynamics of complex systems of cell-virus-virophage interactions. We demonstrate that the different infection mechanisms of virophages are probably driving the observed differences in their patterns of integration, that systems can stabilise by increasing the degree of virophage inhibition, and that virophage inhibition, programmed cell death and multicellularity can act together as antiviral defence systems in microbial eukaryotes.

3. Barreat, Kamada, de Souza and Katzourakis (2023). Biology Letters (doi: 10.1098/rsbl.2022.0464): Discovered novel papillomaviruses that infect the Malayan and Chinese pangolins, both critically endangered species of mammals. We were able to assemble full genomes and the L1 sequences used in papillomavirus taxonomy, and show that these viruses are highly prevalent (>50% of individuals infected) in wild populations of pangolins.

4. Barreat and Katzourakis (2022). Journal of Virology (doi: 10.1128/jvi.00933-22): We describe some of the most ancient non-retroviral integrations found in the human genome and date them to an age of ~102 million years. We show that these are remnants of ancient viruses that infected the most recent common ancestor of placental mammals, that they endogenised and became fixed, and that they are present at a syntenic location across many types of placental mammals.

5. Ghafari et al (2023). Molecular Biology and Evolution (doi: 10.1093/molbev/msac009): We explored the factors that determine the variation in rates of evolution in SARS-CoV-2 and pH1N1 influenza. We showed that these rates vary in a time-dependent way over the first 12 months of their respective pandemics, and that purifying selection is a determinant of the time dependency of rates of evolution during pandemics.

6. Ghafari et al (2022). Nature Communications (doi: 10.1038/s41467-022-30711-y): We created a framework for reconstructing SARS-CoV-2 transmission dynamics from excess mortality data. We used this to contrast the infection dynamics in countries with limited data and explore the impacts of the pandemic.

7. Simmonds et al (2023). PLoS Biology (doi: 10.1371/journal.pbio.3001922). A universal taxonomy of viruses is essential for a comprehensive view of the virus world and for communications, and we developed an evolutionary framework along four key principles for establishing a universal virus taxonomy.

8. Ghafari et al (2022). Frontiers in Virology (doi: 10.3389/fviro.2022.942555). We investigated the origins of the first three SARS-CoV-2 variants of concern. Our findings were in best agreement with a model in which these variants emerged within single individuals with long-term infections.

9. Markov et al (2023). Nature Reviews Microbiology (doi: 10.1038/s41579-023-00878-2). We published a comprehensive review into the evolution of SARS-CoV-2. This provided an empirical example of cross-species transmission in real time, complementing the overarching goals of the project in terms of understanding the evolutionary dynamics of cross-species transmissions.

10. Ghafari et al (2023). PLoS Pathogens (doi: 10.1371/journal.ppat.1011911). We applied our newly developed method for inferring the long-term evolutionary dynamics of viruses to the sobemoviruses, a group of plant viruses. We inferred that these viruses emerged nearly 9,000 years ago, and our findings make a case for the possibility of deep evolutionary origins of plant viruses.

In parallel with this work, we have developed a computational pipeline that queries massive sets of viral proteins (up to hundreds of thousands) against the thousands of available host genomes and stores the tabular outputs in an open-source relational database system (PostgreSQL). This pipeline will enable the generation of the large data sets required to explore questions on the occurrence and rates of virus cross-species transmissions. We have also developed a new algorithm to detect orthology across sets of thousands of EVE sequences, which is essential for finding informative calibration points to estimate the timescales over which virus evolution has unfolded.
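As a rough illustration of how tabular search hits can be stored and queried relationally, the sketch below loads a few rows into a table and counts, for each viral protein, the number of distinct host assemblies with a significant hit. It is not the project's pipeline: Python's built-in `sqlite3` module is used as a stand-in for PostgreSQL so that the example is self-contained, and the table name `eve_hit`, its columns, the rows in `HITS` and the E-value cutoff are all hypothetical.

```python
import sqlite3

# Hypothetical tabular hits (placeholder values for illustration only):
# (viral_protein, host_assembly, contig, hit_start, hit_end, evalue)
HITS = [
    ("viral_protein_1", "assembly_A", "contig_1", 100, 400, 1e-30),
    ("viral_protein_1", "assembly_B", "contig_7", 50, 350, 1e-12),
    ("viral_protein_1", "assembly_B", "contig_9", 10, 200, 1e-8),
    ("viral_protein_2", "assembly_A", "contig_2", 900, 1200, 1e-20),
    ("viral_protein_2", "assembly_C", "contig_4", 5, 120, 0.5),
]


def build_hit_table(rows):
    """Create an in-memory database and load the tabular hits into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE eve_hit (
            viral_protein TEXT NOT NULL,
            host_assembly TEXT NOT NULL,
            contig        TEXT NOT NULL,
            hit_start     INTEGER NOT NULL,
            hit_end       INTEGER NOT NULL,
            evalue        REAL NOT NULL
        )
        """
    )
    conn.executemany("INSERT INTO eve_hit VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def assemblies_per_protein(conn, max_evalue=1e-5):
    """Count distinct host assemblies with a significant hit per viral protein."""
    cursor = conn.execute(
        """
        SELECT viral_protein, COUNT(DISTINCT host_assembly)
        FROM eve_hit
        WHERE evalue <= ?
        GROUP BY viral_protein
        ORDER BY viral_protein
        """,
        (max_evalue,),
    )
    return dict(cursor.fetchall())


if __name__ == "__main__":
    connection = build_hit_table(HITS)
    print(assemblies_per_protein(connection))
```

Keeping hits in a relational table means that filters such as the E-value threshold, or groupings by host or viral family, can be changed in the query without rerunning the sequence searches.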

In addition to the major advances in computational EVE mining, we integrated methods for EVE domain annotation and expression analysis to select candidate endogenous viral element-derived immunity (EDI) genes, and established experimental models to test their antiviral activity. Our search also includes non-model species, which will help expand current knowledge of how often EVEs have been co-opted to benefit vertebrate immunity, and of how broad or specific EDI antiviral activity has evolved to be in vertebrates.

In parallel with our work on EVEs, the unprecedented data generated by the SARS-CoV-2 pandemic created new opportunities to study viral evolution in real time, alongside the long-term work that forms the core of this project. This allows additional insights into the evolution of cross-species transmission in real time.
We have pioneered the use of cloud computing for efficiently searching massive genome databases for the discovery of novel viruses. We have shown how using state-of-the-art computational infrastructure hosted on the cloud to query large public genome databases can lead to novel insights into the evolutionary history and diversity of viruses. By using this strategy, we have discovered multiple novel EVEs in host genomes that shed light on the ecology and evolution of diverse viral families and their host associations (Barreat and Katzourakis, 2024), and we have shown it is possible to find novel species of exogenous viruses “hiding” in the vast amounts of data produced by host genome assemblies (Barreat, Kamada, de Souza and Katzourakis, 2023). Conducting these analyses in more traditional ways would be very time-consuming, and potentially unfeasible given the size of the data, therefore limiting the number of discoveries that can be made.

We have also used state-of-the-art methods in developing the computational pipeline, including fast and sensitive sequence comparison algorithms (MMseqs2, DIAMOND), parallelisation and containerisation of the pipeline using Nextflow (together with Anaconda and Docker), and integration with a high-performance open-source database system (PostgreSQL). This system will allow us to efficiently explore the large viral diversity found in host genomes at an unprecedented scale, an effort which is currently underway.

By the end of the project, we expect to have produced a comprehensive database of EVEs integrated into animal genomes, made available as a relational database that can be queried to extract high-quality data for downstream analyses. These data will be used to infer rates of viral cross-species transmissions, study the genes exchanged by hosts and viruses (EVEs and host genes captured by viruses), and inform the selection of EVE candidates to test for EDI function. We hope that a number of the selected EDI candidates will be tested against a panel of viruses, and that we will be able to demonstrate a potential antiviral function for some of these genes. By uncovering novel host-virus interactions, our goal is also to establish experimental systems and discover novel EDIs, especially in non-model organisms of public health importance as reservoirs of pathogenic viruses. By the end of the project, we aim to have conducted the most comprehensive search of viral-derived sequences in animal genomes, created an EVE database accessible to the community, and carried out an extensive, in-depth analysis of the evolutionary history of, and evidence for, cross-species transmissions in these data, and we hope to find new EVEs that are involved in antiviral immunity. The unfolding of the COVID-19 pandemic has also led to unprecedented amounts of viral genome sequence data, and the opportunity to compare the evolutionary dynamics of cross-species transmissions across different timescales, including in real time.