Periodic Reporting for period 2 - PALVIREVOL (Paleovirology, the evolutionary dynamics of viral cross-species transmissions, and the consequences of virus-host gene exchange)
Reporting period: 2023-04-01 to 2024-09-30
A key aim of the PALVIREVOL project is to obtain a deeper understanding of viral evolution over evolutionary timescales, that is, over periods of millions to hundreds of millions of years. To accomplish this goal, we take advantage of endogenous viral elements (EVEs), molecular fossils that are found integrated into the genomes of modern animals. EVEs form when viral sequences become incorporated into the DNA of a germline cell (reproductive cell) of a host organism. Because a newly formed viral fossil can be passed down to the next generation, it can potentially persist in the gene pool of the host for millions of years.
Given that viruses do not leave fossils in the geological record, EVEs are the only direct evidence of viruses in the deep past. Indeed, the dynamics of EVE integration and persistence mean that they can survive the timescale of speciation and persist in the genomes of descendant species that inherited the virus integration from a common ancestor. As such, we can use patterns of speciation and estimated divergence times of hosts (obtained from the rock fossil record or molecular clock analyses) to reconstruct the evolutionary history of a viral integration in deep time: an integration shared at the same genomic location by several host species must be at least as old as their most recent common ancestor. Therefore, EVEs provide unique insights into viral diversity, virus-host interactions, and the mode and tempo of virus evolution over geological time.
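The dating logic described above can be illustrated with a minimal Python sketch. It assumes that pairwise host divergence times are already available and that the hosts carry the integration at an orthologous locus. The host names, the divergence values and the function name `minimum_integration_age` are hypothetical placeholders for illustration, not data or code from the project.

```python
from itertools import combinations

# Hypothetical pairwise host divergence times in millions of years (Myr).
# Placeholder values for illustration only, not real estimates.
DIVERGENCE_MYR = {
    frozenset({"host_A", "host_B"}): 10.0,
    frozenset({"host_A", "host_C"}): 40.0,
    frozenset({"host_B", "host_C"}): 40.0,
    frozenset({"host_A", "host_D"}): 90.0,
    frozenset({"host_B", "host_D"}): 90.0,
    frozenset({"host_C", "host_D"}): 90.0,
}


def minimum_integration_age(carriers, divergence=DIVERGENCE_MYR):
    """Return a lower bound (Myr) on the age of an EVE shared by `carriers`.

    An insertion present at the same locus in several hosts must predate
    their most recent common ancestor. On a time-calibrated tree, the age
    of that ancestor equals the largest pairwise divergence time among
    the carriers. Returns None when fewer than two carriers are given,
    because speciation alone then gives no lower bound.
    """
    carriers = sorted(set(carriers))
    if len(carriers) < 2:
        return None
    return max(divergence[frozenset(pair)] for pair in combinations(carriers, 2))


if __name__ == "__main__":
    print(minimum_integration_age(["host_A", "host_B"]))            # 10.0
    print(minimum_integration_age(["host_A", "host_B", "host_C"]))  # 40.0
```

The result is only a minimum age: the integration could have occurred at any point along the branch leading to the common ancestor, so an upper bound needs additional evidence, such as hosts that lack the insertion at the same locus.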
We are particularly interested in leveraging the power and uniqueness of EVE data to answer questions about the gene flow between viruses and hosts, the evolutionary history of viruses in deep time, the dynamics of virus cross-species transmissions and the potential for EVEs to be co-opted to perform antiviral functions. We have organised these questions into two major themes: 1) The evolutionary dynamics of virus cross-species transmissions, and 2) The consequences of virus-host gene exchange.
Previous work in paleovirology and on EVEs has shown a rich genomic fossil record for many virus families that can illuminate their evolutionary history in deep time. Given the growing knowledge of virus diversity and the increasing availability of host genome assemblies, it is likely that we have only scratched the surface of the virus fossil record preserved in host genomes. During this project, we will develop computational tools that enable large-scale searches for EVEs in the growing data sets of host genome assemblies available today, providing novel insights into the diversity of the virus fossil record and informative calibration points for analyses of the timescales of virus evolution. These tools need to make efficient use of the large amounts of publicly available genome data and organise the results into a knowledge base that can be accessed and queried to generate novel biological insights into virus-host interactions. In particular, we are interested in using the knowledge gained from these analyses to study the role of virus-host gene exchange, both in the direction from virus to host (EVEs) and in the direction from host to virus, which includes eukaryotic genes acquired by viruses to manipulate the host immune response or cellular metabolism.
Apart from increasing our knowledge of virus diversity, host associations and virus evolution, one of the central questions that we aim to answer is the potential antiviral function of EVEs that have been co-opted by hosts. Previous work has shown that ancient viral genes captured by host genomes can antagonise circulating viruses and protect the host from viral infection. We aim to use transcriptomic, genomic and evolutionary evidence to shortlist candidate genes and test them for EVE-derived immunity (EDI) function, hopefully paving the way for the rational discovery of antiviral EVEs in host genomes.
To summarise, this project will advance our understanding of virus evolution spanning millions to hundreds of millions of years, and provide key insights into virus-host interactions and the molecular interface between virus infection and host immunity. Endogenous viral elements are the only direct evidence of viruses in the deep past, and so offer a unique opportunity to answer these questions.
The work performed so far has resulted in the publication of 9 papers and one preprint (currently under review), which have contributed to our understanding of virus-host interactions, as well as expanding the known diversity of viral fossils found in the vertebrate genome fossil record. During the time leading to this reporting period we have produced the following publications:
1. Barreat and Katzourakis (2024). Nature Microbiology (doi: 10.1038/s41564-024-01825-4): Carried out one of the most comprehensive searches for EVEs in vertebrate genomes, which uncovered endogenous representatives of 4 new viral families and members of the genera Orthonairovirus (related to Crimean-Congo hemorrhagic fever viruses) and Hepacivirus (hepatitis C virus). We also propose a macroevolutionary scenario for the previously unknown origin of glycoprotein immunosuppressive ectodomains in amniote-infecting filoviruses and reptarenaviruses.
2. Barreat and Katzourakis (2024). PLoS Computational Biology (doi: 10.1371/journal.pcbi.1010925): We developed mathematical models to arrive at a deeper understanding of the evolutionary and ecological dynamics of complex systems of cell-virus-virophage interactions. We demonstrate that the different infection mechanisms of virophages probably drive the observed differences in their patterns of integration, that systems can stabilise by increasing the degree of virophage inhibition, and that virophage inhibition, programmed cell death and multicellularity can act together as antiviral defence systems in microbial eukaryotes.
3. Barreat, Kamada, de Souza and Katzourakis (2023). Biology Letters (doi: 10.1098/rsbl.2022.0464): Discovered novel papillomaviruses that infect the Malayan and Chinese pangolins, both critically endangered mammal species. We were able to assemble full genomes and the L1 sequences used in papillomavirus taxonomy, and show that these viruses are highly prevalent (>50% of individuals infected) in wild populations of pangolins.
4. Barreat and Katzourakis (2022). Journal of Virology (doi: 10.1128/jvi.00933-22): We describe some of the most ancient non-retroviral integrations found in the human genome and estimate their age at ~102 million years. We show that these are remnants of ancient viruses that infected the most recent common ancestor of placental mammals, became endogenised and fixed, and are now present at a syntenic location across many placental mammals.
5. Ghafari et al (2023). Molecular Biology and Evolution (doi: 10.1093/molbev/msac009): We explored the factors that determine the variation in rates of evolution in SARS-CoV-2 and pH1N1 influenza. We showed that these rates vary in a time-dependent way over the first 12 months of their respective pandemics, and that purifying selection is a determinant of the time dependency of rates of evolution during pandemics.
6. Ghafari et al (2022). Nature Communications (doi: 10.1038/s41467-022-30711-y): We created a framework for reconstructing SARS-CoV-2 transmission dynamics from excess mortality data. We used this to contrast the infection dynamics in countries with limited data and explore the impacts of the pandemic.
7. Simmonds et al (2023). PLoS Biology (doi: 10.1371/journal.pbio.3001922). A universal taxonomy of viruses is essential for a comprehensive view of the virus world and for communications, and we developed an evolutionary framework along four key principles for establishing a universal virus taxonomy.
8. Ghafari et al (2022). Frontiers in Virology (doi: 10.3389/fviro.2022.942555). We investigated the origins of the first three SARS-CoV-2 variants of concern. Our findings were in best agreement with a model in which these variants emerged within single individuals with long-term infections.
9. Markov et al (2023). Nature Reviews Microbiology (doi: 10.1038/s41579-023-00878-2). We published a comprehensive review of the evolution of SARS-CoV-2. This provided an empirical example of cross-species transmission in real time, complementing the overarching goals of the project in terms of understanding the evolutionary dynamics of cross-species transmissions.
10. Ghafari et al (2023). PLoS Pathogens (doi: 10.1371/journal.ppat.1011911). We applied our newly developed method for inferring the long-term evolutionary dynamics of viruses to the sobemoviruses, a group of plant viruses. We inferred that these viruses emerged nearly 9,000 years ago, and our findings make a case for the possibility of deep evolutionary origins of plant viruses.
In parallel to this work, we have developed a computational pipeline that queries massive sets of viral proteins (up to hundreds of thousands) against the thousands of available host genomes and stores the tabular outputs in an open-source relational database system (PostgreSQL). This pipeline will enable the generation of the large data sets required to explore questions on the occurrence and rates of virus cross-species transmissions. We have also developed a new algorithm to detect orthology across sets of thousands of EVE sequences, which is essential for finding informative calibration points to estimate the timescales over which virus evolution has unfolded.
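As a rough illustration of storing tabular search output in a relational database, the following Python sketch loads hits into a table and summarises them per host sequence. It is not the project's pipeline: it uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so that it runs without a database server, and it assumes the 12-column BLAST-style tabular layout. The table name, function names, sequence identifiers and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample hits in 12-column BLAST-style tabular format:
# query (viral protein), subject (host contig), % identity, alignment
# length, mismatches, gap opens, query start/end, subject start/end,
# e-value, bit score.
DEMO_LINES = [
    "viral_prot_1\thost_contig_1\t45.0\t200\t110\t2\t1\t200\t1000\t1600\t1e-30\t150.0",
    "viral_prot_2\thost_contig_1\t38.5\t180\t111\t3\t5\t184\t5000\t5540\t1e-12\t80.0",
    "viral_prot_1\thost_contig_2\t30.0\t90\t63\t1\t10\t99\t300\t570\t0.5\t25.0",
]


def parse_hits(lines):
    """Yield one typed tuple per non-empty, non-comment tabular line."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        if len(f) != 12:
            raise ValueError(f"expected 12 columns, got {len(f)}")
        yield (f[0], f[1], float(f[2]), *map(int, f[3:10]),
               float(f[10]), float(f[11]))


def load_hits(conn, lines):
    """Create the hits table if needed and insert the parsed rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS hits (
               qseqid TEXT, sseqid TEXT, pident REAL, length INTEGER,
               mismatch INTEGER, gapopen INTEGER, qstart INTEGER,
               qend INTEGER, sstart INTEGER, send INTEGER,
               evalue REAL, bitscore REAL)"""
    )
    conn.executemany(
        "INSERT INTO hits VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        parse_hits(lines),
    )
    conn.commit()


def hits_per_host_sequence(conn, max_evalue=1e-5):
    """Return (host sequence, hit count, best bit score) below an e-value cutoff."""
    return conn.execute(
        """SELECT sseqid, COUNT(*), MAX(bitscore)
           FROM hits WHERE evalue <= ?
           GROUP BY sseqid ORDER BY sseqid""",
        (max_evalue,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_hits(conn, DEMO_LINES)
    print(hits_per_host_sequence(conn))  # [('host_contig_1', 2, 150.0)]
```

The summary query uses only standard SQL aggregation, so the same statement should carry over to PostgreSQL with a different driver and parameter placeholder style.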
In addition to the major advancements in EVE computational mining, we integrated methods for EVE domain annotation and expression analysis to select candidate EVE-derived immunity (EDI) genes, and established experimental models to test their antiviral activity. Our search also includes non-model species, which will help expand current knowledge of how often EVEs were co-opted to benefit vertebrate immunity, and of how broad or specific EDI antiviral activity has evolved to be in vertebrates.
In parallel to our work on EVEs, the unprecedented data generated by the SARS-CoV-2 pandemic created new opportunities to study viral evolution in real time, alongside the long-term work that forms the core of this project. This allows additional insights into the evolution of cross-species transmission in real time.
We have also used state-of-the-art methods in developing the computational pipeline, including fast and sensitive sequence comparison algorithms (MMseqs2, DIAMOND), parallelisation and containerisation of the pipeline using Nextflow (together with Anaconda and Docker), and integration with a high-performance open-source database system (PostgreSQL). This system will allow us to efficiently explore the large viral diversity found in host genomes at an unprecedented scale, an effort which is currently underway.
By the end of the project, we expect to have produced a comprehensive database of EVEs integrated into animal genomes, made available as a relational database that can be queried to extract high-quality data for downstream analyses. These data will be used to infer rates of viral cross-species transmission, study the genes exchanged by hosts and viruses (EVEs and host genes captured by viruses), and inform the selection of EVE candidates to test for EDI function. We hope that a number of the selected EDI candidates will be tested against a panel of viruses, and that we will be able to demonstrate a potential antiviral function for some of these genes. By uncovering novel host-virus interactions, we also aim to establish experimental systems and discover novel EDIs, especially in non-model organisms of public health importance as reservoirs of pathogenic viruses. By the end of the project, we aim to have conducted the most comprehensive search of viral-derived sequences in animal genomes, created an EVE database accessible to the community, and carried out an extensive and in-depth analysis of the evolutionary history of, and evidence for, cross-species transmissions in these data. We also hope to find new EVEs that are involved in antiviral immunity. The unfolding of the COVID-19 pandemic has also led to unprecedented amounts of viral genome sequence data, and to the opportunity to compare the evolutionary dynamics of cross-species transmissions across different timescales, including in real time.