Periodic Reporting for period 1 - GRASSHOPPER (understandinG the Role of codon usAge biaS in viruS HOst jumP ProcessEs thRough statistical physics)
Reporting period: 2021-10-01 to 2023-09-30
For instance, some motifs are more likely to be recognized by the host's immune system, and are therefore generally avoided by viruses.
Moreover, the codon usage of viruses is determined by the availability of tRNAs in the host cell, which in turn is influenced by the host's own codon usage.
This interplay between host and virus codon usage and the avoidance of motifs recognized by the host's immune system are key factors in the adaptation of viruses to their hosts.
By studying the patterns of nucleotide and codon usage in viral genomes, we can gain insights into the evolutionary processes that shape these genomes, and use this knowledge to develop new tools for tracking emerging pathogens, developing vaccines, and designing antiviral drugs.
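As a rough illustration of what "patterns of nucleotide and codon usage" can mean in practice, the following minimal sketch counts in-frame codons and computes an observed/expected ratio for a dinucleotide motif. It is not the project's model: the function names are hypothetical, and the expected frequency assumes independent nucleotides, which is a simplifying assumption made only for this example.

```python
from collections import Counter


def normalize(seq: str) -> str:
    """Upper-case the sequence and map the RNA alphabet onto DNA letters."""
    return seq.upper().replace("U", "T")


def codon_frequencies(cds: str) -> dict:
    """Relative frequency of each in-frame codon in a coding sequence.

    Trailing nucleotides that do not fill a complete codon are ignored.
    """
    seq = normalize(cds)
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def dinucleotide_odds_ratio(sequence: str, motif: str) -> float:
    """Observed frequency of a dinucleotide divided by the frequency expected
    if nucleotides were independent (illustrative assumption).

    Values below 1 suggest the motif is under-represented.
    """
    seq = normalize(sequence)
    motif = normalize(motif)
    if len(motif) != 2 or len(seq) < 2:
        raise ValueError("need a 2-letter motif and a sequence of length >= 2")
    single = Counter(seq)
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    observed = pairs[motif] / (len(seq) - 1)
    expected = (single[motif[0]] / len(seq)) * (single[motif[1]] / len(seq))
    if expected == 0:
        raise ValueError("motif letters do not occur in the sequence")
    return observed / expected


if __name__ == "__main__":
    toy_cds = "ATGGCCGCCTAA"  # toy sequence, not real viral data
    print(codon_frequencies(toy_cds))
    print(dinucleotide_odds_ratio(toy_cds, "CG"))
```

Comparing such statistics between a virus and its candidate hosts gives a first, coarse view of the pressures discussed above; richer models are needed to separate codon-level from motif-level effects.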
The aim of this project was to use methods from statistical physics to build new tools to investigate the role of nucleotide and codon usage in the adaptation of human-infecting viruses to their hosts. The project was divided into two main objectives: (1) to develop a method to identify the pressures that each host exerts on the nucleotide and codon usage of its viruses, and (2) to investigate the role of nucleotide and codon usage in host adaptation of human-infecting viruses following host jumps.
In addition to the main scientific objectives, the project was also aimed at developing the skills of the researcher, in particular by providing training in the use of statistical physics methods to study biological systems, and in the use of machine learning techniques to analyze genomic data.
Both the scientific and the training objectives of the project were successfully achieved, and the fellowship has been instrumental in the researcher's career development: immediately after the end of the project, the fellow joined a startup company as a Senior Data Scientist to perform research on machine-learning-guided development of phage cocktails for the treatment of bacterial infections.
We demonstrated how models trained to recognize the pressures exerted by different hosts can be used to identify the host a virus has evolved in contact with, even from a small part of its genome.
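The idea of assigning a host from a short genomic fragment can be sketched with a deliberately simple stand-in: one independent codon-frequency model per host, with the fragment assigned to the host whose model gives it the highest log-likelihood. This is an assumption made for illustration only and not the model developed in the project; the host labels, training sequences, and function names below are hypothetical toy placeholders.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons over the DNA alphabet.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codons_of(seq: str) -> list:
    """Split a sequence into in-frame codons, dropping any incomplete tail."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def fit_codon_model(sequences, pseudocount: float = 1.0) -> dict:
    """Log-probability of each codon, estimated from training sequences.

    The pseudocount keeps unseen codons from getting zero probability.
    """
    counts = Counter()
    for s in sequences:
        counts.update(c for c in codons_of(s) if c in CODON_SET)
    total = sum(counts.values()) + pseudocount * len(CODONS)
    return {c: math.log((counts[c] + pseudocount) / total) for c in CODONS}


def log_likelihood(model: dict, fragment: str) -> float:
    """Log-likelihood of a fragment under an independent-codon model.

    Codons containing ambiguous letters are skipped.
    """
    return sum(model[c] for c in codons_of(fragment) if c in model)


def assign_host(models: dict, fragment: str) -> str:
    """Return the label of the host model that best explains the fragment."""
    return max(models, key=lambda host: log_likelihood(models[host], fragment))


if __name__ == "__main__":
    # Toy training data (hypothetical, chosen only to make the example run).
    models = {
        "host_A": fit_codon_model(["GCCGCCGCCGCC" * 5]),
        "host_B": fit_codon_model(["GCAGCAGCAGCA" * 5]),
    }
    print(assign_host(models, "GCCGCCGCA"))
```

Because the log-likelihood is a sum over codons, even a short fragment yields a usable score, although its discriminating power naturally grows with fragment length.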
We introduced a computational framework based on the model to study the evolution of pathogens after a host jump, using as examples the influenza pandemic of 1918 (for which few genomic sequences are available, but epidemiological data span more than 100 years) and the SARS-CoV-2 pandemic (for which many genomic sequences are available, but epidemiological data cover only a few months).
We also suggested how to interpret the results of the model in terms of biological mechanisms, and showed how to use the framework to exert fine-grained control over the nucleotide usage of de novo designed RNA sequences.
These results have been made available as a scientific paper on a preprint server, and shared widely with the scientific community thanks to talks at international conferences and seminars.
The code used to implement the model is available on GitHub, and extensively documented to facilitate its use by other researchers.
The results obtained on the two case studies considered within the project broaden our understanding of the evolutionary processes that shape viral genomes, provide a new method to track emerging pathogens, and open new avenues for the design of antiviral drugs and vaccines.
The project's findings have applications that include metagenomic studies, pathogen surveillance, immunogenicity prediction during viral evolution, and assistance in RNA vaccine design.
Moreover, the results obtained so far are the first fundamental steps towards a number of potential follow-up projects involving statistical physics modelling and machine learning to study the evolution of viral genomes and their interactions with the host at the genomic level.