Periodic Reporting for period 1 - METHYVIREVOL (Virocellular hybrids and epigenomic changes as driving factors of infection driven cancers.)
Reporting period: 2019-03-14 to 2021-03-13
The genetic code is defined by the correspondence between codons (nucleotide triplets within a gene) and amino acids (the elementary building blocks of proteins). It rests on three main properties: it is universal across living beings (with a few exceptions); it is unambiguous, i.e. each codon specifies a single amino acid; and it is degenerate: 18 of the 20 standard amino acids are encoded by several codons, called synonymous codons. For example, the amino acid valine is encoded by four synonymous codons: GTA, GTC, GTG and GTT.
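The degeneracy described above can be checked with a short Python sketch. It assumes the standard genetic code (NCBI translation table 1) written as a compact string in TCAG base order; the names BASES, AMINO, CODON_TABLE and SYNONYMOUS are illustrative and do not come from the project.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
# "*" marks the three stop codons.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each of the 64 codons to its amino acid (one-letter code).
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Group codons by the amino acid they encode, ignoring stop codons.
SYNONYMOUS = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMOUS.setdefault(aa, []).append(codon)

valine_codons = sorted(SYNONYMOUS["V"])
degenerate_amino_acids = [aa for aa, codons in SYNONYMOUS.items() if len(codons) > 1]

print(valine_codons)                # the four synonymous codons for valine
print(len(degenerate_amino_acids))  # amino acids encoded by more than one codon
```

Only methionine (ATG) and tryptophan (TGG) are encoded by a single codon, which is why 18 of the 20 amino acids have synonymous codons.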
Synonymous codons do not appear with equal frequency in the coding sequences of living organisms. Understanding the origin of these unequal frequencies is a classical, unsolved question spanning evolutionary and molecular biology. In unicellular organisms, such as the bacterium Escherichia coli or the yeast Saccharomyces cerevisiae, the most commonly observed codons in highly expressed genes correspond to the most abundant tRNAs in the cell, strongly suggesting that codon usage and tRNA content have coevolved in a manner that optimizes translation. Similarly, evidence of natural selection acting on synonymous codon usage has been reported in many organisms, such as flies, nematodes and the branchiopod Daphnia pulex. Besides translational selection, neutral mutational forces can also influence synonymous codon usage. In several vertebrates, the primary driver of the non-random usage of synonymous codons is a molecular mechanism that drives the evolution of genomic GC content. The overarching question in the field is to determine, for a given organism, what fraction of synonymous mutations is shaped by adaptive versus non-adaptive processes.
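One common way to quantify unequal synonymous codon frequencies is the relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The report does not state which measure was used, so the sketch below is only an illustration; the function name rscu and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def rscu(cds):
    """Return RSCU values for the codon families observed in a coding sequence.

    RSCU = observed codon count / (family total / number of synonymous codons).
    A value of 1.0 means no bias; values above 1.0 mark over-used codons.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing incomplete codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    # Count only unambiguous sense codons (skip stops and codons containing N).
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")

    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the sequence
        expected = total / len(family)
        for c in family:
            values[c] = counts[c] / expected
    return values


# Hypothetical toy sequence: four valine codons, three GTG and one GTA.
toy = rscu("GTGGTGGTGGTA")
print(toy)
```

In the toy sequence, GTG accounts for three of the four valine codons, so its RSCU is 3.0, GTA sits at 1.0, and the unused GTC and GTT are at 0.0.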
Viruses provide an original model in the field, as all viruses depend on the host translation machinery, especially those that do not encode their own tRNAs (as is the case for human viruses). Given this dependence, is there selection pressure on the usage of synonymous codons in viral genomes? Here I conducted a large-scale investigation of synonymous codon usage variation in viruses, including several coronaviruses (SARS-CoV-2, MERS-CoV, SARS-CoV-1, etc.). Contrary to our initial hypothesis, selection pressure was not found to be the main mechanism driving this variation. Instead, the variation is best explained by non-adaptive evolutionary processes, such as mutational bias. For example, I show that coronaviruses exhibit a strong mutational bias from C to T and from G to A. Together, my results consolidate current knowledge of synonymous codon usage variation in viruses and of how adaptive and non-adaptive evolutionary processes drive it.
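A directional mutational bias of this kind can be illustrated by tallying substitutions between an inferred ancestral sequence and an aligned descendant. This is a minimal sketch, not the analysis performed in the project: the function substitution_counts and both sequences are hypothetical, and a real study would also need ancestral state reconstruction and normalisation by base composition.

```python
from collections import Counter


def substitution_counts(ancestral, derived):
    """Count directed single-nucleotide substitutions between two aligned sequences.

    Keys are (ancestral_base, derived_base) pairs; gaps and ambiguous
    characters are skipped.
    """
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for a, d in zip(ancestral.upper(), derived.upper()):
        if a in "ACGT" and d in "ACGT" and a != d:
            counts[(a, d)] += 1
    return counts


# Hypothetical toy alignment (not real viral data).
ancestral = "ACCGGTCA"
derived = "ATCGATTA"
counts = substitution_counts(ancestral, derived)

for (a, d), n in sorted(counts.items()):
    print(f"{a}>{d}: {n}")
```

In this toy alignment the only changes are two C to T and one G to A substitutions, the two directions reported as over-represented in coronaviruses.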
• Why is it important for society?
Vaccine engineering is one of the most efficient ways of fighting diseases caused by viruses. One technology used to generate vaccines is to weaken or modify all or part of the virus genome, so that the virus does not cause illness but the immune system still creates cells that can ‘recognise’, and protect against, the disease-causing forms of the virus if these are encountered later. This type of vaccine is called a live attenuated vaccine. The recoded viruses are antigenically identical to their pathogenic parents. This antigenic identity, together with their replicative potential, enables attenuated viruses to induce immune responses similar to those induced by virulent strains. By identifying which synonymous codons will weaken a viral sequence, my work lies upstream of the creation of live attenuated vaccines.
• What are the overall objectives?
The ultimate aim of this research is to understand the forces that shape codon usage bias in viruses infecting humans, and to evaluate whether the match between the codon usage bias of a virus and that of humans helps predict its zoonotic risk. To answer this question, I first characterized whether the codon usage bias of human DNA viruses is adapted to the human protein synthesis machinery. I then narrowed my analysis to the coronavirus family, and further to the SARS-CoV-2 genome, to investigate whether codon usage preferences affect the initial zoonotic spillover from animals to humans and eventually govern the risk of stable human-to-human transmission. Our results suggest that viral codon usage preferences are largely shaped by neutral mutational forces, with some directional contribution from the avoidance of specific nucleotide patterns, and that their contribution to shaping the host range of a virus is minor.