Periodic Reporting for period 4 - ERVE (Systematic discovery of functional elements in RNA virus genomes: an Encyclopedia of RNA Virus Elements)
Reporting period: 2020-03-01 to 2021-08-31
Computational analysis of virus genomes provides a practical and cost-effective way forward that can be used to precisely and efficiently target follow-up experimental research. The genome sequences of RNA viruses evolve very rapidly so that there is considerable diversity between different isolates of a single virus species. For medically or economically important species, there are often dozens or even hundreds of sequences available. By comparing the sequences of different virus isolates and computationally analyzing the patterns of changes at different nucleotide positions (a technique known as "comparative genomics"), we can predict novel functional elements and often gain extensive insight into their function. Part of our work involves developing new comparative genomic techniques for virus genome analysis. We are also taking some of the most interesting newly discovered features into the lab to experimentally characterize exactly what their function is during virus infection. By enhancing our understanding of the molecular biology of many virus species, we lay essential groundwork for follow-up advances in diverse virus control strategies.
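As a minimal illustration of the comparative idea described above (not the software developed in this project), the sketch below scores each column of an alignment of virus isolates by its Shannon entropy. Low entropy marks positions that vary little between isolates. The function name `column_entropy` and the toy alignment are hypothetical, and real analyses typically use more refined statistics than raw per-column entropy.

```python
import math
from collections import Counter


def column_entropy(alignment):
    """Return the Shannon entropy (in bits) of each alignment column.

    `alignment` is a list of equal-length, pre-aligned nucleotide strings.
    Gap characters ('-') are ignored when counting residues.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    entropies = []
    for column in zip(*alignment):
        residues = [base for base in column if base != "-"]
        total = len(residues)
        if total == 0:
            # Column made only of gaps: no information, report zero.
            entropies.append(0.0)
            continue
        counts = Counter(residues)
        # H = sum over residues of p * log2(1 / p)
        entropy = sum((n / total) * math.log2(total / n) for n in counts.values())
        entropies.append(entropy)
    return entropies


# Toy alignment of four hypothetical isolates: only the last column varies.
toy_alignment = ["ATGC", "ATGA", "ATGC", "ATGA"]
scores = column_entropy(toy_alignment)
```

In this toy case the first three columns are perfectly conserved and the last is evenly split between two bases, so it carries the highest score. A stretch of unexpectedly conserved positions is the kind of signal that would prompt closer inspection.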
During the course of this project, we have developed new software and databases; computationally identified "hidden" genes and major functional non-coding elements in medically and economically important RNA virus genomes; experimentally characterized some of the most interesting computational findings; and characterized novel virus gene expression mechanisms.
One such mechanism is "ribosomal frameshifting" whereby a proportion of translating ribosomes are stimulated to shift into an alternative reading frame to produce a "hybrid" protein. Frameshifting is normally stimulated by signals within the messenger RNA (mRNA) which induce a fixed expression ratio of frameshift and non-frameshift products, e.g. the Gag and Gag-Pol polyproteins of HIV. However, we discovered and characterized the only two known examples of protein-stimulated frameshifting (one in the cardioviruses and the other in the arteriviruses) where frameshifting depends on a viral protein binding to signals in the mRNA. This allows the virus to modulate the efficiency of frameshifting as the amount of virus protein changes over time in an infected cell, thus adding a new layer of regulation to virus gene expression.
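To make the notion of a "hybrid" protein concrete, the following sketch translates a toy mRNA both without a frameshift and with a -1 frameshift at a given position. It is an illustration under stated assumptions only: the sequence is invented, the helper names `translate` and `translate_with_frameshift` are hypothetical, the shift position is supplied by hand rather than predicted, and no stop codon is assumed to occur upstream of the shift site.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, start=0):
    """Translate `seq` from `start` until a stop codon or the end of the sequence."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def translate_with_frameshift(seq, slip_end, shift=-1):
    """Translate in frame 0 up to `slip_end`, then continue in the shifted frame.

    `slip_end` is the index just after the last codon decoded in frame 0
    (assumed to be a multiple of 3, with no stop codon before it).
    With shift=-1 the ribosome re-reads one nucleotide.
    """
    upstream = translate(seq[:slip_end])
    downstream = translate(seq, start=slip_end + shift)
    return upstream + downstream


# Invented toy mRNA (written as DNA): frame 0 hits a stop codon at position 12.
toy_mrna = "ATGGGTTTTTTATAAGCCGGTGA"
non_frameshift_product = translate(toy_mrna)
frameshift_product = translate_with_frameshift(toy_mrna, slip_end=12)
```

The two products share their N-terminal portion, and the frameshift product carries an extension decoded from the alternative reading frame. In an infected cell the two are produced in a ratio set by the frameshifting efficiency.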
One way we investigate virus gene expression is ribosome profiling, a relatively recently developed technique. It relies on the fact that a translating ribosome covers around 30 nucleotides of mRNA. Nucleotide sequences that are not protected by translating ribosomes can be enzymatically digested, leaving millions of roughly 30-nucleotide-long RNA fragments that can be analyzed using high-throughput sequencing and then mapped back to virus and host mRNAs to give a global snapshot of the positions of translating ribosomes. We used ribosome profiling to study gene expression in many virus species, including model coronaviruses.
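One common way to summarize mapped ribosome footprints is to ask which reading frame they fall in. The sketch below is a simplified illustration, not the analysis pipeline used in this project. It assumes that reads have already been mapped, that each read is represented only by the coordinate of its 5' end, and that the ribosomal P-site lies a fixed 12 nucleotides downstream of that end. The offset is an assumed typical value that in practice is calibrated per dataset and read length. The function name `frame_counts` is hypothetical.

```python
from collections import Counter


def frame_counts(read_starts, cds_start, psite_offset=12):
    """Count footprints in each of the three reading frames of a coding region.

    read_starts  : 5'-end coordinates of mapped footprints (0-based)
    cds_start    : coordinate of the first nucleotide of the start codon
    psite_offset : assumed distance from read 5' end to the P-site
    Returns [frame0, frame1, frame2] counts relative to `cds_start`.
    """
    counts = Counter()
    for start in read_starts:
        psite = start + psite_offset
        if psite < cds_start:
            # Footprint lies upstream of the coding region; skip it.
            continue
        counts[(psite - cds_start) % 3] += 1
    return [counts[frame] for frame in range(3)]


# Hypothetical mapped read starts around a coding region beginning at 100.
example_counts = frame_counts([50, 88, 91, 94, 95, 100], cds_start=100)
```

When most footprints fall in a single frame, a local change in the dominant frame can indicate that ribosomes are translating an overlapping or frameshifted reading frame.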
A key aspect of our work is to identify and characterize "hidden genes" in RNA viruses. We identified and characterized a novel protein (termed UP) encoded in the genomes of most human-infecting enteroviruses including poliovirus type 1, and a novel protein (termed XP) encoded in the genomes of human and other mammalian astroviruses. We obtained follow-up funding to investigate UP knockout as a widely applicable enterovirus vaccine strategy. Recently, through comparative genomic analysis of SARS-CoV-2 and related coronaviruses, we identified a new protein (termed 3c) encoded in their genomes and we are currently investigating the function of the 3c protein during virus infection.