Systematic discovery of functional elements in RNA virus genomes: an Encyclopedia of RNA Virus Elements

Periodic Reporting for period 4 - ERVE (Systematic discovery of functional elements in RNA virus genomes: an Encyclopedia of RNA Virus Elements)

Reporting period: 2020-03-01 to 2021-08-31

A key step towards developing effective treatments and/or control strategies for viruses is to first achieve a solid understanding of their molecular biology. To do this, it is crucial to identify and characterize the full complement of genes and other functional elements encoded within their compact genomes. The genomes of many of the most important RNA viruses were amongst the first to be sequenced, some 30–40 years ago. Surprisingly, however, "hidden" protein-coding genes are still being discovered even in some of the best-studied RNA viruses. Such genes tend to be very short and/or embedded within other genes, making them very difficult to detect using conventional gene-finding approaches. Besides protein-coding genes, RNA virus genomes are also packed full of other functional elements. For example, the genomic RNA can fold into complex three-dimensional structures that act as signals directing different stages of the viral replicative cycle. Knowledge of these functional RNA elements is also far from complete.

Computational analysis of virus genomes provides a practical and cost-effective way forward that can be used to precisely and efficiently target follow-up experimental research. The genome sequences of RNA viruses evolve very rapidly so that there is considerable diversity between different isolates of a single virus species. For medically or economically important species, there are often dozens or even hundreds of sequences available. By comparing the sequences of different virus isolates and computationally analyzing the patterns of changes at different nucleotide positions (a technique known as "comparative genomics"), we can predict novel functional elements and often gain extensive insight into their function. Part of our work involves developing new comparative genomic techniques for virus genome analysis. We are also taking some of the most interesting newly discovered features into the lab to experimentally characterize exactly what their function is during virus infection. By enhancing our understanding of the molecular biology of many virus species, we lay essential groundwork for follow-up advances in diverse virus control strategies.
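As a rough illustration of the comparative genomic idea described above, the following minimal Python sketch scores how variable each column of a nucleotide alignment is across isolates, and then extracts the scores at third codon positions, where changes are often synonymous. It is not the project's software: the function names, the toy alignment, and the simple "fraction differing from the most common base" score are all assumptions chosen for illustration. Unexpectedly low variability at positions that would normally be free to change is the kind of signal that can hint at an overlapping functional element.

```python
from collections import Counter


def column_variability(alignment):
    """Per-column variability for a list of equal-length aligned sequences.

    The score is the fraction of sequences that differ from the most
    common base in that column (0.0 means perfectly conserved).
    """
    if not alignment:
        raise ValueError("alignment is empty")
    n = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    scores = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        most_common_count = counts.most_common(1)[0][1]
        scores.append(1 - most_common_count / n)
    return scores


def third_position_variability(alignment, frame=0):
    """Variability at third codon positions of the given reading frame."""
    scores = column_variability(alignment)
    return [scores[i] for i in range(frame + 2, len(scores), 3)]


# Hypothetical toy alignment of four "isolates" (not real virus data).
TOY_ALIGNMENT = [
    "ATGGCAAAACTG",
    "ATGGCGAAGCTG",
    "ATGGCTAAACTA",
    "ATGGCCAAGCTG",
]

if __name__ == "__main__":
    print(column_variability(TOY_ALIGNMENT))
    print(third_position_variability(TOY_ALIGNMENT))
```

In this toy alignment all of the variation falls on third codon positions, as expected for a conserved protein-coding region. A real analysis would work on much larger alignments and would need a statistical model of the expected variability.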

During the course of this project, we developed new software and databases; computationally identified "hidden" genes and major functional non-coding elements in medically and economically important RNA virus genomes; experimentally characterized some of the most interesting computational findings; and characterized novel virus gene expression mechanisms.

The overarching goal of the project was to leverage the vast quantity of virus genome sequencing data available, using state-of-the-art computational techniques, to systematically identify functional elements in RNA virus genomes. The computational analysis was coupled with laboratory follow-up work to fully characterize the most important computational findings, ranging from novel features in important viruses to novel molecular mechanisms with potential biotechnology applications. A major focus of the project was to investigate and understand the many unusual mechanisms that RNA viruses use to express their genes.

One such mechanism is "ribosomal frameshifting" whereby a proportion of translating ribosomes are stimulated to shift into an alternative reading frame to produce a "hybrid" protein. Frameshifting is normally stimulated by signals within the messenger RNA (mRNA) which induce a fixed expression ratio of frameshift and non-frameshift products, e.g. the Gag and Gag-Pol polyproteins of HIV. However, we discovered and characterized the only two known examples of protein-stimulated frameshifting (one in the cardioviruses and the other in the arteriviruses) where frameshifting depends on a viral protein binding to signals in the mRNA. This allows the virus to modulate the efficiency of frameshifting as the amount of virus protein changes over time in an infected cell, thus adding a new layer of regulation to virus gene expression.
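To make the idea of a "hybrid" frameshift protein concrete, here is a minimal Python sketch that translates a toy mRNA in the original reading frame and then again with a shift of one nucleotide backwards at a chosen position. The sequence, the shift position, and the function names are hypothetical and chosen purely for illustration; the sketch assumes the standard genetic code and a -1 shift, and it does not model the signals or proteins that stimulate frameshifting.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, start=0):
    """Translate from `start` until the first stop codon or end of sequence."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def frameshift_product(seq, shift_site, shift=-1):
    """Protein made when the ribosome changes frame at `shift_site`.

    `shift_site` is the index of the first nucleotide of the codon that
    would be read next in the original frame. The sketch assumes there is
    no stop codon upstream of `shift_site`.
    """
    upstream = translate(seq[:shift_site])
    downstream = translate(seq, start=shift_site + shift)
    return upstream + downstream


# Hypothetical toy mRNA (written with T instead of U); not a real virus sequence.
TOY_MRNA = "ATGGCTTTTTTAGGGTAACCTGA"

if __name__ == "__main__":
    print(translate(TOY_MRNA))                # MAFLG   (non-frameshift product)
    print(frameshift_product(TOY_MRNA, 12))   # MAFLRVT (frameshift product)
```

Both products share the same N-terminal residues and diverge after the shift site, which is why the frameshift product is described as a hybrid of the two reading frames.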

One of the ways in which we are investigating virus gene expression is through ribosome profiling, a relatively recently developed technique. It relies on the fact that a translating ribosome covers around 30 nucleotides of mRNA. Nucleotide sequences that are not protected by translating ribosomes can be enzymatically digested, leaving millions of RNA fragments roughly 30 nucleotides long. These fragments can be analyzed using high-throughput sequencing and then mapped back to virus and host mRNAs, to give a global snapshot of the positions of translating ribosomes. We used ribosome profiling to study gene expression in many virus species including model coronaviruses.
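The mapping step of ribosome profiling can be sketched in a few lines of Python. The example below takes footprints that have already been aligned to a single mRNA, keeps those of roughly the expected length, assigns each to one nucleotide position, and compares the footprint counts in equal-sized windows either side of a site of interest. Everything specific here is an assumption made for illustration: the function names, the 28 to 32 nucleotide length filter, the fixed 12-nucleotide offset from the 5' end of a footprint, and the toy data. It is not the analysis pipeline used in the project.

```python
def footprint_density(footprints, mrna_length, offset=12, min_len=28, max_len=32):
    """Count footprints per mRNA position.

    `footprints` is an iterable of (start, length) pairs, where `start`
    is the 0-based mRNA coordinate of the footprint's 5' end. Footprints
    outside the length range are discarded; each remaining footprint is
    assigned to position start + offset (an assumed fixed offset).
    """
    density = [0] * mrna_length
    for start, length in footprints:
        if not (min_len <= length <= max_len):
            continue
        pos = start + offset
        if 0 <= pos < mrna_length:
            density[pos] += 1
    return density


def relative_drop(density, site, window):
    """Fractional decrease in footprint count downstream of `site`.

    Compares the window [site, site + window) with [site - window, site).
    """
    if site - window < 0 or site + window > len(density):
        raise ValueError("windows must lie within the mRNA")
    upstream = sum(density[site - window:site])
    downstream = sum(density[site:site + window])
    if upstream == 0:
        raise ValueError("no upstream footprints in window")
    return 1 - downstream / upstream


# Hypothetical toy footprints on a 60-nucleotide mRNA (not real data).
TOY_FOOTPRINTS = (
    [(s, 30) for s in (3, 3, 6, 6, 9, 9, 12, 12, 15, 15)]  # upstream of site
    + [(18, 30), (27, 30)]                                  # downstream of site
    + [(20, 45)]                                            # too long, discarded
)

if __name__ == "__main__":
    density = footprint_density(TOY_FOOTPRINTS, mrna_length=60)
    print(relative_drop(density, site=30, window=15))
```

A comparison of this kind, applied to real data with appropriate normalization, is one way to quantify a change in ribosome density at a particular site on an mRNA.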

A key aspect of our work is to identify and characterize "hidden genes" in RNA viruses. We identified and characterized a novel protein (termed UP) encoded in the genomes of most human-infecting enteroviruses including poliovirus type 1, and a novel protein (termed XP) encoded in the genomes of human and other mammalian astroviruses. We obtained follow-up funding to investigate UP knockout as a widely applicable enterovirus vaccine strategy. Recently, through comparative genomic analysis of SARS-CoV-2 and related coronaviruses, we identified a new protein (termed 3c) encoded in their genomes and we are currently investigating the function of the 3c protein during virus infection.

We developed and applied sophisticated comparative genomic techniques to systematically discover previously overlooked functional elements in virus genomes. The resulting findings have been, and will continue to be, an important resource for driving progress in understanding the molecular biology of viruses, particularly for less well-studied or newly emerging virus species. A case in point is our discovery of a new gene in the SARS-CoV-2 genome.

We used targeted laboratory experiments to further investigate the most significant computational findings. This led to advances in our understanding of the biology of particular virus species of medical or economic concern (including enteroviruses, porcine reproductive and respiratory syndrome virus, human astroviruses, influenza A virus, SARS-CoV-2, and several plant viruses), thus laying the groundwork for advances in virus control strategies including the rational design of virus vaccine candidates and identification of potential targets for antiviral drugs.

Due to the unique constraints under which RNA viruses evolve, they have developed a variety of novel gene expression strategies and other unique molecular mechanisms, many of which have applications in biotechnology and as tools for fundamental molecular biology research. One focus of our research has been to identify and characterize new mechanisms of this kind. For example, one outcome of the project has been the discovery of "protein-stimulated ribosomal frameshifting", a new mechanism for regulating gene expression which had been hypothesized more than 30 years ago but remained undiscovered until now.

A significant sub-project has been to use ribosome profiling to investigate the dynamics of virus gene expression. We applied ribosome profiling to study, in unprecedented detail, the gene expression of many viruses including coronaviruses. We also developed new computational tools to refine and interpret the data produced by this technique, particularly in the context of infection studies.

Structure probing of the cardiovirus RNA frameshift signal
Phylogeny of mammalian astroviruses
Identification of a previously overlooked gene in enteroviruses via comparative genomic analysis
Analysis of the cardiovirus 2A protein binding to the RNA frameshift signal
Identifying sites of translation initiation in human astrovirus
Studying the suppression-of-RNA-silencing function of a newly discovered plant virus protein
High resolution analysis of coronavirus gene expression by ribosome profiling
Identification of a new gene in the SARS-CoV-2 genome via comparative genomic analysis
Cellular localization of the astrovirus XP protein
Growing virus in tissue culture
Growth analysis of enterovirus B with and without the UP protein in a "mini-gut" organoid system
84% decrease in ribosome density at the highly efficient cardiovirus ribosomal frameshifting site