
Systematic discovery of functional elements in RNA virus genomes: an Encyclopedia of RNA Virus Elements

Periodic Reporting for period 3 - ERVE (Systematic discovery of functional elements in RNA virus genomes: an Encyclopedia of RNA Virus Elements)

Reporting period: 2018-09-01 to 2020-02-29

"A key step towards developing effective treatments and/or control strategies for viruses is to first achieve a solid understanding of their molecular biology. To do this, it is crucial to identify and characterize the full complement of genes and other functional elements encoded within their compact genomes. Many of the most important RNA viruses were amongst the first genomes to be sequenced, some 30–40 years ago. Surprisingly, however, ""hidden"" protein-coding genes are still being discovered even in some of the most well-studied RNA viruses. Such genes tend to be very short and/or embedded within other genes, making them very difficult to detect using conventional gene-finding approaches. Besides protein-coding genes, RNA virus genomes are also packed full of other functional elements – for example the RNA of the genome can fold into complex 3-dimensional structures that form important signals to direct the different aspects of the viral replicative cycle. Knowledge of these functional RNA elements is also far from complete.

Computational analysis of virus genomes provides a practical and cost-effective way forward that can be used to precisely and efficiently target follow-up experimental research. The genome sequences of RNA viruses evolve very rapidly so that there is considerable diversity between different isolates of a single virus species. For medically or economically important species, there are often dozens or even hundreds of sequences available. By comparing the sequences of different virus isolates and computationally analysing the patterns of changes at different nucleotide positions (a technique known as ""comparative genomics""), we can predict novel functional elements and often gain extensive insight into their function. Part of our work involves developing new comparative genomic techniques for virus genome analysis. We are also taking some of the most interesting newly discovered features into the lab to experimentally characterize exactly what their function is during virus infection. By enhancing our understanding of the molecular biology of many virus species, the project lays essential ground work for follow-up advances in diverse virus control strategies."
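As a rough illustration of the comparative genomics idea described above, the following Python sketch scores each column of a small alignment by how often isolates deviate from the most common nucleotide, then reports windows with no variation at all. It is a deliberately simplified stand-in and not the project's own method: the function names, the window size, the threshold and the toy sequences are all hypothetical, and real analyses would need to account for coding constraints, phylogeny and statistical significance.

```python
from collections import Counter


def column_variability(alignment):
    """For each alignment column, return the fraction of sequences that
    differ from the most common nucleotide in that column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(1.0 - most_common_count / len(column))
    return scores


def conserved_windows(scores, window=3, threshold=0.0):
    """Return start positions of windows whose mean variability is at
    or below the threshold (hypothetical default: fully conserved)."""
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean <= threshold:
            hits.append(start)
    return hits


# Toy alignment of four made-up isolates (not real virus sequences).
toy_alignment = [
    "ATGGCACGTTTA",
    "ATGGCTCGTCTA",
    "ATGGCGCGTTTG",
    "ATGGCCCGTCTA",
]

scores = column_variability(toy_alignment)
print(scores)
print(conserved_windows(scores))
```

In this toy case the variable columns sit at typical third-codon positions, while stretches with unexpectedly low variability are the kind of signal that would be flagged for closer inspection.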
"The overarching goal of the project is to leverage the vast quantity of virus genome sequencing data available, using state-of-the-art computational techniques, to systematically identify functional elements in RNA virus genomes. The computational analysis is coupled with laboratory follow-up work to fully characterize the most important computational findings, ranging from novel features in important viruses, to novel molecular mechanisms with potential biotechnology applications. A major focus of the project is understanding the many unusual mechanisms that RNA viruses use to express their genes.

One such mechanism is "ribosomal frameshifting" whereby a proportion of translating ribosomes are stimulated to shift into an alternative reading frame to produce a "hybrid" protein. Frameshifting is normally stimulated by signals within the mRNA which induce a fixed expression ratio of frameshift and non-frameshift products, e.g. the Gag and Gag-Pol polyproteins of HIV. However, we discovered and characterized the only two known examples of protein-stimulated frameshifting (one in the cardioviruses and the other in the arteriviruses) where frameshifting depends on a viral protein binding to signals in the mRNA. This allows the virus to modulate the efficiency of frameshifting as the amount of virus protein changes over time in an infected cell and thus regulate virus gene expression.
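To make the notion of a "hybrid" protein concrete, here is a minimal Python sketch that translates a toy mRNA both without and with a -1 frameshift. It assumes the standard genetic code and treats the shift as a simple one-nucleotide step backwards at a codon boundary; the sequence, the function names and the shift position are invented for illustration, and the sketch does not model the stimulatory signals or the efficiency of frameshifting.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, start=0):
    """Translate seq from start until a stop codon or the sequence end."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def translate_with_frameshift(seq, shift_pos, shift=-1):
    """Translate frame 0 up to shift_pos (a codon boundary), then
    continue reading from shift_pos + shift in the new frame."""
    if shift_pos % 3 != 0:
        raise ValueError("shift_pos must lie on a codon boundary of frame 0")
    upstream = translate(seq[:shift_pos])
    if len(upstream) * 3 < shift_pos:
        # A stop codon was reached before the shift site.
        return upstream
    return upstream + translate(seq, start=shift_pos + shift)


# Hypothetical toy mRNA (written as DNA); not a real viral sequence.
toy_mrna = "ATGGCATTTTTATAAGGCCGATGTGA"

print(translate(toy_mrna))                      # non-frameshift product
print(translate_with_frameshift(toy_mrna, 12))  # frameshift (hybrid) product
```

The two products share the same N-terminal portion and differ only downstream of the shift site, which is the relationship between products such as Gag and Gag-Pol.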

In another group of viruses called the potyviruses (the largest and most important group of plant RNA viruses) we discovered a new gene, termed pipo, that is absolutely essential for virus spread within infected plants. Unusually, expression of pipo depends not on ribosomes slipping into an alternative reading frame, but instead on the virus replication enzyme slipping during synthesis of a small percentage of the virus mRNAs. This is a new phenomenon in this type of virus and led to a series of follow-up studies to investigate the molecular mechanism.

One of the ways in which we are investigating virus gene expression is the relatively recently developed technique known as Ribosome Profiling. This technique relies on the fact that a translating ribosome covers around 30 nucleotides of mRNA. Nucleotide sequences that are not protected in this way by translating ribosomes can be enzymatically digested, leaving millions of RNA fragments, each roughly 30 nucleotides long, that can be analysed using high-throughput sequencing technology and then mapped back to virus and host mRNAs, to give a global snapshot of the positions of translating ribosomes. We have been using Ribosome Profiling to study gene expression in murine hepatitis coronavirus (a model for SARS and MERS coronaviruses), avian infectious bronchitis coronavirus, equine torovirus, enteroviruses, human astrovirus, and murine leukemia virus.
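One common way to inspect mapped ribosome footprints is to check which reading frame they fall in, since footprints from genuinely translating ribosomes tend to favour one frame. The Python sketch below shows the idea on made-up data. The read coordinates, the length window of 28 to 32 nucleotides and the fixed P-site offset of 12 nucleotides are assumptions chosen for illustration, not values taken from this project; in practice such offsets are calibrated per data set and read length.

```python
from collections import Counter


def footprint_frames(reads, cds_start, p_site_offset=12, min_len=28, max_len=32):
    """Count footprints per reading frame relative to cds_start.

    reads is an iterable of (five_prime_end, length) pairs in transcript
    coordinates. The offset and length window are illustrative assumptions.
    """
    frames = Counter()
    for five_prime_end, length in reads:
        if not min_len <= length <= max_len:
            continue  # discard fragments outside the footprint size range
        p_site = five_prime_end + p_site_offset
        if p_site < cds_start:
            continue  # footprint lies upstream of the coding sequence
        frames[(p_site - cds_start) % 3] += 1
    return frames


# Hypothetical mapped reads: (5' end position, read length).
toy_reads = [
    (88, 30), (91, 30), (94, 29), (95, 30),
    (97, 31), (120, 22), (60, 30),
]

frames = footprint_frames(toy_reads, cds_start=100)
total = sum(frames.values())
for frame in range(3):
    print(f"frame {frame}: {frames[frame]} of {total} footprints")
```

A strong bias towards one frame across a region is the kind of evidence that can support translation of an overlapping or previously unannotated coding sequence.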

A key aspect of our work is to identify and characterize "hidden genes" in RNA viruses. We identified and characterized a novel protein (termed UP) encoded in the genomes of most human-infecting enteroviruses including poliovirus type 1, and a novel protein (termed XP) encoded in the genomes of human and other mammalian astroviruses. Very recently, through comparative genomic analysis of SARS-CoV and related coronaviruses, we identified a candidate new gene (termed 3c) that we are currently investigating experimentally.
Identification of the full complement of genes and other functional elements in any virus is crucial to fully understand its molecular biology and guide the development of effective control strategies. Comparative computational analyses can be used to efficiently identify features and target experimental analyses, thus saving time and cost, and reducing the need for animal experiments. We are developing and using sophisticated comparative genomic techniques to systematically discover previously overlooked functional elements in virus genomes. The resulting findings will be an important resource for driving progress in understanding the molecular biology of viruses, particularly for less well-studied or newly emerging virus species. The comparative genomic tools developed have wider applications outside of virology as they can also be applied to cellular organisms.

We are also using targeted laboratory work to further investigate the most significant computational findings. This is leading to advances in our understanding of the biology of particular virus species of medical or economic significance, thus laying the groundwork for advances in virus control strategies including the rational design of virus vaccine candidates, and identification of potential targets for antiviral drugs. Due to the unique constraints under which RNA viruses evolve, they have developed a variety of novel gene expression strategies and other unique molecular mechanisms, many of which have applications in biotechnology and as tools for fundamental molecular biology research. One focus of our research is to identify and characterize new mechanisms of this kind. A significant sub-project is using Ribosome Profiling to understand the dynamics of virus gene expression, and we have developed new computational tools to refine and interpret the data produced by this technique.
Investigating plant virus gene expression
Growing virus in tissue culture