Systematic discovery of functional elements in RNA virus genomes: an Encyclopedia of RNA Virus Elements

Periodic Reporting for period 2 - ERVE (Systematic discovery of functional elements in RNA virus genomes: an Encyclopedia of RNA Virus Elements)

Reporting period: 2017-03-01 to 2018-08-31

RNA viruses have genomes composed of RNA instead of the DNA that is used by all cellular life forms from bacteria to humankind. After gaining entry into a host cell, they hijack the host's protein synthesis machinery to express the viral genes, producing replication proteins (that replicate the viral genome), capsid proteins (that encapsidate new copies of the viral genome to allow transfer between cells or individuals), and various accessory proteins (e.g. inhibitors of host immunity). With the notable exception of smallpox virus, the majority of viruses with the potential to cause acute fatal disease in healthy adult humans are RNA viruses. Such viruses include influenza A virus, Ebola virus, rabies virus, SARS virus, MERS virus, Japanese encephalitis virus, yellow fever virus and dengue virus. Many other human pathogenic viruses are RNA viruses, including Zika virus, poliovirus, hepatitis A virus, hepatitis C virus, rubella virus, hepatitis E virus, chikungunya virus, norovirus, mumps virus and measles virus. RNA viruses also include important pathogens of livestock (e.g. bluetongue virus, foot and mouth disease virus, and Schmallenberg virus). Further, the majority of plant viruses are RNA viruses. The combined impact of RNA viruses, both in terms of economics and human health burden, is immense.

A key step towards developing effective treatments and/or control strategies for human, livestock and crop viruses is to first achieve a solid understanding of their molecular biology. To do this, it is crucial to identify and characterize the full complement of genes and other functional elements encoded within their genomes. RNA viruses have very compact genomes, typically comprising between 2000 and 32000 nucleotides - the fundamental "letters" of the information encoded in genomes. By comparison, the human genome contains some 3 billion letters. Many of the most important RNA viruses were amongst the first genomes to be sequenced, some 30 years ago. Surprisingly, however, 'hidden' protein-coding genes are still being discovered even in the most well-studied RNA viruses. Such genes tend to be very short and/or embedded within other genes, making them very difficult to detect using conventional gene-finding approaches. Besides protein-coding genes, RNA virus genomes are also packed full of other functional elements - for example, the RNA of the genome can fold up into complex 3-dimensional structures that form important signals to direct the different aspects of the viral replicative cycle, such as control of viral gene expression, replication of the viral genome, and packaging of the viral genome into capsids. Knowledge of these functional RNA elements is also far from complete. When one considers the several hundred RNA virus species that are of medical, veterinary or agricultural significance, it would be a nearly impossible task to map out all of the functional elements in the genomes of every virus species using traditional lab-based approaches.

Fortunately, computational analysis of virus genomes provides a very practical and cost-effective way forward that can be used to precisely and efficiently target follow-up experimental research. The genome sequences of RNA viruses evolve very rapidly so that there is considerable diversity between different isolates or strains of a single virus species. For medically or economically important species, there are often dozens or even hundreds of sequences available in public sequence repositories such as the GenBank database maintained by the U.S. National Center for Biotechnology Information. By comparing the sequences of different virus isolates and computationally analysing the patterns of changes at different nucleotide positions (a technique known as "comparative genomics"), we can predict novel functional elements and often gain extensive insight into their function. Part of our work involves developing new comparative genomic techniques specifically tailored for the unique
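As a rough illustration of the comparative idea described above, the following Python sketch scores each column of a small nucleotide alignment by how conserved it is and then averages the scores over a sliding window. It is only a toy under stated assumptions: the function names `column_conservation` and `window_means`, the example alignment and the window size are all hypothetical, and the techniques developed in the project are considerably more sophisticated than this simple measure.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences sharing the most common nucleotide at each column.

    `alignment` is a list of equal-length, pre-aligned sequences; gap
    characters ('-') are ignored when scoring a column.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        bases = [b for b in column if b != "-"]
        if not bases:
            scores.append(0.0)  # column made up entirely of gaps
            continue
        most_common_count = Counter(bases).most_common(1)[0][1]
        scores.append(most_common_count / len(bases))
    return scores


def window_means(scores, size):
    """Mean conservation in each sliding window of `size` columns."""
    return [sum(scores[i:i + size]) / size
            for i in range(len(scores) - size + 1)]


# Hypothetical four-isolate alignment, for illustration only.
toy_alignment = ["ACGTAC", "ACGTTC", "ACCTGC", "ACGTCC"]
toy_scores = column_conservation(toy_alignment)
toy_windows = window_means(toy_scores, 2)
```

Windows whose mean conservation is unusually high relative to the rest of the genome would be candidates for closer inspection. A real analysis would also have to account for the relatedness of the isolates and for the constraints already imposed by known coding sequences.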
The overarching goal of the project is to leverage the vast quantity of virus genome sequencing data available, using state-of-the-art computational techniques, to systematically identify all of the functional elements in RNA virus genomes. The computational analysis is coupled with laboratory follow-up work to fully characterize the most important computational findings, ranging from novel features in medically or economically important viruses, to novel molecular mechanisms with potential biotechnology applications. A major focus of the project is understanding the many unusual mechanisms that RNA viruses use to express their genes.

One such mechanism is "ribosomal frameshifting". The ribosome is the core molecular machine responsible for translating the nucleotide sequences of genes into amino acid sequences according to the genetic code, where each successive triplet of nucleotides (called a codon) is decoded into one of twenty possible amino acids - the building blocks of proteins. Occasionally, a single nucleotide sequence can encode two completely different amino acid sequences depending on the "reading frame" in which the successive nucleotide triplets are decoded. This is actually a rather common occurrence in RNA virus genomes due to evolutionary pressure to compact as much information as possible into their very small genomes. Ribosomal frameshifting is a mechanism by which a proportion of ribosomes translating one reading frame are actively stimulated to move into one of the other reading frames to produce a "hybrid" protein. Frameshifting can be used to produce a fixed expression ratio, e.g. the essential Gag and Gag-Pol proteins of HIV, where maintaining a specific ratio of Gag to Gag-Pol is important for efficient virus replication. Recently we discovered that some cases of virus frameshifting can be stimulated by a virus protein binding to the nucleotide sequence in front of the translating ribosome and "knocking" it into a different reading frame. This allows the virus to modulate the efficiency of frameshifting as the amount of virus protein changes over time in an infected cell and thus regulate virus gene expression.
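The notion of reading frames, and of a hybrid protein produced by a frameshift, can be made concrete with a short Python sketch. It assumes the standard genetic code and a made-up 15-nucleotide sequence; the function names `translate` and `translate_with_minus1_shift` are hypothetical, and the sketch models only the bookkeeping of a -1 shift, not the biological signals that stimulate it.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame=0):
    """Translate `seq` starting at offset `frame` (0, 1 or 2).

    RNA input is accepted (U is read as T); unrecognised codons become 'X'
    and a trailing incomplete codon is ignored.
    """
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def translate_with_minus1_shift(seq, shift_after_codon):
    """Decode `shift_after_codon` codons in frame 0, then slip back 1 nt.

    After the slip, the last nucleotide of the final frame-0 codon is
    re-read as the first nucleotide of the next codon.
    """
    boundary = 3 * shift_after_codon
    upstream = translate(seq[:boundary])
    downstream = translate(seq[boundary - 1:])
    return upstream + downstream


# Hypothetical sequence, for illustration only.
toy_seq = "ATGGCCAAATTTGGG"
frames = [translate(toy_seq, f) for f in range(3)]
hybrid = translate_with_minus1_shift(toy_seq, 3)
```

Here `frames` holds three different amino acid sequences read from the same nucleotides, and `hybrid` begins like the frame-0 product but ends like the product of a different frame, which is the sense in which a frameshift protein is a hybrid.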

In another group of viruses called the potyviruses (the largest and most important group of plant RNA viruses) we discovered a new gene, called pipo, that is encoded in an alternative reading frame and is absolutely essential for virus spread within infected plants. Unusually, expression of pipo depends not on ribosomes slipping into an alternative reading frame, but instead on the virus replication enzyme slipping during synthesis of a small percentage of copies of the virus nucleotide sequence. This is a new phenomenon in this type of virus and led to a whole host of questions regarding the molecular mechanisms involved in RNA synthesis slippage, which we investigated extensively using Turnip mosaic virus as a model system.

One of the ways in which we are investigating virus gene expression is the relatively recently developed technique known as Ribosome Profiling. This technique relies on the fact that a translating ribosome covers around 30 consecutive nucleotides. Nucleotide sequences that are not thus protected by translating ribosomes can be enzymatically digested, leaving millions of 30-nucleotide-long RNA fragments that can be analyzed using High-Throughput Sequencing technology and then mapped back to the known virus (and host organism) RNA sequences, to give a global snapshot of the positions of translating ribosomes. Ribosome Profiling has proven to be increasingly valuable in studies of protein synthesis, for example in the discovery of novel genes, determination of gene expression levels, identification of translation regulation, and analysis of host response to infectious diseases, differentiation and development, and cell stress. We have been using Ribosome Profiling to analyse the dynamics of gene expression during viral infection. In work already published, we used Ribosome
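The mapping step described above can be sketched in a few lines of Python. This is a toy under stated assumptions: it uses exact string matching and 10-nucleotide fragments on a made-up reference so that the example stays readable, whereas real analyses use dedicated alignment software on fragments of around 30 nucleotides. The function names `map_footprints` and `frame_counts` are hypothetical.

```python
from collections import Counter


def map_footprints(reference, reads):
    """Count reads by the reference position of their first nucleotide.

    Only reads that match the reference exactly and at a single location
    are counted; unmatched or ambiguous reads are discarded.
    """
    starts = Counter()
    for read in reads:
        pos = reference.find(read)
        if pos == -1:
            continue  # read does not occur in this reference
        if reference.find(read, pos + 1) != -1:
            continue  # read occurs more than once: ambiguous
        starts[pos] += 1
    return starts


def frame_counts(starts, orf_start):
    """Tally mapped reads by the frame of their first nucleotide
    relative to the start of an open reading frame."""
    counts = [0, 0, 0]
    for pos, n in starts.items():
        counts[(pos - orf_start) % 3] += n
    return counts


# Hypothetical reference and fragments, for illustration only.
toy_reference = "ATGGCTAGCAAGTTCCGTACGGATCTGAACTGA"
toy_reads = [
    toy_reference[3:13],
    toy_reference[3:13],
    toy_reference[6:16],
    toy_reference[7:17],
    "TTTTTTTTTT",  # does not map and is ignored
]
toy_starts = map_footprints(toy_reference, toy_reads)
toy_frames = frame_counts(toy_starts, orf_start=0)
```

Tallying reads by frame is useful because footprints from actively translating ribosomes tend to fall predominantly in one frame, which can help indicate which reading frame is being decoded in a given region.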
Identification of the full complement of genes and other functional elements in any virus is crucial to fully understand its molecular biology and guide the development of effective control strategies. Comparative computational analyses can be used to efficiently identify features and target experimental analyses, thus saving time and cost, and reducing the need for animal experiments. We are developing and using sophisticated comparative genomic techniques to systematically discover previously overlooked functional elements in the genomes of viruses of medical or economic importance. With the massive amounts of RNA sequencing information now becoming available, for the first time it is possible to map out at high resolution functional elements genome-wide in hundreds of important RNA virus species. The resulting database will be an important resource for driving progress in understanding the molecular biology of viruses, particularly for less well-studied virus species. For example, it can be used to inform design of attenuated virus vaccine candidates. The comparative genomic tools developed have wider applications outside of virology as most can also be applied to cellular organisms.

We are also using targeted laboratory work to further investigate the most significant computational findings. This is leading to advances in our understanding of the biology of particular virus species of medical or economic significance, thus laying the groundwork for advances in virus control strategies including the rational design of virus vaccine candidates, identification of potential targets for antiviral drugs and, for plant viruses, potential leads for breeding or engineering virus resistance in crop plants. Due to the unique constraints under which RNA viruses evolve, they have developed a variety of novel gene expression strategies and other unique molecular mechanisms, many of which have applications in biotechnology and as tools for fundamental molecular biology research. One focus of our research is to identify and characterize new mechanisms of this kind. A significant sub-project is using Ribosome Profiling to understand the dynamics of virus gene expression and cellular responses to viral infection. For this, we have developed new computational tools to refine and interpret the data produced by the technique and to improve the experimental strategy.
Investigating plant virus gene expression
Growing virus in tissue culture