A key step towards developing effective treatments and control strategies for viruses is to achieve a solid understanding of their molecular biology. To do this, it is crucial to identify and characterize the full complement of genes and other functional elements encoded within their compact genomes. The genomes of many of the most important RNA viruses were amongst the first to be sequenced, some 30–40 years ago. Surprisingly, however, "hidden" protein-coding genes are still being discovered even in some of the best-studied RNA viruses. Such genes tend to be very short and/or embedded within other genes, making them very difficult to detect using conventional gene-finding approaches. Besides protein-coding genes, RNA virus genomes are also packed with other functional elements – for example, the genomic RNA can fold into complex three-dimensional structures that act as signals directing different stages of the viral replicative cycle. Knowledge of these functional RNA elements is also far from complete.
Computational analysis of virus genomes provides a practical and cost-effective way forward, and can be used to target follow-up experimental research precisely and efficiently. The genome sequences of RNA viruses evolve very rapidly, so there is considerable diversity between different isolates of a single virus species. For medically or economically important species, there are often dozens or even hundreds of sequences available. By comparing the sequences of different virus isolates and computationally analyzing the patterns of changes at different nucleotide positions (a technique known as "comparative genomics"), we can predict novel functional elements and often gain extensive insight into their function. For example, a region in which sequence changes are unexpectedly rare across isolates is likely to be under additional functional constraint. Part of our work involves developing new comparative genomic techniques for virus genome analysis. We are also taking some of the most interesting newly discovered features into the lab to characterize experimentally their function during virus infection. By enhancing our understanding of the molecular biology of many virus species, we lay essential groundwork for follow-up advances in diverse virus control strategies.
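As a rough illustration of the comparative genomics idea described above, the following Python sketch scores how variable each position is across an alignment of isolates and averages that score over a sliding window. It is a simplified stand-in, not the software developed in this project: per-column Shannon entropy is assumed here as the measure of variability, and the function names, the window size and the toy sequences are all hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps and ambiguity codes."""
    bases = [b for b in column.upper() if b in "ACGTU"]
    if not bases:
        return 0.0
    total = len(bases)
    counts = Counter(bases)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def windowed_variability(alignment, window=3):
    """Mean column entropy in each sliding window along an alignment.

    `alignment` is a list of equal-length, pre-aligned sequences,
    one per virus isolate. Low values indicate conserved regions.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    entropies = [column_entropy("".join(col)) for col in zip(*alignment)]
    return [
        sum(entropies[i:i + window]) / window
        for i in range(len(entropies) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical toy alignment of four isolates (not real virus data).
    isolates = [
        "AUGGCUAAACCG",
        "AUGGCCAAACCA",
        "AUGGCAAAACCU",
        "AUGGCGAAACCC",
    ]
    for start, score in enumerate(windowed_variability(isolates, window=3)):
        print(start, round(score, 3))
```

Windows with unusually low scores relative to the rest of the genome would be candidates for closer inspection. A realistic analysis would also need to account for the coding context (for example, by considering only synonymous changes) and for the phylogenetic relationships between isolates, both of which this sketch ignores.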
During the course of this project, we have developed new software and databases; computationally identified "hidden" genes and major functional non-coding elements in medically and economically important RNA virus genomes; experimentally characterized some of the most interesting computational findings; and characterized novel virus gene expression mechanisms.