Virus discovery and epidemic tracing from high throughput metagenomic sequencing

Periodic Reporting for period 2 - VIROGENESIS (Virus discovery and epidemic tracing from high throughput metagenomic sequencing)

Reporting period: 2016-12-01 to 2018-05-31

VIROGENESIS addresses core challenges in advancing methodology that maximizes the use of next-generation sequencing (NGS) data in biomedical and clinical settings. The full potential of NGS for routine use in clinical and health laboratories is hampered because current methodology is not sufficiently adapted to these data. The VIROGENESIS project has identified three specific bioinformatics bottlenecks that prevent the effective use of NGS in clinical and epidemiological settings.

1) NGS datasets typically contain known and unknown RNA and DNA viruses, but current virus detection methods lack the sensitivity needed to discover novel viruses, and unassigned sequence reads are currently discarded from analyses. New metagenomics classifiers are needed that can handle large datasets, characterize unknown viruses and show increased classification resolution for known viruses.

2) Once virome data have been assembled and classified, phylogenetic investigations provide information on virus dynamics during outbreaks, when the source and mode of transmission must be identified and the geographic spread quickly mapped. Current inference models are not adapted to work with large and incompletely assembled NGS data. New algorithms are needed that bring methodological improvements in terms of data size, data format and speed of analysis.

3) Software that can efficiently present the wealth of results from NGS analyses is lacking, which limits the translation of complex, diverse information into ready-to-use data for clinical researchers and public health officials. Novel visualisations should scale to complex and diverse information and handle the uncertainty present in the output of analyses.

As a final result, the VIROGENESIS project has delivered novel methodology and software solutions for NGS virome analysis, ranging from fast and accurate assembly and identification of known and unknown viral pathogens to modelling and inferring their spread in time and space. These developments have been accompanied by interactive and informative visualisations and dynamic graphical user interfaces. VIROGENESIS tools have been designed to be versatile and user-friendly, and have been shown to outperform other existing and well-known methods in terms of sensitivity and accuracy on a range of simulated and real genomic data. The software scales to large datasets and is suitable for routine use. The VIROGENESIS project aimed to make its software open source, easy to install and adapt, accompanied by detailed documentation, and platform-independent so that it runs on the most popular operating systems. These characteristics support the objective of VIROGENESIS tools becoming standard tools for virome analysis in this rapidly evolving research field. In addition, novel opportunities for innovation and commercialization of bioinformatics products have been established.

First, VIROGENESIS set out to develop software applications to foster bioinformatics analysis in clinical and epidemiological settings. VIROGENESIS has successfully contributed to accelerating the translation of NGS analysis results into clinical research, particularly with respect to virus discovery, identification and surveillance. Novel and fast NGS metagenomic assembly methods and virome comparison tools were developed, addressing the current inability of NGS metagenomics to characterize the 'unknown'. The tools developed within VIROGENESIS have increased the precision and recall of classifications, which helps in disambiguating the 'unknown' viral matter and facilitates the discovery and assembly of new organisms. Another important result from the VIROGENESIS project was the finding that Bayesian phylogenetic inference can be successfully used on high-throughput sequencing data. The current framework and further improvements provide, for the first time, the ability to run high-precision, computationally robust Bayesian analyses on high-throughput sequencing data, improving the ability to detect the origin and spread of rapidly evolving viruses. In addition, fast and accurate reconstruction of ancestral histories is possible for large datasets, which allows its use in the daily research practice of public health researchers. The software has been made open source so that it is accessible, extendable and platform-independent, maximizing its use and dissemination. All software has been accompanied by scientific publications describing the methodology and the results of testing and validation.

Second, the organisation of international workshops in Europe, where a wide range of VIROGENESIS software will be used in theoretical and practical sessions, strengthens the role of Europe in developing novel and innovative solutions to the persistent challenges of NGS virome analyses. This is demonstrated not only by the impact of VIROGENESIS publications, but also by the interactions established with external partners and with other research initiatives. Novel interactions were established to transfer knowledge and technologies from other research fields, e.g. geospatial situational awareness, into the field of epidemic tracking in order to create new opportunities.

Third, VIROGENESIS has facilitated research and innovation opportunities for SMEs as well as the commercialization of bioinformatics products. Beyond the impact initially foreseen, recent advances in VIROGENESIS technology and recent discoveries within the project have given rise to new applications, with new innovations filed in European and US patent applications. Furthermore, novel business opportunities for SMEs have been identified by transferring and adapting VIROGENESIS software for analyses of bacteria. By partnering with existing analysis platforms that have a wide user base, VIROGENESIS will not only guarantee a long life for its tools, but also help the open source bioinformatics suite to grow and become one of the top players as a commercial product.

Fourth, students from all over the world who attend the training sessions of the highly regarded VEME workshops can bring knowledge of these tools back to their own institutions, thus promoting global use. Furthermore, new methods and tools have been disseminated via a large collection of scientific publications. A VIROGENESIS pipeline tool has been implemented in the UGENE platform, which will allow wide dissemination of the tool, including to researchers whose computational resources and/or programming skills are modest.

The project was positioned at the 'idea of application' stage, but its prototypes have evolved into complete 'proof-of-concept' applications. The impact of VIROGENESIS has already become visible. The uptake of VIROGENESIS software solutions into editions of the International Bioinformatics Workshop on Virus Evolution and Molecular Epidemiology will further enable young and established researchers, virologists and clinicians to keep up with the latest trends in the field of infectious diseases and in pathogen tracking and detection. Other tools developed within the project are expected to gain their place in the research field of virome analyses, as their good performance has been shown on simulated and real genomic data.