Periodic Report Summary 1 - PATHONGEN-TRACE (Next Generation Genome Based High Resolution Tracing of Pathogens)
Project Context and Objectives:
Next Generation Sequencing (NGS) has fundamentally altered genomic research. The rapid development of this technology has improved performance and reduced DNA sequencing costs, widening the range of cost-effective applications. The potential of NGS for ultra-fast and accurate molecular typing and diagnostics is now being exploited in the medical diagnosis of pathogens. In this context, PathoNGen-Trace (PNGT) aims to develop Next Generation Sequencing and next generation DNA analysis tools into a highly efficient and effective technology for diagnostics and the high-resolution typing of microbial pathogens. An international consortium of leading experts in clinical microbiology, from institutions such as the ‘Westfälische Wilhelms-University’ Muenster, the University of Oxford, and the Research Center Borstel (scientific project coordinator), is working in the project together with well-known commercial enterprises in this area: Applied Maths NV, Genoscreen SAS, Piext BV, and Ridom GmbH.
Three pathogens are being used as models: the Mycobacterium tuberculosis complex, methicillin-resistant Staphylococcus aureus (MRSA), and human-pathogenic Campylobacter species. All three are major pathogens worldwide, posing serious medical threats and important challenges for treatment and public health. One third of the world’s population is currently infected with M. tuberculosis, with new infections occurring at a rate of about one per second globally. Although the infection remains latent in most cases, over nine million cases of active tuberculosis (TB) were reported in 2009, with about 1.7 million deaths. MRSA is the major cause of hospital-acquired infections, often affecting surgical intensive care units, burn centres, maternity wards and special care baby units. Finally, pathogens of the genus Campylobacter (C. jejuni and C. coli) can be transmitted from animals to humans, especially but not exclusively through retail food products. In Europe they are one of the main causes of bacterial intestinal infections; in 2005, for the first time, Campylobacter infections outnumbered Salmonella infections. While classical genotyping has been applied to understand the transmission and population structure of these pathogens, recent whole genome investigations have shown that these approaches lack discriminatory power because they do not capture the full extent of genome diversity.
The PathoNGen-Trace project aims to overcome the drawbacks of conventional typing and diagnosis by using NGS-based genotyping to discriminate clinical isolates at the genome-wide level. In parallel with the development of new tools, the consortium will determine the as yet unknown quantitative parameters of genome evolution at the population level, in order to calibrate and validate NGS for pathogen genotyping and epidemiological tracing. The main objectives of our proposal are: (i) to develop new, fully integrated bioinformatics tools for fast and easy quality-controlled data extraction and interpretation for general diagnostics (e.g. drug resistance) and public health applications (genomics-based molecular epidemiology); (ii) to streamline and implement new internal quality control procedures for the whole NGS process, from sample preparation protocols to final sequence assemblies or mappings; and (iii) to test and validate the performance of NGS for ultra-sensitive/early diagnostics and for monitoring the spread of major microbial pathogens.
The major objectives for the first phase of this project were (i) the successful start of the project in all work packages, (ii) the implementation of the data warehouse and the first software pipelines, (iii) the start of strain collection and sequencing for the three pathogens, and (iv) the start of the comparison of different NGS platforms and of the development of improved NGS kits.
The major results of the five scientific work packages are:
BIGSdb, the PNGT data warehouse, has been installed using the Microbial Typing Ontology; it provides access to the data and facilitates wgMLST-type analyses. Different bioinformatics pipelines have been tested to compare their performance in genome assembly/mapping and SNP/InDel detection. A bacterial core genome definer and genome-wide gene-by-gene typing method was developed, integrated into the SeqSphere software and tested on MTBC, MRSA and CAMPY. The BioNumerics software was extended to analyse short-read sets. SNP- and k-mer-based clustering methods have been evaluated, and algorithms have been developed and tested for, e.g., reference-based wgSNP and MLST analyses and assembly-free typing.
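To illustrate what an assembly-free, k-mer-based comparison involves, the following minimal Python sketch compares genomes by the Jaccard distance between their sets of canonical k-mers (a k-mer and its reverse complement are counted as one). The function names, the default k and the use of exact k-mer sets (rather than a sketching scheme such as MinHash) are assumptions made for illustration; this is not the algorithm developed in the project.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_set(seq: str, k: int) -> set[str]:
    """Set of canonical k-mers in seq; k-mers containing non-ACGT characters (e.g. N) are skipped."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            kmers.add(canonical(kmer))
    return kmers


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |intersection| / |union|: 0.0 for identical k-mer sets, 1.0 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(genomes: dict[str, str], k: int = 21) -> dict[tuple[str, str], float]:
    """Jaccard distance for every pair of genomes, keyed by (name1, name2) in sorted order."""
    sets = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    return {(x, y): jaccard_distance(sets[x], sets[y])
            for x, y in combinations(sorted(sets), 2)}


if __name__ == "__main__":
    # Toy sequences (hypothetical); real inputs would be genomes or read sets.
    genomes = {"a": "ACGTACGTGG", "b": "ACGTACGTGG", "c": "TTTTTTTTTT"}
    print(pairwise_distances(genomes, k=4))
    # {('a', 'b'): 0.0, ('a', 'c'): 1.0, ('b', 'c'): 1.0}
```

When working directly on reads, low-abundance k-mers caused by sequencing errors would normally be filtered out first; that step is omitted here for brevity.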
Work performed in WP2 consisted of assessing the sequencing platforms and sample preparation methods and setting up quality control. A comparison of benchtop platforms led to the selection of the MiSeq platform, and the Nextera XT kit was selected for library preparation. The quality control system includes a selected control strain and checkpoints at several relevant steps. High multiplexing levels have been set: 96 for CAMPY and 48 for MRSA and MTB. These optimised settings have been used to sequence 988 isolates. An evaluation of available whole genome amplification (WGA) processes and general capture technologies gave unsatisfactory results.
As a key result, the potential of whole genome sequencing (WGS) for improved molecular-guided tuberculosis (TB) surveillance was demonstrated by analysing a longitudinal M. tuberculosis (Mtb) outbreak (86 strains spanning more than 10 years) with next generation sequencing. This study established first parameters for WGS-based pathogen tracing and micro-epidemic studies. Genome sequencing of a large retrospective collection from a longitudinal molecular epidemiological study was started, and protocols for benchtop sequencing (MiSeq and Ion Torrent) have been established. Finally, a first prototype of a genome-wide gene-by-gene approach was developed using Ridom SeqSphere.
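Gene-by-gene typing reduces each genome to an allelic profile (one allele number per core-genome locus), so isolates can be compared by counting allelic differences. The Python sketch below assumes hypothetical profiles keyed by locus name, ignores loci that are missing in either isolate (one common convention), and groups isolates by single-linkage clustering under a user-chosen threshold. The threshold values are placeholders rather than the tracing parameters established in the study, and the code is not the Ridom SeqSphere implementation.

```python
from itertools import combinations
from typing import Optional

# Hypothetical profile format: locus name -> allele number (None = allele not called).
Profile = dict[str, Optional[int]]


def allelic_distance(p: Profile, q: Profile) -> int:
    """Count loci called in both profiles whose allele numbers differ (missing calls ignored)."""
    shared = p.keys() & q.keys()
    return sum(1 for locus in shared
               if p[locus] is not None and q[locus] is not None and p[locus] != q[locus])


def single_linkage_clusters(profiles: dict[str, Profile], threshold: int) -> list[list[str]]:
    """Group isolates connected by any chain of pairwise distances <= threshold."""
    parent = {name: name for name in profiles}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if allelic_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, list[str]] = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    profiles = {  # toy data, not real isolates
        "iso1": {"locA": 1, "locB": 2, "locC": 3},
        "iso2": {"locA": 1, "locB": 2, "locC": 4},
        "iso3": {"locA": 1, "locB": 5, "locC": 4},
        "iso4": {"locA": 9, "locB": 9, "locC": None},
    }
    print(single_linkage_clusters(profiles, threshold=1))
    # [['iso1', 'iso2', 'iso3'], ['iso4']]
```

Single linkage is used here only because it is simple; with it, iso1 and iso3 fall into one cluster through iso2 even though they differ at two loci.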
To date, the most significant results of WP4 are the comparisons of benchtop NGS machines performed by GSR and WWU (Nature Biotechnology 31: 294, 2013) for S. aureus, M. tuberculosis and E. coli, which allow the consortium to choose the most appropriate platform for future work. Furthermore, de novo assemblers are currently being systematically evaluated by WWU; this comparison is due to be submitted for publication within the next 1-2 months. Finally, a retrospective MRSA outbreak analysis has been conducted by WWU and will be submitted for publication by the end of August 2013.
A rapid high-throughput sequencing pipeline has been developed for the generation, assembly, analysis and dissemination of Campylobacter whole genome sequence (WGS) data, and has been used to characterise large numbers of isolates from a well-defined isolate collection. Ongoing validation of the data from the 561 isolates investigated to date suggests very high levels of accuracy and effectiveness. WGS data show good concordance with optical mapping, although optical mapping is most effective at detecting genome rearrangements and is relatively insensitive to nucleotide sequence changes.
PathoNGen-Trace will foster the development of new and widespread applications of Next Generation Sequencing (NGS) in clinical microbiology and disease surveillance, ranging from basic research to medical research, diagnostics, and pathogen genotyping. The NGS kits, methodologies, and software developed for highly effective diagnostics and genotyping of major pathogens will directly impact patient management and disease surveillance, and thus reduce health-related costs. In line with anticipated progress in NGS technologies, this project will promote a major technological shift towards replacing current capillary sequencing with NGS for these applications.
The development of new tools and technologies in this SME-targeted project will overcome existing obstacles to the large-scale use of NGS by European clinical microbiologists and scientists, thus fostering Europe's competitiveness in biomedical applications. Importantly, the tools will be developed in formats that are as generic as possible, so that they can be applied to a wide variety of micro-organisms. In particular, the proposed new NGS research tools will significantly enhance data generation (e.g. better NGS workflows, new technologies) and improve standardisation (ontology, API and kits), quality control (algorithms) and analysis (new bioinformatics tools). Moreover, developing the tools needed for a much wider application of NGS technology will enable the generation of a large amount of new knowledge in Europe, and is crucial for increasing Europe's competitiveness in "-omics" research and systems biology. Beyond these key new applications and the validation of NGS for medical microbiology, we anticipate that the tools developed will also be usable for other relevant NGS-based applications, such as comprehensive microbial genomic characterisation for bio-banking, bio-processing and synthetic biology.
The tools and knowledge developed by PathoNGen-Trace will bring modern molecular surveillance to the ultimate level of resolution by using whole genome data for pathogen tracing. The project will provide software solutions for easy, quality-controlled strain classification based on NGS data. Rule-based systems will help extract information relevant to clinical microbiology, e.g. resistance or virulence determinants. Quality-controlled nomenclature servers for NGS-based pathogen tracing will be established, based on and linked to existing strain typing platforms (e.g. PubMLST, SpaServer). Through the development of generic tools and databases, PathoNGen-Trace is likely to continue and extend the success of current typing databases (PubMLST, SpaServer) and to link them to future European surveillance programs of the European Centre for Disease Prevention and Control (ECDC).
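A rule-based system of the kind described can, in its simplest form, be a table mapping detected genes or mutations to resistance phenotypes. The Python sketch below is a hypothetical illustration: the input format (a set of detected genes plus a set of (gene, amino-acid change) calls) and the three example rules are assumptions, chosen because mecA carriage (methicillin resistance in S. aureus), rpoB S450L (rifampicin) and katG S315T (isoniazid) in M. tuberculosis are well-known markers. A real system would rely on a curated, validated catalogue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    drug: str
    gene: str
    mutation: Optional[str] = None  # None: presence of the gene alone triggers the rule


# Hypothetical, illustrative rule table; NOT a curated resistance catalogue.
RULES = [
    Rule("methicillin", "mecA"),          # S. aureus: mecA carriage defines MRSA
    Rule("rifampicin", "rpoB", "S450L"),  # M. tuberculosis codon numbering
    Rule("isoniazid", "katG", "S315T"),
]


def predict_resistance(genes_present: set[str],
                       mutations: set[tuple[str, str]]) -> dict[str, list[str]]:
    """Return {drug: [evidence, ...]} for every rule matched by the detected genes/mutations."""
    hits: dict[str, list[str]] = {}
    for rule in RULES:
        if rule.mutation is None:
            matched = rule.gene in genes_present
            evidence = rule.gene
        else:
            matched = (rule.gene, rule.mutation) in mutations
            evidence = f"{rule.gene} {rule.mutation}"
        if matched:
            hits.setdefault(rule.drug, []).append(evidence)
    return hits


if __name__ == "__main__":
    print(predict_resistance({"mecA", "spa"}, set()))
    # {'methicillin': ['mecA']}
```

Keeping the rules as data rather than code means the rule table can be updated and versioned separately from the software, which matters when the underlying knowledge base changes.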
The use of such new NGS tools by a wide user community to study the population structure and genome evolution of a variety of pathogens will generate a wealth of knowledge on transmission dynamics, population structure and genome evolution. This will support improved European disease control measures and a better understanding of the virulence and resistance traits of major pathogens.
Therefore, the added value for European citizens resulting directly from the project will be: (i) more comprehensive detection (and therefore better treatment) of relevant pathogen infection parameters at the patient level, and (ii) more specific and sensitive early warning of microbial outbreaks at the public health level, and therefore more effective control and prevention of pathogen epidemics (especially multi-resistant ones). Thus, PathoNGen-Trace is likely to create significant, long-lasting "added value" in the European health and bio-industrial research area.
List of Websites: