Community Research and Development Information Service - CORDIS


COMPARE Report Summary

Project ID: 643476

Periodic Reporting for period 1 - COMPARE (COllaborative Management Platform for detection and Analyses of (Re-)emerging and foodborne outbreaks in Europe)

Reporting period: 2014-12-01 to 2016-05-31

Summary of the context and overall objectives of the project

Globally, infectious diseases are the direct cause of about 22% of all human deaths and cause disastrous problems in animal health. In addition to these direct consequences, infectious diseases, including foodborne diseases, place a major burden on health systems and may lead to restrictions on travel and trade. Due to demographic and other changes, disease dynamics change rapidly and new diseases emerge frequently. Because many infectious diseases cross borders, rapid global systems for exchanging and comparing information are urgently needed.

Recent developments in next generation sequencing (NGS) technologies, together with the increasing ability to exchange data globally over the internet, open up completely new opportunities. Globally, laboratory diagnostics increasingly rely on pathogen genomic information. Because RNA and DNA are common to all pathogens, methods to analyse pathogen genomes are potentially universal. NGS capacity is developing fast, and costs are becoming competitive. Data are easy to share electronically and are in a standardized format. Capturing NGS developments may provide a universal language that can be harnessed for early detection of outbreaks across disciplines and domains. If the technology keeps developing, less well-equipped laboratories may be able to leapfrog older methods, which would make it much easier to involve developing countries. One of the main barriers, however, is the lack of ways to easily share data in a standardized format among partners as well as publicly. Another barrier is the lack of local expertise for analysing the data generated, and even of standard protocols or frameworks for what to do in given situations.

COMPARE is a multidisciplinary research network that envisions a globally linked data- and information-sharing platform for the rapid identification, containment and mitigation of emerging infectious diseases and foodborne outbreaks. The system will collect, process and analyse sequence-based pathogen data in combination with associated (clinical, epidemiological and other) data to generate actionable information for relevant authorities and other users in the human health, animal health and food safety domains. COMPARE aims to make as much data as possible publicly available and will thus contribute significantly to global Open Science.

COMPARE is developing standards for sampling and sample handling, in both routine surveillance and outbreak scenarios, which will allow even less-experienced researchers to perform comparable studies. COMPARE is also developing analytic frameworks and online bioinformatics tools, which will give everybody online access to advanced bioinformatics analysis. These standards and frameworks are combined with an IT platform that will give people not only access to the analytic pipelines, but also the ability to store their data privately (temporarily), share them in closed groups and make them publicly available. The communication aspects, barriers and economic aspects associated with the development of this system will also be addressed. During the first 18-month period of the project, a number of major (and minor) global infectious disease emergence and transmission events have taken place (Zika, Ebola, influenza, as well as mcr-1 colistin resistance). In all cases, sharing and analysis of sequence-based information in combination with relevant epidemiological information helped to elucidate events, but faster sharing and common analysis among all stakeholders would have provided a better foundation for public health interventions. These recent events show that the platform COMPARE seeks to deliver is more relevant than ever.

During these first months, COMPARE has developed initial standards, or initiated studies, for sampling and sample handling. Initial workflows for clinical diagnostics, food safety and emerging diseases have been developed, and pilot projects have been planned and/or executed. Initial versions of web-accessible sites for sharing and analysing sequence data have been created, and the first attempts to compare analytic pipelines are under way. The recent global emergencies have been a good basis for studying communication strategies and barriers to sharing, and plans for economic evaluations will follow.

The Specific Objectives of COMPARE include the following:
A. Risk-assessment models and risk-based sampling and data collection strategies that enhance our capacity to detect potential disease outbreaks;
B. From samples and associated metadata to comparable data: harmonised standards for sample processing and sequencing to obtain high quality and comparable sequence data from and metadata associated with a specimen;
C. From comparable data to actionable information: designing analytical workflows for turning comparable data into actionable information for addressing questions in frontline diagnostics, foodborne infections and (re-)emerging infections. This “actionable information” should support well-informed decisions and actions in pursuit of pathogen identification and characterization, and outbreak detection, investigation, and prediction for:
o Frontline diagnostics
o Public health
o (Re-)emerging infections research
D. Designing and building a common data and information platform supporting rapid sharing, integration and analysis of sequence-based pathogen data in combination with other contextual metadata. The system will be linked to existing and future complementary systems, networks and databases such as those used by ECDC, NCBI and EFSA;
E. Risk communication tools will be developed enabling authorities in the human and animal health and food safety sectors to effectively communicate the results obtained with the new analytical workflows;
F. Studies on the barriers (ethical, regulatory, administrative, logistical, political) to the implementation and widespread use of open-data sharing platforms; and
G. Development of a framework for estimating the cost-effectiveness of the COMPARE system, including the value of safety.

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

The first 18 months of the project have focused on getting started in each of the individual work packages (WPs). As the structure of the COMPARE project differs from that of typical research projects, and requires a shared commitment to goals that help develop the COMPARE infrastructure in addition to the individual scientific ambitions, coordination of the WPs is crucially important. All WPs have held teleconferences and meetings to better understand the needs and capacities of each participant, and the potential synergies, overlaps and knowledge / expertise gaps, as a start to the organisation of the WP activities. To avoid duplication of effort, substantial work has been done to map what is already available in terms of protocols for sampling (WP 1), storage and sequencing (WP 2), and bioinformatics workflows, reference sample sets and reference databases for use in clinical, public health and research questions (WPs 3-5). These inventories were all compiled as databases that can be accessed and searched by the COMPARE partners, and have been shared through the COMPARE Share Site. In the coming period, work will be done to make these resources accessible and searchable for outside users, as resources for potential future users of COMPARE.

In WP 1, a generic framework for risk assessment is being developed. The core question addressed is ‘What are the most probable routes of spread for pathogens introduced into Europe, and what are the populations and regions at greatest risk for spread?’ This framework uses data on population densities, climate factors, animal movements, pathogen behavior, etc., and is being developed to guide decision-making on where to invest in enhanced sampling and targeted studies of etiologies, using the sampling algorithms and protocols developed in WPs 1-5. Whilst the framework is designed to be generic and applicable to a multitude of pathogens, target pathogens will be selected to illustrate the utility of the risk model. This will be done using scenarios reflecting the type of problems where COMPARE should be able to provide added value. The choice of scenarios is currently under discussion.

In WP 2, great progress has been made in unravelling the factors that determine the quality of sequence data produced from the broad range of sample types for the different organisms (bacteria, viruses, parasites, and resistance genes) and study questions (medical, veterinary, food microbiology, clinical, public health, basic research). In accordance with the results of the survey, experiments have now been planned or initiated.

WP 2 and WP 3 have been working together to address the overarching aspects of the process of going from biological sample material to analysed next generation sequencing (NGS) data, which is conceptually broken down into four component parts: 1) sample pre-processing, 2) NGS (including library preparation), 3) genome assembly or metagenome analysis, 4) analysis and interpretation. As the various NGS platforms have matured, it has become clear that there is limited scope to alter a given manufacturer's machine characteristics or to use nonstandard reagent sets for the NGS libraries and machine runs. This has the advantage of giving a level of standardization across laboratories for the NGS workflows of each machine type, allows machine performance to be assessed against manufacturer specifications and QA/QC metrics and, finally, will ensure a smoother path to regulatory approval of NGS systems within infectious disease diagnostics. It does, however, remove an element of choice and innovation from the NGS workflow and focuses the important variables onto sample collection (WP 1), sample pre-processing (WP 2 and WP 3) and post-sequencing analysis (WPs 3-8). Work is ongoing to further develop standards for the application of NGS across the consortium, including the protocols, control reagents and datasets needed for accreditation purposes.
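The four-part breakdown described above can be pictured as a simple ordered pipeline. The following Python sketch is purely illustrative: the `Stage` class, the `lab_tunable` flag and the `tunable_stages` helper are hypothetical names and not part of any COMPARE tool. The flag only records the observation in the text that the sequencing step itself is largely fixed by the manufacturer, while the surrounding steps remain open to choice.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    """One step on the path from biological sample to interpreted NGS data."""
    order: int
    name: str
    lab_tunable: bool  # assumption: False where the manufacturer fixes the protocol


# The four component parts named in the text, in order.
PIPELINE: List[Stage] = [
    Stage(1, "sample pre-processing", True),
    Stage(2, "NGS (including library preparation)", False),
    Stage(3, "genome assembly or metagenome analysis", True),
    Stage(4, "analysis and interpretation", True),
]


def tunable_stages(pipeline: List[Stage]) -> List[str]:
    """Return, in pipeline order, the stages where laboratories still make choices."""
    ordered = sorted(pipeline, key=lambda stage: stage.order)
    return [stage.name for stage in ordered if stage.lab_tunable]


if __name__ == "__main__":
    for name in tunable_stages(PIPELINE):
        print(name)
```

Listing the stages explicitly makes it easy to see where harmonised protocols are likely to matter most: everywhere except the sequencing run itself.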

WPs 3/6 have developed a general framework for the application of NGS to routine clinical microbiology, virology and diagnostics. For microbiology, the pilot organisms/problems chosen for the initial phase of COMPARE were E. coli and urinary tract infections, and antimicrobial-resistant bacteria relevant for patient care and hospital epidemiology. Here, the technical algorithms used for typing and for antibiotic resistance detection need to be simplified so that people with little or no knowledge of computational analysis can perform the analysis routinely. Two workflows have been developed: one to use NGS in clinical settings to study the epidemiology and population biology of bacterial clones, and one for antibiotic resistance gene identification. These workflows will now be built into the bioinformatics platform and evaluated further in WPs 3/6. On the virus side, work was started to explore the clinical application of NGS metagenomics for the evaluation of unusual disease syndromes, with emphasis on severe unexplained illness and/or high-risk patients. Protocols and analytical workflows are operational in several COMPARE partner laboratories, and comparative studies are planned and ongoing to understand the strengths and weaknesses of the different approaches. In addition, this expertise was used to support recent outbreak investigations (Ebola, Zika, emergence of colistin-resistant E. coli).
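To give a sense of what a simplified resistance gene identification step might look like for a non-specialist user, here is a deliberately minimal Python sketch. It is not the COMPARE workflow: it assumes already assembled contigs, uses exact substring matching on both strands, and the gene names and sequences are invented toy data. Real tools tolerate mismatches and rely on curated reference databases.

```python
from typing import Dict, List

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_genes(contigs: List[str], references: Dict[str, str]) -> List[str]:
    """Report reference genes found verbatim in any contig, on either strand."""
    upper_contigs = [contig.upper() for contig in contigs]
    hits = []
    for gene, ref in references.items():
        forward = ref.upper()
        reverse = reverse_complement(forward)
        if any(forward in c or reverse in c for c in upper_contigs):
            hits.append(gene)
    return sorted(hits)


if __name__ == "__main__":
    # Toy reference set and contigs, invented for illustration only.
    toy_references = {
        "toy_gene_A": "ATGGCCTTAGGC",
        "toy_gene_B": "ATGTTTCCCAAA",
        "toy_gene_C": "ATGCGCGCGCGA",
    }
    toy_contigs = ["GGGATGGCCTTAGGCTTT", "CCCTTTGGGAAACATAAA"]
    print(find_genes(toy_contigs, toy_references))
```

In this toy input, `toy_gene_A` appears on the forward strand of the first contig and `toy_gene_B` on the reverse strand of the second, while `toy_gene_C` is absent. A single function call returning a plain list of gene names is the kind of interface that keeps the analysis within reach of users without computational training.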

In WPs 4/7, essential preparatory steps were taken to assess the potential added value and challenges of using NGS-based techniques in the context of ongoing public health surveillance. For this, a reference genome database was constructed (Deliverable 4.1) consisting of publicly available sequences covering each of the six pilot foodborne pathogens (E. coli, Salmonella, Listeria, norovirus, hepatitis A, Cryptosporidium). These reference genomes were carefully selected to represent the most relevant types, so that they will be useful in future sequence-based analysis for outbreak detection and source attribution modelling. The reference genome database was completed in Month 6 and is now publicly available, together with short descriptions of the disease-causing potential and epidemiology of each organism. In addition, detailed plans have been developed to address the crucial questions of how NGS performs in comparison with current reference methods for pathogen detection and typing, and for cluster analysis and source attribution. A series of pilot studies has been defined and is currently under development.

In WPs 5/8, the focus is on harnessing the potential added value of NGS for emerging disease detection and research. As an emerging disease outbreak involving several COMPARE partners evolved during the first month of the project, it was decided to immediately launch a pilot study on H5N8 avian influenza to compare the analytic pipelines and tools already available from the different partners and to drive the development of data-sharing hubs for rapid deployment in WP 9. The global spread of H5N8 was successfully described, and the initial evaluation of the different tools and pipelines clearly shows the need for future benchmarking and standardisation. This model helped drive the further development of data-sharing hubs, which were also piloted for Ebola (in collaboration with mobile laboratories in Sierra Leone), for Zika, and for a cross-work-package metagenomics working group within COMPARE. The hurdles encountered with this model of sharing (technical, ethical, legal, etc.) were inventoried and are being studied in the appropriate WPs.

In WP 9, key components of the future COMPARE data infrastructure have been delivered as early usable tools for pathogen data sharing. These include data-sharing standards, the first sets of COMPARE Reference Genomes, the first pilot data-sharing hubs mentioned above, data coordination and support, core data-sharing infrastructure, and the translation of analytical workflows for commonly used tools into first-generation, user-friendly IPython notebooks. Further components in development include the computational resource (COMPARE-VM), computational analysis workflows and the COMPARE data portal.

In WP 10, it has been possible to align communication with the overall project and establish working collaboration with other WPs. A comprehensive stakeholder inventory and analysis has been carried out, finalized and submitted (Deliverable 10.1). A Weekly Bulletin on Risk Communication & EIDs is being published. A scholarly paper has been published in a world-leading journal devoted to bioethics, and fruitful contacts have been established with consortia and institutions working on health risk communication.

In WP 11, the establishment of the Expert Advisory Panels (EAPs) has been completed. The EAP members have all been briefed by tele-meeting on the overall vision for the project and on its structure and work plans. In preparation for the first annual meeting, the members of the EAPs were asked for initial feedback, comments, concerns and criticism, which were presented to the WP leaders before the annual meeting with the specific request to address these issues during the meeting. At the end of the meeting, a feedback session was held with EAP members and WP leaders to provide further guidance and identify possible related activities. Several EAP members expressed interest in specific parts of the COMPARE activities, and a Share Site was provided for them to access interim reports of the ongoing activities.

In WP 12, the focus is on understanding barriers to the development of COMPARE, in terms of legal, ethical, administrative and other considerations that may play a role. The legal-science ‘school’ of the microbial commons is clearly represented through the acceptance of Barriers EAP membership by Reichmann, Uhlir and Contreras. Contact and collegial consultation on barriers to genomic and public health data sharing have been established with legal experts at ECDC, DG Santé, FAO, and WHO. Interaction on legal barriers with EFSA and OIE is yet to be established. For reviews of documents on topical global treaties of interest (CBD/Nagoya Protocol, Australia Group convention, PIP Framework), an ad hoc collegial group of legal, policy and ethics colleagues can readily be consulted; its members come from the US FDA, CDC and NIH, the Republic of South Africa, Senegal, China, Singapore, Thailand, Canada and several EU partners.

In WP 13, the key element is that COMPARE has a functioning public website to share results with stakeholders and the public. There is also a closed website for project partners to share documents and develop deliverables and reports. COMPARE participants have actively disseminated information about COMPARE, as well as results from the first 18 months of work, via presentations at conferences, peer-reviewed scientific journals and general public media (news articles and television).

In WP 14, the important elements in calculating costs and benefits of COMPARE and related methods and tools have been identified (Deliverable 14.1). The cost-effectiveness study accompanying the COMPARE project aims to weigh these costs against the benefits of the system, by quantifying them in monetary terms or through the use of one or more relevant units of effectiveness. To this end, both inter-epidemic periods ('peace-time') and times of outbreak of relevant pathogens will be considered, applying a mix of retrospective and prospective approaches. Possible case studies include the 2014 outbreak of the Ebola virus in Western Africa, the outbreak of highly pathogenic avian influenza H5N8 virus in 2014, the 2011 outbreak of Shiga toxin-producing E. coli in Germany, and more general themes such as antimicrobial resistance and the transmission of hospital infections.
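The two ways of weighing costs against benefits mentioned above, in monetary terms or per unit of effectiveness, correspond to two standard health-economics summary measures. The Python sketch below shows how each is computed. It is an illustration under stated assumptions, not the method fixed by the project: the function names are hypothetical, and the figures in the usage example are invented and do not come from any COMPARE case study.

```python
def net_benefit(benefit_eur: float, cost_eur: float) -> float:
    """Cost-benefit view: benefits and costs both expressed in money."""
    return benefit_eur - cost_eur


def incremental_cost_effectiveness_ratio(
    cost_new: float,
    cost_reference: float,
    effect_new: float,
    effect_reference: float,
) -> float:
    """Cost-effectiveness view: extra cost per extra unit of effect.

    The unit of effect is left to the analyst (for example cases averted);
    it is an assumption here, not something specified in the source.
    """
    delta_effect = effect_new - effect_reference
    if delta_effect == 0:
        raise ValueError("No difference in effect; the ratio is undefined.")
    return (cost_new - cost_reference) / delta_effect


if __name__ == "__main__":
    # Invented numbers: a new surveillance system versus current practice.
    print(net_benefit(benefit_eur=200_000, cost_eur=150_000))
    print(incremental_cost_effectiveness_ratio(150_000, 100_000, 60, 40))
```

With these invented inputs the net benefit is 50,000 euros and the incremental ratio is 2,500 euros per additional unit of effect. Keeping the two measures as separate functions mirrors the choice described in the text between monetary valuation and effectiveness units.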

In WP 15, the appropriate organizational structures and processes have been put in place to respond to the EC’s as well as partners’ needs and to ensure COMPARE’s compliance with the EC Grant Agreement and the COMPARE Consortium Agreement.

During its next phase, COMPARE will focus on linking the building blocks that are emerging from the individual work packages into the ICT and analysis framework being developed in WP 9. To stimulate this, cross-work-package activities have been identified at the annual meeting and through the review of the individual work package reports. The management team discusses potential synergies and overlaps on a regular basis with the WP leaders and individual partners. In addition, we will prepare for an initial pilot study to increase interaction with the global research community and the public. Important steps here will be to ensure the parallel development of collaborative research studies, where data are shared in closed private environments, and “open science”, where raw data and analyses generated by COMPARE partners are immediately made available to both researchers globally and the general public. The latter is expected to provide valuable scientific input from the global research community, as well as important lessons for our attempts to build easy data-sharing infrastructures and about the barriers to sharing.

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

From the initial feedback within the consortium and during presentations outside the consortium, it has become clear that there is great demand for a COMPARE-like infrastructure, database and tools. Although planned for completion by the end of the 5-year project, the data-sharing hubs are in popular demand, and that work has been brought forward, using some outbreaks in which COMPARE partners were involved as pilot material. The current emphasis is on combining the data-sharing hubs with the possibility of adding tools (analytical workflows) so that data can be analysed in the COMPARE compute environment, rather than on local computers that often have insufficient capacity. This set-up was also piloted for Ebola, and is currently being discussed for Zika.

A cross-WP project was identified to test all the currently available first elements of COMPARE: reference databases, the core infrastructure, and workflows for microbiome and virome annotation and characterization. For this, a global sewage sampling snapshot was initiated. The sampling was funded from other sources, but the currently agreed protocols for (microbiome and virome) metagenomics are being used to provide an in-depth global snapshot of diversity. We have thus sampled sewage from 78 cities in 63 countries, a considerably larger data-generation effort than the consortium had expected. This will also be used to facilitate integration between the different WPs, especially those looking at analytic pipelines (WPs 3-8) and the IT infrastructure (WP 9).

In addition, we have recognised the increasing need to conduct such studies in an “open science” way. We have therefore decided to use Copenhagen sewage as a test case for sampling, sequencing and immediate release of all data into the public domain. This is expected to teach us important lessons about the possibilities and importance of doing research completely online and transparently, and to serve as a way of interacting with the global research community and the public.
