Integrated human data repositories for infectious disease-related international cohorts to foster personalized medicine approaches to infectious disease research


Launch of unified infectious disease-related cohort data portal

"Task 5.2 : Unified infectious disease-related cohort data portal We will implement a unified data portal providing a single point of access for researchers to comprehensive infectious disease cohorts - from the Consortium and beyond - combining detailed descriptions and direct access to datasets held in the underlying data repositories European Nucleotide Archive (ENA) and European Genome-phenome Archive (EGA), and the respective connected cohort data hubs built upon these repositories. Following an initial back-fill of the portal, new datasets from infectious disease-related cohorts will be automatically identified in each archive or data hub, catalogued in the portal's back-end database and rapidly displayed both through the intuitive web portal interface and well documented programmatic API interface. This integration of data from multiple archive locations has been successfully implemented in previous large projects such as the European Virus Archive ( or more recently the HipSci ( that provide extensive metadata, clear visualisation of the available datasets from a range of archive locations and direct access to underlying data in each archive, including support for batch processing and information on applying for access to managed datasets. For this task we will specifically reuse technical components from the HipSci data portal, developed at EMBL-EBI."

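As a minimal illustration of the cataloguing step described in Task 5.2, the following Python sketch shows how records harvested from the archives could be added to a portal catalogue without duplicating entries that are already known. It is an assumption-laden sketch, not the portal's actual design: the `DatasetRecord` fields, the `catalogue_new` function, the in-memory dictionary standing in for the back-end database, and all accession strings are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical minimal description of one dataset held in an archive."""
    archive: str          # source archive or data hub, e.g. "ENA" or "EGA"
    accession: str        # placeholder identifier, not a real accession
    title: str
    managed_access: bool  # True if users must apply for access


def catalogue_new(catalogue, harvested):
    """Add unseen records to the catalogue and return the ones that were added.

    `catalogue` is a dict keyed by (archive, accession); it stands in for the
    portal's back-end database in this sketch.
    """
    added = []
    for record in harvested:
        key = (record.archive, record.accession)
        if key not in catalogue:
            catalogue[key] = record
            added.append(record)
    return added


# Illustrative data only: an initial back-fill followed by a later harvest.
catalogue = {}
backfill = [
    DatasetRecord("ENA", "ENA-EXAMPLE-1", "Cohort A sequencing reads", False),
    DatasetRecord("EGA", "EGA-EXAMPLE-1", "Cohort A genotypes", True),
]
later_harvest = [
    DatasetRecord("EGA", "EGA-EXAMPLE-1", "Cohort A genotypes", True),
    DatasetRecord("EGA", "EGA-EXAMPLE-2", "Cohort B genotypes", True),
]

first_pass = catalogue_new(catalogue, backfill)
second_pass = catalogue_new(catalogue, later_harvest)

for record in second_pass:
    print(record.archive, record.accession, record.title)
```

Keying the catalogue on the pair of archive and accession means that repeated harvests of the same archive leave existing entries untouched, so only genuinely new datasets would need to be surfaced in the web interface and the API.
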
Website of the project

Task 7.4: External and internal communication / Website of the project and dissemination via Twitter and Facebook. An interactive website will be created for communication and dissemination of results. It will include both extranet and intranet resources. The intranet will hold all the documentation needed for efficient management of the project and will allow members to share useful documents with the other WPs. This platform, accessible only to project members, will provide a single starting point for internal and external resources such as partner contacts and mailing lists, administrative documents (templates, budget plan, calendar, reporting tools), internal procedures, communication tools (logos, brochures, etc.), meeting minutes and useful links. Dissemination of the final results of the project will be detailed in a dissemination plan, which will also feed into the final consortium meeting, where we will combine the final stakeholder conference with the presentation of the results to the scientific public. With regard to external communication, the public website will host the main content related to the project, such as information about the consortium and the partners involved, the main objectives, the latest news and scientific advances in the field. The project's outputs will also be disseminated to the scientific community and other external stakeholders through social media channels such as Twitter and Facebook.

Report on the 1st round online survey and interviews related to perceived benefits and risks of sharing among cohort investigators

Task 2.1: Elaboration of steps needed for cohorts to participate in collaborative, decentralized platform – Perceived risk versus benefits of sharing CE and HDL data in cloud-based, federated repository. This task aims to set the scene for the establishment of collaborative and decentralized data sharing platforms. While sharing CE and HDL data is supported and sometimes required by major funding agencies, journals, and other stakeholders, cohort study staff and leadership often do not see data sharing as a net benefit for themselves and their research agenda. Data sharing concerns related to intellectual property, ownership, authorship, and regulatory barriers need to be fully understood and addressed prospectively. Therefore, the focus of this task is to understand the perspective of the potential participants: individual cohort study staff and investigators. We will develop an assessment tool to elicit input and responses from a broad audience of cohort investigators, ranging from those who are motivated to share all of their data to those who are unwilling to share CE and/or HDL data. We will focus our assessment on understanding: (i) the perceived risks vs. benefits of sharing data in a cloud-based, federated repository; (ii) the perceived need for automated workflows that provide regular analyses of CE and HDL data and how these can be incorporated into the shared platform; (iii) the perceived need for collaborative analyses that leverage human HDL data for personalized medicine and how these can be facilitated through the shared platform; (iv) the perceived need for research capacity building in the management and analysis of CE and HDL data; (v) the perceived need and strategies for sensitizing cohort staff to the utility of new forms of high-dimensional data for personalized medicine approaches (e.g. microbiome data). We will use an online survey to collect responses and allow for repeated input during the course of the research project.
This will be complemented with in-person meetings and especially with one stakeholder conference (see WP6). This task will be managed back-to-back with Task 5.1, which is dedicated to the implementation of a Decentralized Organization. The two tasks will partially overlap, but it is critical to address these issues both at a global, unifying level and in the context of the local implementation of an adapted governance, in order to overcome obstacles arising at the local level. To help in this integration process, the same participants will take part in both tasks.

Data management plan

Task 7.3: Data Management Plan and plan for dissemination of results. The data management plan is part of the initiative for open science. As a mandatory deliverable, it will be coordinated from WP7 in close collaboration with WP2-5, because the data generated in the project will emerge mainly in the form of a product (the searchable platform for CE and HDL data, with a federated data repository in the background).

