Integrated human data repositories for infectious disease-related international cohorts to foster personalized medicine approaches to infectious disease research

Periodic Reporting for period 2 - RECODID (Integrated human data repositories for infectious disease-related international cohorts to foster personalized medicine approaches to infectious disease research)

Reporting period: 2020-07-01 to 2021-12-31

ReCoDID includes a multidisciplinary team of researchers from the global infectious disease arena, leveraging existing infrastructures and partnerships to develop a sustainable model for the storage, curation and analyses of the complex data sets collected from infectious disease-related cohorts.

Infectious disease-related cohorts collect two kinds of data: (i) data from participant interviews, clinical assessment, and related geospatial, social, and environmental exposures (hereafter clinical-epidemiological data, CE data), and (ii) high-dimensional data from advanced laboratory analysis of clinical samples (HDL data or OMICS data). These data are typically stored in separate repositories, so a system is needed to facilitate their synthesis and analysis within and across cohorts.

Combining data repositories across cohorts for infectious disease research is rare. By contrast, significant investments have been made in sharing CE and HDL data from population-based registries in high-income countries to advance personalized medicine, especially in the fields of chronic and rare diseases.

By combining CE and HDL data repositories across cohorts, the overarching goal of this project is to develop an integrated, sustainable platform for the sharing, synthesis, and analysis of data from infectious disease cohorts, in keeping with the FAIR (Findable, Accessible, Interoperable, and Reusable) principles of data-driven research. The project will also develop the biostatistical methods needed to analyze pooled cohort data. The repository platform will be equipped with a tiered permission system for data access. Cohort-specific hubs will support each cohort's analyses of its own data as well as cross-cohort analyses, within a clearly elaborated legal, ethical, and equitable framework for cross-study data sharing.
The first year was characterized by the implementation of workflows for data harmonization and the building of the platform pipeline. Success stories during the first reporting period included the lively exchange between the EU-CAN-funded 'sister' projects, with the establishment of cross-consortial working groups and newsletters. In 2020, ReCoDID was one of the few projects selected by the European Commission for top-up funding because of its unique focus on data sharing and harmonization of infectious disease-related cohorts. Partners generated a harmonized master variable dictionary for the retrospective harmonization of acute febrile illness cohorts and published manuscripts on measurement error and causal inference in pooled cohorts.

Making progress during the second reporting period (RP2) was difficult. The COVID-19 pandemic shifted the priorities and schedules of all researchers involved, causing delays. For this reason, we will apply for a one-year no-cost extension. Legal challenges around data sharing, in particular under the GDPR, became a focus during RP2. These discussions gave us insights into the GDPR, its implementation in research and data sharing, and how it ultimately factors into a project's processes. Operationally, efforts centered on identifying and defining data flow routines for both clinical-epidemiological (CE) and high-dimensional laboratory (HDL) datasets. A better understanding of how to integrate with the data models at EMBL-EBI led to the expansion of the data hubs for CE datasets, which now combine data sharing through the European Genome-phenome Archive, BioStudies, ArrayExpress, and the Cohort Browser at EMBL-EBI. The cohort cloud was set up during RP2 and is awaiting datasets to drive its requirements.
Towards the end of RP2, ReCoDID was in advanced negotiations on data processing agreements (DPAs) with the partner Erasmus University in the Netherlands, the ORCHESTRA project, and the French National Institute of Health and Medical Research (INSERM). Great progress was made in developing biostatistical methods for pooled cohort datasets.

ReCoDID has been involved in many stakeholder interactions about data sharing and harmonization, in both the ethical/legal and the technical domains. ReCoDID has become a recognized player in data sharing across Europe, contributing expertise to several thematic streams, among them data standards, central versus federated analysis strategies, data sharing and harmonization, and federated biorepositories.
The work in ReCoDID was affected and delayed by the COVID-19 pandemic, but at the same time the project proved more relevant than ever. This is reflected in the fact that ReCoDID was selected for additional funding, adding a whole work package on "COVID-19 research response" (WP8). Subsequently, ReCoDID was featured in a number of high-level consultations and presented at the corresponding meetings (see dissemination), and it has been named as a project 'to collaborate with' in several new EC calls for proposals. As a result, ReCoDID is well represented in:
- ECRAID (UKHD leads WP9 on 'Data' and works closely with ECRIN within WP9),
- ORCHESTRA (with the cohorts in Latin America pivoting from Zika to COVID-19 and providing additional data to ORCHESTRA),
- UNCOVER (the ReCoDID coordinator on the Advisory Board),
- SYNCHROS, and recently also
- END-VOC (UKHD leads WP2, aiming to use the ReCoDID data sharing pipeline for cohorts in END-VOC).

During 2019 and 2020, ReCoDID actively shaped the agenda of the group of EU-CAN consortia. Several cross-consortial working groups were created, among them one on data harmonization. In the following years, even without organizing any in-person stakeholder meetings (see the WP6 report and deviations from Annex 1), ReCoDID contributed to many workshops with stakeholders on data sharing and data standards. At the beginning of the third reporting period (March 2022), ReCoDID, with the help of partner 16 (Maelstrom), organized a virtual meeting convening all the relevant data standardization groups (e.g. CDISC/CDASH-SDTM, OMOP) to discuss a way forward for meta-harmonization.
With all this groundwork accomplished (much of it extending beyond the initial workplan as we progressed into "uncharted territory"), ReCoDID is well positioned to convene stakeholder meetings in 2022/2023 on a) data harmonization and data sharing and b) virtual biorepositories, opening new avenues toward c) sustainable cohort research platforms. However, for this to come to fruition, we will need a one-year no-cost extension of the project.

Within the ethical/legal work package (WP2), the group decided to go beyond the initial aims of the project and carry out cognitive interviews on participants' perceptions of broad informed consent in research.
Furthermore, the project is on track toward a broader vision of federated repositories for biological samples, a topic that has gained substantial traction since the projects were evaluated and selected.

The legal challenges experienced deepened the project's understanding of how the GDPR is interpreted across institutions in EU member states as well as in non-member countries. The project has actively sought solutions, also involving the legal team at the European Commission. ReCoDID researchers have found a solution for data sharing between institutions within the lifetime of the project, while also taking part in discussions that stretch well beyond its projected end.
Figure: ReCoDID work packages, including the new WP8 on COVID-19 Research Response