CORDIS - EU research results

A federated FAIR platform enabling large-scale analysis of high-value cohort data connecting Europe and Canada in personalized health

Periodic Reporting for period 3 - EUCAN-Connect (A federated FAIR platform enabling large-scale analysis of high-value cohort data connecting Europe and Canada in personalized health)

Reporting period: 2022-01-01 to 2023-06-30

EUCAN-Connect is motivated by the promise of personalized prevention and healthcare based on rich phenotypic, environmental and molecular (omics) profiles of every individual. To fulfill this promise, researchers need access to data covering the full range of lifestyle, demographic, laboratory, omics and clinical parameters for many individuals. Fortunately, Europe and Canada have many datasets built up from a strong tradition of population-based prospective cohort studies (there are 668 organisations with 2467 collections in BBMRI-ERIC (EU), and 182 in the Maelstrom Catalogue (CA)). However, the present barrier to capitalizing on this richness is that these sensitive data are locked in local repositories, lack universally compatible data standards and are difficult to share because of national and local privacy protection and data security requirements. The overall objective of EUCAN-Connect is therefore to enable large-scale integrated multi-cohort data analysis by: enhancing long-term collaboration between European and Canadian cohorts and research networks; maturing and standardising federated cohort metadata sharing, data deposition and access, data curation/harmonization, and exchange procedures; and facilitating pooled analysis of high-value cohort data on environmental factors and omics measures that affect health over the human life course, with the aim of facilitating personalized prevention and treatment of disease.
We synergized the efforts of the existing European and Canadian cohort networks ReACH, LifeCycle, InterConnect, RECAP, BioSHaRE and BBMRI-LPC, integrating powerful existing approaches and solutions from Maelstrom, BBMRI, DataSHIELD, OBiBa, MOLGENIS and others, and engaging many research infrastructures, including EOSC, EGA, ELIXIR, CORBEL, IHEC, IHMC, GA4GH, P3G and EMIF. Two new cohort exposome networks have joined the community: ATHLETE and LongITools. The core concept of EUCAN-Connect is to make cohort data available in federated analysis networks. Crucially, EUCAN-Connect has been implementing the FAIR principles:

Make FINDABLE: enable cohort data discovery down to data item and subject levels. We bridged existing metadata catalogue efforts from BBMRI and Maelstrom, and collaboratively defined a reference data model, which is implemented in the Maelstrom/Mica and MOLGENIS catalogue software. We developed the Federated Catalogue (D2.2) with data from Maelstrom, RECAP and the shared LifeCycle/ATHLETE/LongITools source catalogues. We identified key areas of focus for the community curation tools (T2.4).

Make ACCESSIBLE: deliver a low-maintenance open data access and process architecture. We expanded the DataSHIELD platform for federated analysis; developed Coral, a Docker-based system that eases installation and data loading by the cohorts; created a separate data API that allows diverse software to be connected, and implemented it in MOLGENIS ‘Armadillo’; and expanded DataSHIELD to allow access to large ‘resources’ (e.g. omics files). We created “Profiles” to allow different DataSHIELD configurations, and effected a wider roll-out of nodes running Opal, Coral and Armadillo. Armadillo has been deployed in over 30 nodes, and system monitoring is now available in Coral.
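
To illustrate the general idea behind federated analysis, the following Python sketch shows a pooled mean computed from per-node aggregates, so that individual-level records never leave a cohort node. It is a simplified illustration only and not the DataSHIELD, Opal, Coral or Armadillo API: the names `NodeSummary`, `summarise_locally`, `pooled_mean` and the `MIN_COUNT` threshold are hypothetical, and the data are made up.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical disclosure threshold; a real deployment would configure its own.
MIN_COUNT = 5


@dataclass(frozen=True)
class NodeSummary:
    """Aggregate returned by one cohort node; contains no row-level data."""
    total: float
    count: int


def summarise_locally(values: Sequence[float], min_count: int = MIN_COUNT) -> NodeSummary:
    """Runs inside a cohort node: reduce individual values to a sum and a count."""
    if len(values) < min_count:
        # Refuse to release aggregates over very small groups.
        raise ValueError("too few records to release an aggregate safely")
    return NodeSummary(total=float(sum(values)), count=len(values))


def pooled_mean(summaries: List[NodeSummary]) -> float:
    """Runs on the analyst's side: combine per-node aggregates into one mean."""
    total_count = sum(s.count for s in summaries)
    if total_count == 0:
        raise ValueError("no records available across nodes")
    return sum(s.total for s in summaries) / total_count


# Illustrative data only: three imaginary cohorts holding one numeric measure.
cohort_a = [21.0, 23.5, 22.0, 24.5, 25.0]
cohort_b = [30.0, 28.0, 27.0, 29.0, 31.0, 26.0]
cohort_c = [19.0, 20.0, 21.0, 22.0, 23.0]

summaries = [summarise_locally(c) for c in (cohort_a, cohort_b, cohort_c)]
print(pooled_mean(summaries))
```

The analyst only ever sees the three `NodeSummary` objects, which is the property that makes this style of analysis compatible with local privacy and data security requirements.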

Make INTEROPERABLE: accelerate data harmonisation by retrospectively mapping cohort data to standard variables to enable pooled analysis. We focussed on best practices to 1) explore the study-specific data and samples; 2) evaluate harmonization potential across studies; 3) process study-specific data under a common (i.e. harmonized) format; 4) estimate the quality of the harmonized data generated; and 5) generate the information required to achieve data analysis. We submitted a publication on the methodological framework for data harmonization; developed an R package to support the harmonization process and quality control; developed guidelines and templates to guide the documentation of harmonization initiatives; and documented six research projects from WP6 and 24 Canadian studies that are part of ReACH.
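
Steps 3 and 4 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's R package: the variable names, the per-study rule tables and the helpers `harmonise` and `completeness` are all hypothetical, and the records are invented.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]
# A rule maps a target variable to (source variable, transform function).
Rules = Dict[str, Tuple[str, Callable[[object], Optional[object]]]]

# Hypothetical rules: each study stores the same concepts differently.
RULES_STUDY_A: Rules = {
    "height_cm": ("height_m", lambda v: float(v) * 100.0),  # metres -> cm
    "sex": ("gender", lambda v: {"M": "male", "F": "female"}.get(str(v))),
}
RULES_STUDY_B: Rules = {
    "height_cm": ("hgt_cm", float),  # already in cm
    "sex": ("sex_code", lambda v: {1: "male", 2: "female"}.get(v)),
}


def harmonise(records: List[Record], rules: Rules) -> List[Record]:
    """Step 3: rewrite study-specific records into the common format."""
    harmonised = []
    for rec in records:
        row: Record = {}
        for target, (source, transform) in rules.items():
            raw = rec.get(source)
            # Missing source values stay missing instead of being guessed.
            row[target] = None if raw is None else transform(raw)
        harmonised.append(row)
    return harmonised


def completeness(rows: List[Record], variable: str) -> float:
    """Step 4 (one simple quality indicator): share of non-missing values."""
    if not rows:
        return 0.0
    return sum(r[variable] is not None for r in rows) / len(rows)


# Illustrative records only.
study_a = [{"height_m": 1.75, "gender": "F"}, {"height_m": None, "gender": "M"}]
study_b = [{"hgt_cm": 160, "sex_code": 1}]

pooled = harmonise(study_a, RULES_STUDY_A) + harmonise(study_b, RULES_STUDY_B)
print(pooled)
print(completeness(pooled, "height_cm"))
```

Keeping the rules as data rather than code paths means each mapping decision can be documented and reviewed per study, which is in the spirit of the guidelines and templates mentioned above.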

Make REUSABLE: develop DataSHIELD bioinformatics toolboxes and federated analysis methodologies. This includes the successful release of DataSHIELD v5.0 and v6.0; implementation of continuous testing for all functions; systems for interacting with, training and supporting DataSHIELD users (website and forum); and community meetings and multiple workshops. We delivered D5.1, the Bioinformatics Toolbox catalogue of tools and methods for federated and/or privacy-protected analysis of cohort studies and biobanks, and D5.2, training material for DataSHIELD users, as required to “Complete extension and customisation of DataSHIELD, Opal, and MOLGENIS for the specific analytic needs of EUCAN-Connect.”

Make COLLABORATIONS: promote uptake by the research community at large. We created demonstrator projects and engaged existing analysis projects in LifeCycle, RECAP, ReACH and InterConnect on 1) longitudinal life course analyses from early life onwards; 2) (epi)genomic origins, microbiome and virome adaptations; 3) early-life exposome-related risk factors; and 4) personalized prevention strategies, related to cardio-metabolic, respiratory, musculoskeletal and developmental health and disease. Additionally, a EUCAN-Connect workshop on health outcomes was organized, with attendance from LifeCycle, ATHLETE and LongITools.

Make SUSTAINABLE: establish ethical and legal governance and extend capabilities beyond the reach of this project. We therefore started developing an appropriate ethical and legal governance framework; made substantive progress in evidencing expectations in EUCAN-Connect’s stakeholder community; and established a governance advisory group and an ELSI expertise forum. Substantial progress has been made on the analysis of qualitative interviews, on the long-term sustainability of the components delivered, and on participant observations and further ECOUTER sessions with consortium members.
Key advances of EUCAN-Connect include: a sustainable and long-lived European-Canadian cohort meta-network; a one-stop shop to search across existing cohorts and networks worldwide; mainstreaming of data harmonization protocols; mainstreaming of federated bioinformatics analysis toolboxes; incorporating omics, in particular epigenetics and microbiome; harmonizing novel markers of life stressors across European and Canadian cohorts; building a large support base across a broad range of stakeholder communities; and enabling large-scale analysis of cohorts to develop and fine-map personalised risk prediction models. The expected impacts are uncommonly large analyses across tens to hundreds of cohorts (11 partner cohorts, n=298,645, and at least 175 cohorts in partner networks, n=2,494,885) to research and develop strategies for identifying groups and individuals at risk through stratification and personalized prediction models based on differences in markers for (early-)life stressors. EUCAN-Connect will thus provide wonderful opportunities to translate research findings into policy recommendations that can address key health and social-care challenges and improve the lives of European and Canadian citizens.
EUCAN-Connect conceptual overview