Periodic Reporting for period 1 - EUCAN-Connect (A federated FAIR platform enabling large-scale analysis of high-value cohort data connecting Europe and Canada in personalized health)
Reporting period: 2019-01-01 to 2020-06-30
However, the present barrier to capitalizing on this richness is that these sensitive data are locked in local repositories, lack universally compatible data standards and are difficult to share because of national/local privacy protection and data security requirements. Yet integrated analysis is essential to reach the statistical power needed to elucidate the complex relationships between genetic traits, environment and diseases and to benefit from the content, temporal and geographic diversity in these cohorts.
Therefore, the overall objective of EUCAN-Connect is to enable large-scale integrated multi-cohort data analysis for personalized prevention and healthcare. To this end, the project enhances long-term collaboration between European and Canadian cohorts and research networks; matures and standardises federated cohort metadata sharing, data deposition and access, data curation/harmonization, and exchange procedures; and facilitates pooled analysis of high-value cohort data on environmental factors and omics measures that affect health over the human life course, with the aim of supporting personalized prevention and treatment of disease.
The core concept of EUCAN-Connect is to make cohort data available in federated analysis networks, i.e. to enable data access and analysis without physically sharing data, so that no privacy-protected data can be accessed. Crucially, EUCAN-Connect adheres to the FAIR principles, i.e. data should be Findable, Accessible, Interoperable and Reusable. All of this is of little value unless it is widely used, adopted and sustained in long-term collaborations. Progress on each objective is listed below:
Make FINDABLE - enable cohort data discovery down to data item and subject levels. EUCAN-Connect is bridging existing major efforts from BBMRI, Maelstrom, and birthcohorts.net as well as specific catalogues from cohort networks. We have compared all existing models, collaboratively defined a first reference data model and started developing the federation of catalogues.
Make ACCESSIBLE - deliver a low-maintenance open data access and process architecture. We therefore expanded the DataSHIELD platform for federated analysis; developed a Docker-based system called Coral to ease installation and data loading by the cohorts; created a separate data API, implemented in MOLGENIS ‘Armadillo’, that allows other software to be connected as well; and extended DataSHIELD to allow access to large ‘resources’, opening exciting prospects for *omics data analysis via DataSHIELD.
Make INTEROPERABLE: accelerate data harmonisation, retrospectively mapping cohort data to standard variables to enable pooled analysis. We focussed on harmonising this process between cohort networks, i.e. how best to 1) explore the study-specific data and samples available; 2) evaluate harmonization potential across studies; 3) process study-specific data under a common (i.e. harmonized) format; 4) estimate the quality of the harmonized data generated; and 5) generate the information and supportive tools required to carry out data analysis and properly interpret results.
Make REUSABLE: develop DataSHIELD bioinformatics toolboxes and federated analysis methodologies. This includes the successful release of DataSHIELD v5.0 and v6.0; implementation of continuous testing for all functions; systems for interacting with, training and supporting DataSHIELD users (website and forum); and two community meetings and multiple workshops.
Make COLLABORATIONS: promote uptake by the research community at large. We therefore created demonstrator projects and engaged existing analysis projects in LifeCycle, RECAP, ReACH and InterConnect. In particular, we focus on 1) longitudinal life course analyses from early life onwards; 2) (epi)genomic origins, microbiome and virome adaptations; 3) early-life exposome-related risk factors; and 4) personalized prevention strategies, related to cardio-metabolic, respiratory, musculoskeletal and developmental health and disease.
Make SUSTAINABLE: ethical and legal governance and extending capabilities beyond the reach of this project. Therefore, we started developing an appropriate ethical and legal governance framework; made substantive progress in evidencing expectations in EUCAN-Connect’s stakeholder community; and established a governance advisory group and an ELSI expertise forum to help guide EUCAN-Connect and establish its long-term sustainability.
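The federated analysis idea described above (analysing cohort data without physically sharing it) can be illustrated with a minimal Python sketch. It is not DataSHIELD code and does not use the DataSHIELD, Coral or Armadillo APIs; the names `SiteSummary`, `summarise_locally`, `pooled_mean_and_variance` and the threshold `MIN_COUNT` are hypothetical and chosen only for illustration. The sketch assumes each cohort releases only non-disclosive aggregates (count, sum, sum of squares), which a coordinating analyst then combines into a pooled mean and variance.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical disclosure threshold: a site refuses to release
# aggregates computed from fewer than this many records.
MIN_COUNT = 5


@dataclass(frozen=True)
class SiteSummary:
    """Aggregates a cohort site is willing to release (no individual rows)."""
    n: int
    total: float
    total_sq: float


def summarise_locally(values: Sequence[float]) -> SiteSummary:
    """Runs at the cohort site; individual-level values never leave this function."""
    if len(values) < MIN_COUNT:
        raise ValueError("too few records to release a summary")
    return SiteSummary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def pooled_mean_and_variance(summaries: List[SiteSummary]) -> Tuple[float, float]:
    """Runs at the analyst side; combines site aggregates into pooled statistics."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # Sample variance from pooled sums: (sum of squares - n * mean^2) / (n - 1)
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


# Two illustrative cohorts holding made-up measurements.
cohort_a = [1.0, 2.0, 3.0, 4.0, 5.0]
cohort_b = [6.0, 7.0, 8.0, 9.0, 10.0]

summaries = [summarise_locally(cohort_a), summarise_locally(cohort_b)]
mean, variance = pooled_mean_and_variance(summaries)
print(f"pooled mean={mean:.3f}, pooled variance={variance:.3f}")
```

The pooled result equals what a single analysis over the combined data would give, yet only three numbers per cohort cross the site boundary. Real federated platforms apply considerably richer disclosure controls than the single count threshold assumed here.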