Common Infrastructure for National Cohorts in Europe, Canada, and Africa

Periodic Reporting for period 2 - CINECA (Common Infrastructure for National Cohorts in Europe, Canada, and Africa)

Reporting period: 2020-07-01 to 2021-12-31

Over the last forty years, many large cohorts of human samples have emerged from research and national healthcare initiatives. Access to these cohorts by researchers and clinicians is essential to fully realise their potential to improve human health. CINECA's vision is a federated, cloud-enabled infrastructure that makes population-scale genomic and biomolecular data accessible across international borders, delivering a paradigm shift in federated research and clinical applications. CINECA partners combine scientific excellence with experience of ten diverse cohorts and of scientific projects such as the European Genome-phenome Archive (EGA), CanDIG, and H3Africa, which together constitute a virtual cohort of 1.4M individuals from population, longitudinal and disease studies.

CINECA mobilises cohorts to realise the personalised medicine vision set out in the ICPerMed Action Plan and by Genome Canada and CIHR. The societal impacts expected from the CINECA project include:
- Intensified sharing, reuse, collaboration and knowledge discovery in the health field, while ensuring ethical and legal compliance in the use of the data
- More efficient research through reduced duplication of experiments
- Improved disease definitions and diagnostics
- Improved patient stratification and treatments
- A cross-continental framework for accessing human genetic data
- Earlier disease interception and lower disease burden
- Impact on drug discovery and development

Key objectives of the CINECA project:
1. Deliver transcontinental security requirements for data access
2. Provide solutions to ELSI requirements where data cannot move outside a legal jurisdiction
3. Provide federated access to genomic data on demand
4. Deliver access to datasets of the scale and completeness needed to address analytical challenges
5. Provide harmonised metadata, based on open global standards, driving variant and sample discovery in a trans-continental virtual cohort of 1.4 million individuals
6. Achieve wide adoption in international personalised medicine projects
In the second reporting period (RP2), we made significant progress towards the project objectives, as follows:
1. Deliver transcontinental security requirements for data access:
CINECA has supported ELIXIR and EGA in implementing GA4GH Passports in their Authentication and Authorisation Infrastructure (AAI) (WP2). In addition, we worked on integrating AAI based on GA4GH Passports into the CINECA use cases (WP2). We also provided training to all CINECA cohorts to accelerate their AAI implementation (WP2).
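To illustrate what Passport-based authorisation involves, the sketch below checks whether a researcher's visas grant access to a dataset. It assumes the Passport and visa JWTs have already been signature-verified and decoded into dicts, and it follows the GA4GH Passport visa layout (an `exp` claim plus a `ga4gh_visa_v1` object with `type`, `value`, `source`, `by` and `asserted`). The function name `has_access` and the dataset URL are hypothetical; this is not the ELIXIR or EGA implementation.

```python
import time

# Hypothetical dataset identifier, for illustration only.
DATASET_URL = "https://example.org/datasets/HYPOTHETICAL-0001"


def has_access(visas, dataset_url, now=None):
    """Return True if any decoded visa grants controlled access to dataset_url.

    Assumes each visa is an already verified, decoded JWT payload.
    Signature checking and Passport decoding are deliberately out of scope.
    """
    now = time.time() if now is None else now
    for visa in visas:
        claim = visa.get("ga4gh_visa_v1", {})
        asserted = claim.get("asserted")
        if (
            claim.get("type") == "ControlledAccessGrants"
            and claim.get("value") == dataset_url
            and asserted is not None
            and asserted <= now            # the grant has already been asserted
            and visa.get("exp", 0) > now   # the visa has not expired
        ):
            return True
    return False


# Hypothetical decoded visa issued by a data access committee.
passport_visas = [
    {
        "exp": 2_000_000_000,
        "ga4gh_visa_v1": {
            "type": "ControlledAccessGrants",
            "value": DATASET_URL,
            "source": "https://example.org/dac",
            "by": "dac",
            "asserted": 1_600_000_000,
        },
    }
]

print(has_access(passport_visas, DATASET_URL, now=1_700_000_000))  # True
```

Keeping the check independent of token handling makes it easy to run the same rule at any site that trusts the same visa issuers.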
2. Provide solutions to ELSI requirements enabling the use of data which cannot move outside a legal jurisdiction:
Based on a report and a data workflow survey (WP4), we decided that CINECA should contribute to and use GA4GH standards, including the Data Use Ontology (DUO), the Task Execution Service (TES) standard, and the Authentication and Authorisation Infrastructure (AAI) and Passport standards. In addition, we have continued to work on the ELSI requirements and guidelines (WP7).
3. Provide federated access to genomic data on demand:
We developed workflows based on the CINECA use cases (WP4). A framework for designing portable federated pipelines has been implemented and tested in the "Federated joint cohort genotyping" use case. In addition, an RNA-seq quantification workflow has been developed for the CINECA use cases. Moreover, we have integrated the Query Expansion APIs into the first Demonstrator (WP5) for data search, access and visualisation, and four synthetic datasets (H3ABioNet, CHILD, CoLaus, UK Biobank) have been created in the project.
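One way a portable pipeline step can run where the data lives is to send the same task description to each site's GA4GH Task Execution Service (TES) endpoint. The sketch below only builds a TES v1 task message (fields `name`, `inputs`, `outputs`, `resources`, `executors`) as a dict; it does not submit it. The container image, command, and URLs are hypothetical placeholders, and whether CINECA's workflows used exactly this structure is an assumption.

```python
import json


def make_tes_task(image, command, input_url, input_path, output_url, output_path):
    """Build a GA4GH TES v1 task message as a plain dict (not submitted).

    The output URL is meant to point at site-local storage, so results
    can stay inside the site's legal jurisdiction.
    """
    return {
        "name": "rnaseq-quantification-sketch",
        "inputs": [{"url": input_url, "path": input_path, "type": "FILE"}],
        "outputs": [{"url": output_url, "path": output_path, "type": "FILE"}],
        "resources": {"cpu_cores": 4, "ram_gb": 16.0},
        "executors": [{"image": image, "command": command}],
    }


# Hypothetical image, command and site-local storage URLs.
task = make_tes_task(
    image="example.org/hypothetical/quantifier:latest",
    command=["quantify", "--in", "/data/sample.bam", "--out", "/data/counts.tsv"],
    input_url="s3://site-local-bucket/sample.bam",
    input_path="/data/sample.bam",
    output_url="s3://site-local-bucket/counts.tsv",
    output_path="/data/counts.tsv",
)

# This JSON body would be POSTed to each site's /ga4gh/tes/v1/tasks endpoint.
print(json.dumps(task, indent=2))
```

Only the task description travels between sites; each site pulls its own inputs and writes its own outputs.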
4. Deliver access to datasets of the scale and completeness needed to address analytical challenges:
For federated data discovery and querying, we have chosen several standards: Beacon for discovery queries, Beacon v2 for extended queries, and the Service Registry as a service catalogue (WP1). These standards have been deployed for queries across multiple sites, covering several cohorts and synthetic cohort datasets.
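As an illustration of an extended Beacon v2 query, the sketch below builds a genomic-variant request body with request parameters, an ontology filter and a requested granularity. The field names follow the Beacon v2 request schema as we understand it (`meta.apiVersion`, `query.requestParameters`, `filters`, `pagination`, `requestedGranularity`). The variant coordinates are hypothetical, `NCIT:C16576` (female) is only an example filter, and the exact parameters accepted by any given CINECA Beacon should be checked against that deployment.

```python
import json


def beacon_variant_query(reference_name, start, ref, alt, assembly,
                         granularity="boolean", filters=()):
    """Build a Beacon v2 genomic-variant request body (not sent)."""
    return {
        "meta": {"apiVersion": "2.0"},
        "query": {
            "requestParameters": {
                "assemblyId": assembly,
                "referenceName": reference_name,
                "start": [start],          # 0-based position, given as a list
                "referenceBases": ref,
                "alternateBases": alt,
            },
            "filters": [{"id": term} for term in filters],
            "includeResultsetResponses": "HIT",
            "pagination": {"skip": 0, "limit": 10},
            "requestedGranularity": granularity,  # e.g. "boolean" or "count"
        },
    }


# Hypothetical variant, restricted to female individuals via an ontology filter.
q = beacon_variant_query("22", 16050074, "A", "G", "GRCh37",
                         granularity="count", filters=["NCIT:C16576"])

# The same body could be POSTed to the /g_variants endpoint of each site.
print(json.dumps(q, indent=2))
```

Because every site accepts the same body, a coordinator can fan the query out and combine boolean or count answers without moving individual-level records.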
5. Provide harmonised metadata, based on open global standards, driving variant and sample discovery in a trans-continental virtual cohort of 1.4 million individuals:
In RP2, we created a metadata model formalised as GECKO (GEnomics Cohorts Knowledge Ontology), which has been publicly released (WP3). In addition, we confirmed that GECKO is suitable for our use cases (WP4 and WP5). We also developed a new standard to represent data access conditions, the Data Use Ontology (DUO), which has been published and adopted as a GA4GH standard. DUO is being widely adopted, with over 200,000 annotations worldwide (WP3).
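To show what DUO annotations make possible, the sketch below matches a dataset's data use term against a requested research purpose using three DUO terms: DUO:0000042 (general research use), DUO:0000006 (health or medical or biomedical research) and DUO:0000007 (disease specific research), where each broader term permits the narrower ones. This is a deliberately simplified, hypothetical rule set: real DUO matching also handles modifiers and other terms, and it would compare diseases through an ontology hierarchy rather than the exact string equality used here. The disease labels are placeholders.

```python
# Three DUO terms, ordered from broadest to narrowest permitted use.
GRU = "DUO:0000042"  # general research use
HMB = "DUO:0000006"  # health or medical or biomedical research
DS = "DUO:0000007"   # disease specific research


def purpose_permitted(dataset_code, dataset_disease, purpose_code, purpose_disease=None):
    """Hypothetical, simplified check of a research purpose against a dataset's DUO term."""
    if dataset_code == GRU:
        return purpose_code in {GRU, HMB, DS}
    if dataset_code == HMB:
        return purpose_code in {HMB, DS}
    if dataset_code == DS:
        # Exact match only; a real matcher would accept disease subclasses.
        return (purpose_code == DS
                and purpose_disease is not None
                and purpose_disease == dataset_disease)
    return False  # unknown terms are treated as not permitted


print(purpose_permitted(HMB, None, DS, "example-disease-A"))  # True
print(purpose_permitted(DS, "example-disease-A", HMB))        # False
```

Encoding consent as ontology terms, rather than free text, is what allows checks like this to be automated across archives.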
6. Wide adoption in international personalised medicine projects:
The International Hundred-K Cohorts Consortium (IHCC), which aims to provide access to tens or hundreds of harmonised cohort data dictionaries across continents, has adopted GECKO as a metadata model for cohort data harmonisation. The IHCC collaboration will contribute to expanding GECKO to harmonise more complex cohorts. In addition, a collaboration with the Beyond 1 Million Genomes (B1MG) project has resulted in a new synthetic dataset for rare diseases, which can be used to develop new tools to access and analyse data of interest to the rare disease community.
In RP2, CINECA achieved the following advances beyond the state of the art in cohort interoperability:
- CINECA identified the need for a set of high-quality synthetic cohort datasets to remove barriers to the development of federated research and clinical applications. Four synthetic cohorts are now available from both the EGA and CanDIG, using the same data access methods as real cohort data. This approach has been adopted by the B1MG project, which has developed a rare disease synthetic cohort available from the EGA.
- CINECA technical WPs have continued to drive the development of key community standards. Examples include the GA4GH Beacon v2 standard (ratified as an official GA4GH standard in late 2021) and the Researcher ID standard (for which EGA provided one of the first working implementations in RP2).
- In RP2, we collaborated with international cross-cohort projects such as the International Hundred-K Cohorts Consortium (IHCC), the Maelstrom resource, the European Joint Programme on Rare Diseases, BBMRI-ERIC, and the Dementia Portal UK to develop the GECKO cohort metadata mapping model, extend its use beyond the CINECA project, and ensure that it is interoperable with existing cohort resources. All cohort metadata mappings produced by CINECA will remain openly available after the end of the project.
- A key socio-economic aim of CINECA is to implement a comprehensive benefit-sharing programme beyond Europe and Canada by including the H3ABioNet project through our UCT partner. In RP2, we focused on exchanging knowledge and expertise with African researchers, with CINECA partners taking part as lecturers or speakers in various African training activities. Examples include lectures given by CINECA partners at the H3Africa consortium meeting in April 2022 and in the H3ABioNet NGS Bioinformatics course, held virtually from April to June 2022. In addition, local Beacon v2 instances have been implemented for H3Africa genomic data, and H3ABioNet has been exploring options for implementing authentication and authorisation systems in the H3Africa/H3ABioNet ecosystem.
