CORDIS - EU research results

leveraging the European compute infrastructures for data-intensive research guided by FAIR principles

Periodic Reporting for period 1 - EuroScienceGateway (leveraging the European compute infrastructures for data-intensive research guided by FAIR principles)

Reporting period: 2022-09-01 to 2023-08-31

In the last decade, many scientific domains have been transformed into data-driven disciplines. These data-driven approaches are addressing key societal challenges, driving the bio-economy and inspiring fundamental research. Research projects depend increasingly, and crucially, on the exchange and processing of data, and have developed platforms tailored to their needs. Europe has invested significantly in computational infrastructure suitable for big data-driven computational research. Many national and/or regional initiatives supply HPC, HTC and distributed compute capabilities, including the EuroHPC initiative and the EGI federation. These contribute to the vision of the European Open Science Cloud (EOSC) as a research environment fostering data re-use and method sharing across scientific disciplines and national borders. Alongside data processing, FAIR (Findable, Accessible, Interoperable, Reusable) sharing of data is now heavily emphasised and supported by funders, publishers and infrastructures, and is recognised as a crucial component of Open Science, enabling reproducibility and innovative re-use of research outputs. Beyond data, FAIR sharing of all digital objects, including software tools, data analysis pipelines, predictive models, AI algorithms and multi-stage analytics, is recognised as critical for reproducible, validated and transparent research, and for knowledge exchange, capacity building and productivity in the research endeavour.

The overall aim of EuroScienceGateway is to achieve our shared vision of an open, collaborative digital space for European scientists. The project's objectives are:

1. Accessible e-Infrastructure resources for European scientists to enable pioneering data-driven research across scientific domains.
A workflow-based gateway to computing and storage infrastructures and services for European scientists will be deployed, based on the open Galaxy platform, the Pulsar Network (a wide job execution system for distributed, heterogeneous resources) and FAIR workflow services (WorkflowHub, RO-Crate and metadata standards). All components will be matured into a robust service (TRL-9), showcased by projects from early-adopter communities.

2. Supporting the varieties of analysis types and diverse usage patterns through efficient and smart job distribution to appropriate and sustainable infrastructures.
Implementing Bring Your Own Compute (BYOC) and Bring Your Own Storage (BYOS) will enable EuroScienceGateway to support a broad variety of analysis types while respecting the different hardware configurations each needs for efficient execution. Introducing the notion of data locality into job scheduling will allow costs, energy consumption and performance to be taken into account.

3. The application of FAIR principles to workflows and adoption of FAIR Digital Objects to stimulate reusable and reproducible research and enable the EOSC Interoperability Framework.
We will establish FAIR practices in computational research by integrating Galaxy workflows with FAIR Digital Objects (FDOs), capturing workflow run provenance, and making EuroScienceGateway workflow definitions available as FAIR Computational Workflows through RO-Crate, WorkflowHub and metadata standards.

4. Adoption of the EuroScienceGateway by researchers in diverse scientific disciplines.
Through our use cases and communities (work package 5, WP5), we will demonstrate how EuroScienceGateway can accelerate science in these fields and, more generally, the range of possibilities opened up by the approach developed in this project. We will also provide training on deploying and using the system to answer research questions. Through outreach and training activities and the operational TRL-9 services, we plan to grow the services to support 100,000 users.

Computational resources, Job distribution & Deployment
* Updated, developed and simplified Ansible roles, Terraform recipes and documentation for Pulsar deployment; updated endpoints and images
* Developed TESP (GA4GH's Task Execution Service API for Pulsar) as a separate microservice
* Investigated ARC and began integrating it as a job runner in addition to Pulsar
* Six Galaxy instances in place, with a seventh national instance being deployed (Italy); usegalaxy.eu upgraded to Galaxy 23.1; TPV meta-scheduler in place
* Bring Your Own Compute/Storage (BYOC/BYOS): BYOC addresses Pulsar and ARC, BYOS addresses S3 buckets and Onedata stores
* Web interface-controlled deployment of an HTCondor virtual cluster plus Pulsar on EGI resources
* Work on data locality/geolocation for a European smart job scheduling system; inspection of DIRAC
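
To illustrate what data locality-aware job scheduling could mean in practice, the following is a minimal sketch, not the project's scheduler and not the TPV API. All class names, fields and weights are hypothetical and chosen only for illustration: a job is sent to the endpoint with the lowest combined penalty for data transfer, accounting cost and carbon intensity, and endpoints without enough free cores are skipped.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Destination:
    """A hypothetical compute endpoint, e.g. a Pulsar or ARC site."""
    name: str
    region: str                 # where the endpoint is located
    free_cores: int             # cores currently available
    cost_per_core_hour: float   # illustrative accounting cost
    carbon_g_per_kwh: float     # illustrative carbon intensity of the site


@dataclass(frozen=True)
class Job:
    cores: int
    input_gb: float
    data_region: str            # where the input data already resides


# Illustrative weights (assumptions); a real system would calibrate these.
TRANSFER_PENALTY_PER_GB = 1.0
COST_WEIGHT = 10.0
CARBON_WEIGHT = 0.01


def score(job: Job, dest: Destination) -> Optional[float]:
    """Return a penalty (lower is better), or None if the job does not fit."""
    if dest.free_cores < job.cores:
        return None
    # Data locality: no transfer penalty when the data is already in the region.
    transfer = 0.0 if dest.region == job.data_region else job.input_gb * TRANSFER_PENALTY_PER_GB
    cost = COST_WEIGHT * dest.cost_per_core_hour * job.cores
    carbon = CARBON_WEIGHT * dest.carbon_g_per_kwh
    return transfer + cost + carbon


def pick_destination(job: Job, destinations: Sequence[Destination]) -> Optional[Destination]:
    """Pick the feasible destination with the lowest penalty (ties broken by name)."""
    candidates = []
    for dest in destinations:
        penalty = score(job, dest)
        if penalty is not None:
            candidates.append((penalty, dest.name, dest))
    if not candidates:
        return None
    return min(candidates, key=lambda c: (c[0], c[1]))[2]
```

With these weights, a job with a large input tends to stay where its data is, while a job with a small input can move to a cheaper or less carbon-intensive site. The trade-off is entirely determined by the assumed weights.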

FAIRification & Interoperability
* Workflow Run paper in preparation for submission
* Invenio (Zenodo) integration in Galaxy
* FDO (RO-Crate) integration in Galaxy
* Galaxy Training Network (GTN) module on RO-Crate and WorkflowHub
* PURLs for GTN
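
As an illustration of what packaging a workflow as an RO-Crate involves, the sketch below builds a minimal `ro-crate-metadata.json` for a single Galaxy workflow file using only the Python standard library. It follows the general shape of RO-Crate 1.1 and the Workflow RO-Crate conventions as we understand them. The function name, file name and workflow title are hypothetical, and crates exported by Galaxy or published on WorkflowHub carry considerably more metadata (authors, licence, provenance of the run).

```python
import json


def build_workflow_crate_metadata(workflow_file: str, name: str) -> dict:
    """Return a minimal RO-Crate 1.1 metadata document describing one workflow."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: points at the root dataset of the crate.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root dataset: the crate itself, with the workflow as main entity.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "hasPart": [{"@id": workflow_file}],
                "mainEntity": {"@id": workflow_file},
            },
            {
                # The workflow definition file.
                "@id": workflow_file,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
                "programmingLanguage": {"@id": "#galaxy"},
            },
            {
                # The workflow language, described as a contextual entity.
                "@id": "#galaxy",
                "@type": "ComputerLanguage",
                "name": "Galaxy",
                "url": {"@id": "https://galaxyproject.org/"},
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical file name and title, for illustration only.
    metadata = build_workflow_crate_metadata("workflow.ga", "Example workflow")
    print(json.dumps(metadata, indent=2))
```

Writing this document next to the workflow file, under the name `ro-crate-metadata.json`, is what turns a plain directory into a crate that other tools can interpret.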

Connections & Training
* Established connections within EOSC (OpenAIRE, FAIR-EASE, AquaINFRA, EOSC4Cancer, Skills4EOSC)
* Admin workshop in Ghent
* Global online training event Smorgasbord 2023

Scientific communities & Use cases
Well-functioning, active communities are being onboarded. First results by field:
* Astronomy: the FITS file format and the AladinLite viewer
* Biodiversity: integration of genome annotation tools used in the Earth BioGenome Project (EBP), in particular the Vertebrate Genomes Project (VGP)
* Climate: IceNet and FArLiG workflows integrated, addressing forecasts of Arctic sea ice and Arctic lichen browning, respectively
* Materials science: PyMuonSuite and MuSpinSim integrated, along with X-ray Absorption Spectroscopy (XAS) data analysis tools for catalysis experiments

The efforts to position Galaxy as a platform have been very well received, and interest from the scientific communities is rising.

Demonstrations:
* RO-Crate in Galaxy (im/export of workflows, history download)
* Simple deployment of computing endpoints

Standardisation framework:
* Central role of RO-Crate in EOSC Interoperability Framework
* RO-Crate has had a large impact in the FDO community

Access to ‘markets’/internationalisation:
* Communities being onboarded pave the way for their scientific fields to gain access to both Galaxy as a platform and the computing resources connected to EuroScienceGateway (ESG)
* Access to field-specific data resources (e.g. Copernicus for climate research, the International Virtual Observatory Alliance (IVOA) for astronomy)

Figures:
* Pulsar network across Europe
* Descriptive statistics of the Galaxy Europe instance (usegalaxy.eu) as of August 2023
* EuroScienceGateway project structure
* Descriptive statistics of the Galaxy Training Network (GTN)