Periodic Reporting for period 1 - EuroScienceGateway (leveraging the European compute infrastructures for data-intensive research guided by FAIR principles)
Reporting period: 2022-09-01 to 2023-08-31
The overall aim of EuroScienceGateway is to realise our shared vision of an open, collaborative digital space for European scientists. The project’s objectives are thus:
1. Accessible e-Infrastructure resources for European scientists to enable pioneering data-driven research across scientific domains.
A workflow-based gateway to computing and storage infrastructures and services for European scientists will be deployed, based on the open Galaxy platform, the Pulsar Network (a wide job execution system for distributed heterogeneous resources) and FAIR workflow services (WorkflowHub, RO-Crate and metadata standards). All components will be matured into a robust service (TRL-9), showcased by projects from early adopter communities.
2. Supporting the varieties of analysis types and diverse usage patterns through efficient and smart job distribution to appropriate and sustainable infrastructures.
Implementing Bring Your Own Compute and Bring Your Own Storage will enable EuroScienceGateway to support a broad variety of analysis types while respecting their different hardware requirements for efficient execution. Introducing the notion of data locality into job scheduling will allow costs, energy consumption and performance to be taken into account.
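The idea of locality-aware job distribution can be illustrated with a minimal sketch. It is not the project's scheduler: the `Destination` fields, the weights and the scoring formula are all assumptions made for illustration. It only shows how moving data away from its storage site can be traded off against cost, energy use and speed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Destination:
    """A hypothetical compute endpoint (for example a remote Pulsar site)."""
    name: str
    site: str                       # where the endpoint runs
    cost_per_cpu_hour: float        # assumed unit: EUR
    energy_kwh_per_cpu_hour: float  # assumed energy use
    relative_speed: float           # 1.0 = reference hardware


def score(dest, data_site, input_gb, cpu_hours,
          transfer_penalty_per_gb=0.05, energy_weight=0.2):
    """Return a penalty for running a job on `dest`; lower is better.

    The two weights are illustrative assumptions, not measured values.
    """
    runtime = cpu_hours / dest.relative_speed
    cost = runtime * dest.cost_per_cpu_hour
    energy = runtime * dest.energy_kwh_per_cpu_hour * energy_weight
    # Data locality: only pay a transfer penalty if the data must move.
    transfer = 0.0 if dest.site == data_site else input_gb * transfer_penalty_per_gb
    return cost + energy + transfer


def pick_destination(destinations, data_site, input_gb, cpu_hours):
    """Choose the destination with the lowest penalty."""
    return min(destinations, key=lambda d: score(d, data_site, input_gb, cpu_hours))


destinations = [
    Destination("fast-remote", "site-b", 0.04, 0.10, 2.0),
    Destination("slow-local", "site-a", 0.05, 0.12, 1.0),
]

# Small input: the faster remote endpoint wins despite the transfer.
small_job = pick_destination(destinations, "site-a", input_gb=1, cpu_hours=10)
# Large input: keeping the job next to its data wins.
large_job = pick_destination(destinations, "site-a", input_gb=500, cpu_hours=10)
print(small_job.name, large_job.name)
```

With these assumed numbers, a job with a small input is sent to the faster remote endpoint, while a job with a large input stays at the site that already holds its data.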
3. The application of FAIR principles to workflows and adoption of FAIR Digital Objects to stimulate reusable and reproducible research and enable the EOSC Interoperability Framework.
We will establish FAIR practices in computational research by integrating Galaxy workflows with FAIR Digital Objects (FDOs), capturing workflow run provenance, and making EuroScienceGateway workflow definitions available as FAIR Computational Workflows through RO-Crate, WorkflowHub and metadata standards.
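As a rough illustration of what packaging a workflow as an RO-Crate involves, the sketch below builds a minimal `ro-crate-metadata.json` document using only the Python standard library. The file name `example-workflow.ga`, the workflow name and the licence are placeholders, and the function `workflow_crate_metadata` is a hypothetical helper, not part of Galaxy or WorkflowHub. Real crates exported by those services carry considerably more metadata.

```python
import json


def workflow_crate_metadata(workflow_file, name, license_url):
    """Build a minimal RO-Crate 1.1 metadata document for one workflow file."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: identifies this file as RO-Crate metadata.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root dataset: the crate itself, pointing at its main workflow.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "license": {"@id": license_url},
                "hasPart": [{"@id": workflow_file}],
                "mainEntity": {"@id": workflow_file},
            },
            {
                # The workflow definition, typed as a computational workflow.
                "@id": workflow_file,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
            },
        ],
    }


# Placeholder values, chosen only for illustration.
metadata = workflow_crate_metadata(
    "example-workflow.ga",
    "Example workflow",
    "https://spdx.org/licenses/MIT",
)
print(json.dumps(metadata, indent=2))
```

Keeping the metadata as plain JSON-LD alongside the workflow file is what lets other services read the crate without depending on Galaxy itself.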
4. Adoption of the EuroScienceGateway by researchers in diverse scientific disciplines.
Through our use cases and communities (WP5), we will demonstrate how EuroScienceGateway can accelerate science in these fields and, more generally, the range of possibilities opened up by the approach developed in this project. Not least, we will provide training on deploying and using the system to answer research questions. With the outreach and training activities and the operational TRL-9 services, we plan to grow the services to support 100,000 users.
Updated, developed and simplified Ansible roles, Terraform recipes and documentation for Pulsar deployment; updated endpoints and images; developed TESP (GA4GH’s Task Execution Service API for Pulsar) as a separate microservice; investigated ARC and began integrating it as a job runner in addition to Pulsar
Six Galaxy instances in place, a seventh national one in deployment (IT); usegalaxy.eu upgraded to Galaxy 23.1; TPV meta-scheduler in place
Bring Your Own Compute/Storage (BYOC/BYOS), addressing Pulsar and ARC as well as S3 buckets and Onedata stores, respectively; web interface-controlled deployment of an HTCondor virtual cluster plus Pulsar on EGI resources; work on data locality/geolocation for a European smart job scheduling system; inspection of DIRAC
FAIRification & Interoperability
Workflow Run paper to be submitted; Invenio (Zenodo) integration in Galaxy; FDO (RO-Crate) integration in Galaxy, GTN module on RO-Crate and WorkflowHub; pURLs for GTN
Connections & Training
Established connections within EOSC (OpenAIRE, FAIR-EASE, AquaINFRA, EOSC4Cancer, Skills4EOSC); Admin workshop in Ghent; global online training Smorgasbord 2023
Scientific communities & Use cases
Well-functioning, active communities are being onboarded. First results have been obtained in astronomy (FITS file format and the AladinLite viewer), biodiversity (integration of genome annotation tools used in the Earth BioGenome Project (EBP), in particular the Vertebrate Genomes Project (VGP)), climate (the IceNet and FArLiG workflows integrated, addressing forecasts of Arctic sea ice and Arctic lichen browning, respectively) and materials science (PyMuonSuite and MuSpinSim integrated; X-ray Absorption Spectroscopy (XAS) data analysis tools for catalysis experiments). Efforts to position Galaxy as a platform have been very well received, and interest from the scientific communities is rising.
* RO-Crate in Galaxy (im/export of workflows, history download)
* Simple deployment of computing endpoints
Standardisation framework:
* Central role of RO-Crate in EOSC Interoperability Framework
* RO-Crate with large impact in FDO community
Access to ‘markets’/internationalisation:
* Communities being onboarded pave the way for their scientific fields to gain access to both Galaxy as a platform and the computing resources connected with ESG
* Access to field-specific data resources (e.g. Copernicus for climate research, the International Virtual Observatory Alliance (IVOA) for astronomy)