
European Science Cluster of Astronomy & Particle physics ESFRI research infrastructures

Periodic Reporting for period 1 - ESCAPE (European Science Cluster of Astronomy & Particle physics ESFRI research infrastructures)

Reporting period: 2019-02-01 to 2020-07-31

The European Science Cluster of Astronomy & Particle physics research infrastructures (ESCAPE) brings together seven ESFRI facilities (CTA, ELT, EST, FAIR, HL-LHC, KM3NeT, SKA), two pan-European organizations (CERN, ESO), an ERIC (JIV-ERIC) and a French-Italian private Consortium (EGO-Virgo). It has the unique ambition of developing a multi-probe and cross-domain Open Science Cloud environment engaging researchers from astronomy, astroparticle physics and particle physics.

The EOSC will be Europe’s virtual environment for all researchers to access, manage, exploit and re-use research outputs, open science resources and services for research, innovation and educational purposes. ESCAPE will deliver a set of services that support Exabyte-scale FAIR data stewardship and open science, and that are used by the participating research collaborations in the international and global context of their science.
In the first 18 months of the project, a federated infrastructure has been deployed: a “data lake” that federates distributed storage services so that Exabyte-scale research data can be stored and managed reliably and according to the FAIR principles. The data lake supports policy-driven data replication and availability, working with the underlying European networking, high-performance computing and high-throughput computing infrastructures provisioned by GEANT, PRACE and the research infrastructures themselves. It is an evolution of the grid concept and integrates services through federation, based on a state-of-the-art Authentication and Authorization Infrastructure (AAI) that uses secure access tokens. This AAI benefits from developments in previous H2020 projects and is aligned with the best practices of large cloud providers and of the security organisations of RIs and science communities.

The initial prototype data lake federates storage systems provided by 10 research organisations across Europe, presented to users as a single service. Several use cases from the participating ESFRIs are using it to validate their workflows. A high-level service provides a data catalogue, implements policies for the replication, availability and lifecycle of data, and allows users to select the Quality of Service (QoS) of the storage systems for different data types.
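
To make the idea of policy-driven placement with QoS selection concrete, here is a minimal sketch under a deliberately simplified model: each federated site advertises a single QoS label and its free capacity, and a policy asks for a number of replicas on sites of a given QoS. The names StorageSite, ReplicationPolicy and place_replicas, and the QoS labels "FAST" and "CHEAP", are hypothetical and do not describe the actual ESCAPE data-lake software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageSite:
    """A storage endpoint contributed to the federation (hypothetical model)."""
    name: str
    qos: str        # illustrative QoS label, e.g. "FAST" (disk) or "CHEAP" (tape)
    free_tb: float  # free capacity in terabytes


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical rule: keep `copies` replicas on sites offering `qos`."""
    copies: int
    qos: str


def place_replicas(size_tb, policy, sites):
    """Choose distinct sites that satisfy the policy, preferring the most free space."""
    # Only sites with the requested QoS and enough room are eligible.
    candidates = [s for s in sites if s.qos == policy.qos and s.free_tb >= size_tb]
    candidates.sort(key=lambda s: s.free_tb, reverse=True)
    if len(candidates) < policy.copies:
        raise RuntimeError(f"only {len(candidates)} site(s) can satisfy {policy}")
    return [s.name for s in candidates[:policy.copies]]


sites = [
    StorageSite("site-a", "FAST", 120.0),
    StorageSite("site-b", "CHEAP", 900.0),
    StorageSite("site-c", "FAST", 300.0),
    StorageSite("site-d", "CHEAP", 50.0),
]
print(place_replicas(10.0, ReplicationPolicy(copies=2, qos="FAST"), sites))
# ['site-c', 'site-a']
```

The user only states *what* is wanted (two fast copies) and the federation decides *where*; a production system would also track existing replicas, site availability and network cost, which this sketch omits.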

This data lake is complemented by an activity to prototype a repository of services and tools, intended as a contribution to an eventual thematic repository in the EOSC portal. The repository infrastructure is based on existing, well-known services such as GitLab and Zenodo, with automatic harvesting of science products from OpenAIRE. The key software and services used to build the data lake, the workflows and pipelines of the science communities, and other developments in ESCAPE will all be published through the catalogue and made available as open source to the EOSC community. The first packages are already available in this repository as test cases for cross-fertilisation, re-use and interoperability. They are also provided in containerized versions to allow simple deployment in open science environments.

The Virtual Observatory (VO) framework for multi-messenger astronomy has been mapped onto the EOSC services in an initial way. This has produced a VO registry of resources, a first example used to give feedback to the EOSC implementation. Within the VO framework, work on common standards for high-level data products and archives has been pursued, with emphasis on new communities. This helps ensure that astronomy data will be interoperable and conform to the FAIR principles. Work on machine learning applied to the ESO archive demonstrates a “search by similarity” capability, a first example of the added value for astronomy archives that the FAIR implementation in ESCAPE makes achievable.
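
A “search by similarity” query can be pictured as a nearest-neighbour lookup over feature vectors describing archive entries. The sketch below assumes those vectors have already been computed (for example by a trained model) and uses cosine similarity as the distance measure; the observation IDs, vectors and function names are hypothetical, and this is not the method used on the ESO archive, which the summary does not specify.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search_by_similarity(query, archive, k=2):
    """Return the IDs of the k archive entries most similar to `query`."""
    scored = [(cosine(query, vec), obs_id) for obs_id, vec in archive.items()]
    scored.sort(key=lambda item: item[0], reverse=True)  # highest similarity first
    return [obs_id for _, obs_id in scored[:k]]


# Hypothetical precomputed feature vectors, keyed by observation ID.
archive = {
    "obs-001": [0.9, 0.1, 0.0],
    "obs-002": [0.1, 0.9, 0.2],
    "obs-003": [0.8, 0.2, 0.1],
    "obs-004": [0.0, 0.1, 0.9],
}
print(search_by_similarity([1.0, 0.0, 0.0], archive))
# ['obs-001', 'obs-003']
```

A brute-force scan like this is only practical for small collections; at archive scale an approximate nearest-neighbour index would normally replace the linear search.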

At a higher level, prototype science analysis platforms have been developed. They are intended to demonstrate the end-to-end capability for a project scientist to locate relevant data in the data lake archives, retrieve analysis workflows from the repository, and deploy them on integrated compute resources to run on those data sets.

These achievements are complemented by active dissemination and public engagement. In particular, several citizen science activities have been implemented in the ESCAPE context to showcase the science data available in the ESCAPE FAIR archives, alongside a number of Masterclasses and mass-participation events.
Today the data lake, although still modest in scale, is a unique distributed storage infrastructure for scientific data: it is presented to the user as a single system but gains resilience from its distributed nature. It has demonstrated policy-driven data placement and replication, open as well as controlled access covering both open and embargoed data, and an initial level of quality of service, both on request and by policy. These are unique features for a scientific infrastructure of this kind. The data lake builds on tools and services from the ESFRIs and from previous FP7 and H2020 projects, and provides full support for lifecycle management of scientific data. By the end of the project it will operate at full scale, be capable of managing Exabyte-scale data, and be used in production by the majority of the ESFRIs in ESCAPE. Its impact will be a real FAIR data management service that optimises costs between the e-infrastructures (GEANT, EGI, EUDAT, etc.) and the scientific collaborations providing the storage and compute resources, and that is intended to minimise operational and staffing costs.
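
Controlled access to embargoed data alongside open data can be illustrated with a simple authorisation rule. The sketch below assumes that the claims of a secure access token have already been validated (signature and expiry checks are omitted) and that membership of the owning group grants access during the embargo. Dataset, TokenClaims, can_read and the group name are hypothetical, not part of the ESCAPE AAI.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Dataset:
    name: str
    owner_group: str
    embargo_until: Optional[date] = None  # None means the data are open


@dataclass(frozen=True)
class TokenClaims:
    """Already-validated claims from an access token (hypothetical shape)."""
    subject: str
    groups: frozenset = frozenset()


def can_read(ds: Dataset, claims: Optional[TokenClaims], today: date) -> bool:
    """Open or no-longer-embargoed data: anyone. Embargoed data: owning group only."""
    if ds.embargo_until is None or today >= ds.embargo_until:
        return True
    return claims is not None and ds.owner_group in claims.groups


raw = Dataset("raw-events", owner_group="collab-x", embargo_until=date(2021, 1, 1))
alice = TokenClaims("alice", frozenset({"collab-x"}))
print(can_read(raw, alice, date(2020, 7, 31)))  # True: group member during embargo
print(can_read(raw, None, date(2020, 7, 31)))   # False: anonymous, still embargoed
print(can_read(raw, None, date(2021, 6, 1)))    # True: embargo has ended
```

Keeping the embargo date on the dataset rather than in the token means data become open automatically once the date passes, without reissuing any credentials.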

Several innovative workflows using Machine Learning (ML) techniques are being prototyped to add value automatically to the science products in the repository. The repository will be scaled up, including through collaboration with other ESFRIs, so that it has a major impact by making a large body of domain knowledge, tools and data products public and accessible.

In the next phase of the project, two large “Test Science Projects” (TSPs) will be implemented to demonstrate new cutting-edge scientific capabilities, making use of the services implemented within ESCAPE. The first TSP addresses a fundamental scientific question on Dark Matter by integrating all of the data from the astronomy and particle physics ESFRIs, providing a unique opportunity to explore that data coherently. The second TSP, on the Extreme Universe and Gravitational Waves, will push the frontier of Multi-Messenger Astronomy by fully exploiting the data available in ESCAPE and the capabilities of the science and e-infrastructure platforms. These projects, together with other scientific use cases in ESCAPE, will give strong feedback on the capabilities delivered by ESCAPE, and will serve both to ensure clean integration of facilities across the project and to demonstrate open science capabilities.