Periodic Reporting for period 2 - XDC (eXtreme DataCloud)
Reporting period: 2019-02-01 to 2020-04-30
The eXtreme DataCloud (XDC) project develops scalable technologies for federating storage resources and managing data in highly distributed computing environments. The services it provides can operate at the unprecedented scale required by the most demanding data-intensive research experiments in Europe and worldwide. The released products target existing and next-generation e-Infrastructures deployed in Europe, such as the European Open Science Cloud (EOSC), the European Grid Infrastructure (EGI), and the Worldwide LHC Computing Grid (WLCG). XDC is run by a Consortium that brings together technology providers with long-standing, proven experience in software development and large research communities from diverse disciplines: Life Science, Biodiversity, Clinical Research, Astrophysics, High Energy Physics and Photon Science. The project builds on technologies already developed by the partners or by previous H2020 initiatives (such as the INDIGO-DataCloud project). XDC software is released as Open Source and is based on existing components that the project enriches with new functionalities and plugins. Using standards and protocols that are widely available in state-of-the-art distributed computing ecosystems ensures that the released components can be easily plugged into the European e-Infrastructures and, more generally, into cloud-based computing environments. The project is investing in improving the user experience of the data management services by providing friendly, web-based user interfaces that are ready for mobile devices. The final goal is to significantly lower the barriers to accessing distributed computing and to release data management services that are more usable, more reliable, rich in functionality and still scalable enough to cope with the most demanding scientific use cases.
XDC services are designed to cope with the new nature of distributed e-Infrastructures, providing solutions that support the dynamic extension of computing centers to remote locations and the use of sites with limited storage capacity, while maintaining transparent bi-directional access to the data stored in all locations. XDC opens new possibilities for scientific research communities in Europe by supporting the evolution of e-Infrastructure services for Exascale data resources. The project developments are driven by real-life requirements provided by the Research Communities represented in the Consortium; however, the topics and challenges addressed are of general interest to user communities of all sizes. The main general impact expected from the project is an increased uptake of the European e-Infrastructures.
Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far
After a careful analysis of the requirements collected from the scientific communities represented in the Consortium, XDC defined its technical architecture, focusing its development activities on the integration of high-performance components in the areas of data storage, data transfer, data federation and data orchestration. XDC invested considerable effort in the Software Quality Assurance (SQA) process for its products, with the aim of releasing high-quality software. The development activities led to the first public release of the project on January 25th, 2019. Codenamed XDC-1/Pulsar, it addressed important topics such as federation of storage resources, smart caching solutions, policy-driven data management based on Quality of Service, data lifecycle management, metadata handling, and optimized data management based on storage events. The XDC Service Catalogue is available on the project website. Training sessions and several workshops have been organized as co-located events at EOSC and RDA conferences. Nine events were organized during the project lifetime and, in total, the XDC partners provided more than 50 contributions to events. In March 2020 XDC produced its second major release, XDC-2 (codenamed Quasar), which now powers several production services used by internal and external communities. The XDC-contributed improvements have been integrated into the main repositories of the various software projects, so they will gradually be rolled out to the deployed instances as part of regular updates.
It is already possible to highlight the services that are benefiting, or will benefit, from these improvements: EOS is mainly used in production at CERN but is also deployed at other sites; dCache is used extensively in the WLCG and EGI infrastructures, and since the XDC changes have been merged into the main branch they are being made available to the whole dCache community; FTS is used in WLCG, EGI and PaNOSC, with more than 12 instances deployed in Europe, and the XDC-contributed code is being released in the standard version; Dynafed has released all the XDC-contributed code, which is running in production in the Canadian Cloud Federation, in Napoli for the Belle-II experiment computing model, and at CERN; Onedata is used in various testing and production deployments, including deployments by several XDC partners and the one providing the EGI DataHub service, but the exact number is unknown because there is no central point where they are recorded; the PaaS-related components are used extensively in coordination with the DEEP project, and a test instance is accessible via the EOSC marketplace.
XDC also set up piloting activities with various external communities, mostly belonging to the EOSC ecosystem, mainly through workshops and events. Some of them are worth mentioning here: EOSC-hub Marine Competence Centre: planning and implementation of pilots for SeaDataNet activities, for which a concrete work plan was designed and is being implemented; EOSC-hub Fusion Competence Centre: evaluation of and report on Data Replication and Access testing using Onedata; Photon and Neutron communities, in the context of the PaNOSC project: various presentations have been made and piloting activities are being implemented; Earth Observation, ASTRON and ESCAPE communities: they joined a workshop on data management to present their use cases and evaluate possible solutions. Moreover, XDC components are also exploited by the EOSC-hub DODAS (Dynamic On Demand Analysis Service) Thematic Service.
XDC has also had an impact in the recent COVID-19 emergency by supporting the activities of the ECRIN-related communities. The clinical research metadata repository (MDR) is in production for the ECRIN task force on COVID-19 and is accessible from the ECRIN webpage (https://www.ecrin.org/clinical-research-metadata-repository). The MDR has been presented in the discussion about the implementation of the European COVID-19 Research Data Platform and is proposed for inclusion in the Infectious Diseases Data Observatory (IDDO). Moreover, the MDR will be taken up as an early adopter in the EOSC-hub project and will also be included in the H2020 EOSC-Life project.
Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)
The XDC proponents identified technological gaps in the data management services of the current e-Infrastructures, and the new functionalities released by XDC fill those gaps. By providing advanced features at the infrastructure level, the XDC architecture reduces the effort needed to port complex workflows and computing models to distributed systems. XDC opens new possibilities for scientific research communities in Europe by supporting the evolution of e-Infrastructure services for Exascale data resources.