Providing European researchers with easier access to data management solutions and large storage systems close to Europe’s most powerful supercomputers, while enabling them to move large amounts of data across borders, is the key objective of the EUDAT2020 project.
To remain at the cutting edge, European researchers across a range of disciplines need to be able to preserve and access masses of data and foster cross-border collaborations. EUDAT2020 aims to facilitate exactly this.
Building on previous EU-funded projects, this initiative has brought together a network of European research organisations and data and computing centres across 14 countries to create a pan-European collaborative data infrastructure (CDI). As of April 2018, 23 partners had formally joined the CDI.
Project coordinator Damien Lecarpentier from CSC in Finland discusses the project’s achievements as well as its role in helping to ensure future European research excellence.
We’ve heard about Big Data creating new opportunities for researchers. But what have been some of the challenges?
The European Union and its Member States have invested heavily in recent years to make distributed grids and high-performance computing (HPC) facilities available for researchers across a range of fields. The challenge is that the rapid growth of data – thanks to powerful new scientific instruments, simulations and the digitisation of existing resources – requires new ways of organising and processing the amount of information now available. We need to develop a more coherent approach to data management, and this is what this project is about. We wanted to connect data centres in order to better support different research communities.
Can you give some specific examples of these challenges?
In solid earth science, the data gathered spans real-time and offline data (such as pictures, videos and organised data structures stored in databases). These different types of data have different technical requirements in terms of access and preservation. In the biomedical community, a key challenge is ensuring that data can be accessed while meeting the legal requirements of patient anonymity and confidentiality. All research fields, including the social sciences and humanities, face challenges related to managing data replicas and accessing this data in a multi-user environment.
What role have researchers played in this project?
Since the beginning, research communities have been in the driving seat in terms of selecting data services. They have also directly participated through multi-disciplinary teams in the design and development of these services. The project brought together over 50 research communities across a range of disciplines, with each one bringing specific requirements and knowledge. These requirements ranged from replicating data for greater availability and ensuring the safety of sensitive data to sharing data beyond the initial community.
Newer research communities are often still designing their core data workflow processes and are interested in trialling various solutions before they can commit. More mature communities usually have an existing working infrastructure.
Wherever possible, we viewed existing services as opportunities and sought to support them by giving the communities the possibility of scaling out their computing and storage environment using the CDI. This meant considering research communities in their role as service providers and not only as customers.
How will the project benefit researchers?
Research communities involved in the project were able to plan, implement and use data management services on a Europe-wide scale. Scientific fields covered include social sciences and humanities, Earth and atmospheric science, climate science, biodiversity, life sciences and physics.
In the past, if I needed access to a storage system where I could also analyse my data, I could talk to my local data and computing centre. But this would only cover local users from the same country. Moving data across borders or sharing data and tools with colleagues abroad often required a bespoke solution every time, which is simply not scalable. This sustainable partnership, in which all partners share a common vision, has opened up access to data tools at the European level and enables European collaborations to be activated far more quickly.
The project has also made providers of data storage and management services much more aware of the needs of research communities. This includes their data management requirements, as well as how they organise their particular research infrastructures; for example, whether they choose to run their own data management services or whether they use pre-existing services that require special adaptations.
What have been the key factors behind the project’s success?
These achievements were made possible by a generously funded EU project and by a group of highly engaged project partners. By building on previous project experiences and working together, we managed to create a unique culture for open knowledge exchange and collaboration. We have created the EUDAT CDI as a way of preserving and continuing this legacy.
How will this legacy be secured?
During the project’s final year, we focused on moving from a project basis to a sustainable organisation. EUDAT partners have committed to sustain the CDI and its services for an initial period of 10 years. We have also established a secretariat to coordinate the development and operation of the CDI, and in February 2018 a limited liability company was formally established. The company will operate on a non-profit basis as the voice of European organisations working together as part of the EUDAT CDI, providing services related to scientific and research data storage and lifecycle management.
As for the future, EUDAT CDI is a growing organisation based on a contractual agreement between its members. It is one of the key pillars of the European Open Science Cloud, a cloud for research data in Europe. The CDI is an open enterprise and welcomes service providers wanting to join the network with various levels of engagement and integration.