Take maps of marine biodiversity and cross-reference them with records of fish catches and you should get a clear picture of where fish stocks are most at risk. Doing so could help save the world's oceans, but it requires huge amounts of complex data to be processed and analysed. EU-funded researchers are tackling the problem with an innovative, nature-inspired approach to e-infrastructures and are exploring how open data initiatives can be integrated.
E-infrastructures use grid and cloud computing to harness the storage, processing and software functionality of a multitude of distributed resources. An e-infrastructure could be set up by a group of biology researchers, for example, to study a specific problem. Using an e-infrastructure, the biologists might create a Virtual Research Environment (VRE) for collaboration while harnessing grid computing resources to process information from one source and analyse it with data-mining software tools from another. But what if, during the course of their work, they want to cross-reference their data with information from other researchers using different data, software and computing systems, or even public open data resources?
'Integrating resources across different e-infrastructures is very difficult and time-consuming, and in many cases requires a new e-infrastructure to be built, which is neither time- nor cost-effective,' explains Donatella Castelli, a researcher at the Institute of Information Science and Technology "Alessandro Faedo" of Italy's National Research Council.
If those different e-infrastructures exist in an ecosystem where, as in nature, they are aware of each other and able to cooperate or even compete, sharing resources among them becomes dramatically simpler and cheaper. It was this vision that led a consortium of universities, research institutes, companies and a UN body to launch the 'Data infrastructures ecosystem for science' (D4Science-II) project. Supported by EUR 4.3 million in funding from the European Commission, the project created an interoperable framework for e-infrastructures: an ecosystem in which data, computing and software resources belonging to different e-infrastructures can be shared regardless of location, technology, format, language, protocol or workflow.
Interoperability between e-infrastructures in the D4Science-II Knowledge Ecosystem is provided in two ways: through the use of common standards among e-infrastructures and, most importantly, through so-called 'mediation frameworks.' The mediation frameworks consist of software that translates and transforms heterogeneous data and processes in such a way that they can be used in different contexts by different e-infrastructures, making cooperation possible. The backbone of the system is gCube, a scalable software framework that enables interoperability and which underwent testing by Hungarian project partner 4D SOFT.
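To make the idea of a mediation framework more concrete, here is a minimal Python sketch of one common way to implement such translation: per-source adapters that map heterogeneous records onto a shared record type. It is not gCube code and does not reflect gCube's API. The `Observation` type, the `ADAPTERS` registry, the catch-record field names (`species_name`, `position`) and the source labels are all hypothetical; the biodiversity adapter borrows the Darwin Core field names `scientificName`, `decimalLatitude` and `decimalLongitude` purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Observation:
    """Hypothetical common record that every participating source is mapped onto."""
    species: str
    lat: float
    lon: float
    source: str


def from_darwin_core(rec: dict) -> Observation:
    """Adapter for a source using Darwin Core-style field names (illustrative)."""
    return Observation(
        species=rec["scientificName"].strip(),
        lat=float(rec["decimalLatitude"]),
        lon=float(rec["decimalLongitude"]),
        source="biodiversity",
    )


def from_catch_row(row: dict) -> Observation:
    """Adapter for a hypothetical catch-statistics source that stores
    coordinates as a single 'lat;lon' string."""
    lat_str, lon_str = row["position"].split(";")
    return Observation(
        species=row["species_name"].strip(),
        lat=float(lat_str),
        lon=float(lon_str),
        source="fisheries",
    )


# Registry of translators, keyed by a (hypothetical) format name.
ADAPTERS: dict[str, Callable[[dict], Observation]] = {
    "darwin_core": from_darwin_core,
    "catch_row": from_catch_row,
}


def mediate(batches: Iterable[tuple[str, list[dict]]]) -> list[Observation]:
    """Translate batches of records from several sources into one common form."""
    out: list[Observation] = []
    for fmt, records in batches:
        adapter = ADAPTERS[fmt]  # a KeyError signals an unsupported format
        out.extend(adapter(r) for r in records)
    return out
```

Because each adapter is registered by format name, supporting a new source means adding one translation function rather than changing the code that consumes `Observation` records.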
The D4Science e-infrastructure not only aggregates resources and makes them interoperable but also offers them back to other e-infrastructures, allowing them to dynamically access data, software tools and computing power.
'In this sense, the e-infrastructures in the ecosystem can be competitive. Researchers can choose from among the resources available those that best suit their needs at any given time,' Dr. Castelli notes.
The strength of such an approach is visible in the VREs and in the gCube applications (open access VREs) set up as part of the D4Science-II project and available on the D4Science portal.
'D4Science-II has its origins in two earlier projects, DILIGENT and D4Science, which started developing infrastructures for digital libraries built on grid-enabled e-infrastructure. However, we saw that a lot of e-infrastructures already exist for specific purposes and realised that it is better to use the resources they have and make them work together, rather than building a new e-infrastructure each time. Our focus in D4Science-II therefore changed from enabling e-infrastructures to building an e-infrastructure ecosystem,' Dr. Castelli explains.
From biodiversity and fishing to high-energy physics...
The ecosystem has been used to support VREs in fields such as high-energy physics, biodiversity, and fisheries and aquaculture resources. It has helped open up new areas of research spanning these fields and is now being extended to new domains.
AquaMaps, a project to create global distribution maps of the world's marine species, utilises grid and data e-infrastructure resources through a VRE set up on the D4Science infrastructure.
Generating high-resolution maps showing the distribution of fish species is a computationally intense task: drawing a single multi-species map requires 125 million computations. Without a grid-enabled e-infrastructure, generating the collection of maps required to support a research activity might take days; with grid computing it takes just hours. Within the D4Science ecosystem, three separate but related VREs working with fisheries data have been able to use information and resources provided by different data e-infrastructures (GENESI-DEC for Earth observation data, GBIF for biodiversity data, and FIGIS for fisheries information). With this capability, they have been able to carry out innovative statistical analysis processes that were simply impossible before, combining information about fish species and location of catches with environmental and geospatial data, for example.
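The speed-up from grid computing comes from splitting one large job into many independent pieces. The Python sketch below assumes, as a simplification, that each map cell can be scored on its own using a trapezoidal temperature envelope per species; the `Envelope` fields, the scoring rule and the chunk size are illustrative assumptions, not the AquaMaps model. A thread pool stands in for grid worker nodes; on a real grid each chunk would instead be dispatched as a separate job to a remote machine.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical temperature tolerance of one species (degrees C)."""
    t_min: float       # at or below this the cell is unsuitable
    t_pref_min: float  # start of the preferred range
    t_pref_max: float  # end of the preferred range
    t_max: float       # at or above this the cell is unsuitable


def suitability(temp: float, env: Envelope) -> float:
    """Trapezoidal score in [0, 1]: 1 inside the preferred range,
    falling linearly to 0 at the tolerance limits."""
    if temp <= env.t_min or temp >= env.t_max:
        return 0.0
    if temp < env.t_pref_min:
        return (temp - env.t_min) / (env.t_pref_min - env.t_min)
    if temp > env.t_pref_max:
        return (env.t_max - temp) / (env.t_max - env.t_pref_max)
    return 1.0


def score_chunk(cells: list[float], species: dict[str, Envelope]) -> list[dict[str, float]]:
    """One independent unit of work: score every species in a block of cells."""
    return [{name: suitability(t, env) for name, env in species.items()} for t in cells]


def build_maps(cell_temps: list[float], species: dict[str, Envelope],
               chunk_size: int = 1000, workers: int = 4) -> list[dict[str, float]]:
    """Split the cells into chunks, score the chunks concurrently, then reassemble."""
    chunks = [cell_temps[i:i + chunk_size] for i in range(0, len(cell_temps), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so cell order is preserved.
        results = list(pool.map(score_chunk, chunks, [species] * len(chunks)))
    return [row for chunk in results for row in chunk]
```

Since no chunk depends on another, adding more workers shortens the wall-clock time roughly in proportion, which is the property that lets a grid turn days of computation into hours.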
'We collect statistics on all sorts of fisheries from all sorts of countries and of a wide diversity of data qualities. D4Science helps us bring all this data together,' notes Anton Ellenbroek of the FAO's Fisheries and Aquaculture Department in Rome. 'It's a really important infrastructure... it allows us to analyse statistics in ways that were not possible before and we can easily share with other virtual research environments.'
The FAO also hosted a workshop with the project on 'Digital Repositories - Linked Open Data' to examine ways of publishing digital repositories as linked open data using advanced tools such as the D4Science VREs.
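Linked open data publishes records as RDF statements that use globally unique identifiers. As a minimal, dependency-free sketch, the Python below serialises a species record as N-Triples. Only the `rdfs:label` IRI is a standard vocabulary term; the `http://example.org/fisheries/` namespace, the `scientificName` property and the species identifier are hypothetical placeholders. A production workflow would more likely rely on an RDF library and established vocabularies.

```python
from typing import Optional

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"  # standard RDFS term
BASE = "http://example.org/fisheries/"  # hypothetical namespace


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str, lang: Optional[str] = None) -> str:
    """Quote a string literal, escaping characters N-Triples reserves."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def species_to_ntriples(species_id: str, common_name: str, scientific_name: str) -> str:
    """Emit one line per triple: subject, predicate, object, terminated by ' .'."""
    subject = iri(f"{BASE}species/{species_id}")
    triples = [
        (subject, iri(RDFS_LABEL), literal(common_name, "en")),
        (subject, iri(f"{BASE}scientificName"), literal(scientific_name)),
    ]
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)
```

Because every subject and predicate is a full IRI, triples produced by different repositories can be merged and queried together without a shared database schema.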
The success of VREs dealing with fisheries and biodiversity data in D4Science-II has inspired two follow-up projects in the field.
In i-Marine, researchers are applying the ecosystem approach to fisheries management and conservation of the marine environment, using an open platform based on the D4Science infrastructure to work with a set of knowledge and data sources much broader than that used in conventional fisheries management.
And in the 'EU-Brazil open data and cloud computing e-Infrastructure for biodiversity' (EUBrazilOpenBio) project, European and Brazilian researchers are using the e-infrastructure ecosystem approach to set up an open access platform integrating existing European and Brazilian e-infrastructures and resources for biodiversity science.
'Cooperation across e-infrastructures opens up entirely new possibilities and areas of research. We can analyse scientific data against economic statistics, for example, to get an entirely new perspective that was not available before,' Dr. Castelli says.
Information Source: Donatella Castelli, Institute of Science and Information Technology, CNR, Italy