Community Research and Development Information Service - CORDIS

INDIGO-DataCloud Report Summary

Project ID: 653549
Funded under: H2020-EU.1.4.1.3.

Periodic Reporting for period 1 - INDIGO-DataCloud (INtegrating Distributed data Infrastructures for Global ExplOitation)

Reporting period: 2015-04-01 to 2016-06-30

Summary of the context and overall objectives of the project

The INDIGO-DataCloud project develops an open source software platform providing data and computing solutions for scientific communities, resource centers and public or private Cloud providers.

There are currently several technological issues that prevent easy and efficient exploitation of Cloud resources in many scientific domains. These include topics such as:
• the lack of consistent authentication and authorization policies across both applications and infrastructures;
• the difficulty of actually finding and using data and computing resources necessary for a given problem;
• the problem of negotiating and guaranteeing clear Quality of Service policies;
• the issue of expressing in a simple way high-level requirements that go well beyond the simple concept of a “Virtual Machine”;
• the trouble in expressing complex scientific workflows in Cloud infrastructures;
• the partial solutions currently available to integrate legacy applications into Cloud-based scientific portals, mobile appliances, or to write complex web-based front-ends exploiting advanced Cloud features;
• the problematic and often unscalable deployment of applications that require Cloud resources;
• the difficulty in interfacing with both public and private Cloud infrastructures, avoiding proprietary lock-ins and licensing issues.

Providing solutions to problems such as the ones listed above is an essential task if we want to successfully build and run, as foreseen by the European Commission, a European Open Science Cloud (EOSC) and a European Data Infrastructure.

INDIGO-DataCloud tackles these problems in two complementary ways: firstly, writing building blocks, or tools, that respond to the requirements of the many scientific communities that are part of the project Consortium; secondly, applying these tools to concrete scientific use cases and applications, and deploying them to both public and private e-infrastructures.
The Cloud architecture defined and implemented by the project can be applied in practice not only by Consortium members, but also by many other public or private projects or initiatives. This is expected to lead to faster results and to better and easier use of data and compute resources across Europe and elsewhere. These high-level objectives are summarized by the INDIGO motto: “Better Software for Better Science”.

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

In its first 15 months, the INDIGO-DataCloud project accomplished a large amount of technical work, which is summarized here. More detailed information is available in the 25 Deliverables published by the project in this period, comprising more than 2,000 pages of technical descriptions. All public deliverables, communication kits, testimonials, videos and much more information are available from the project website, https://www.indigo-datacloud.eu.

It is important to note from the start that this work was driven by the INDIGO research communities and is constantly reviewed by them, as they continue to integrate INDIGO components into their applications.

• In the area of Project Management, the project prepared the Grant and Consortium Agreements, handled pre-financing matters, defined the project’s internal reporting procedures, collected internal cost claims, defined and set up internal and external communication tools, organized the governance structure (including setting up its Technical Board and External Advisory Board), defined and ran quality assurance procedures, oversaw communication activities, organized or supported several project meetings, and coordinated the definition of the overall INDIGO-DataCloud Technical Architecture, which involved all project stakeholders.
• In support of Research Communities, the requirements brought forward by the Scientific Communities participating in the INDIGO-DataCloud project were collected. These requirements were then organized into common categories and priority ranks, and associated with 11 concrete case studies, which were subsequently analysed in detail. Many dissemination activities toward internal and external communities were also organized. These included, for example, presentations at relevant fora and conferences, such as EGI, the European Geosciences Union, RDA and several others.
• In the area of Software Management and Pilot Services, the project defined software quality assurance processes, software maintenance processes, software problem and change management, user support, and pilot services for software development, testing and validation. This included the definition of many services, such as a continuous integration infrastructure, documentation and software repositories, helpdesk procedures and deployment strategies. Bodies were set up, such as the Engineering Management Team (the central point for the coordination of software quality assurance, release and maintenance) and the Service Providers’ Board (whose members are service providers interested in exploiting the INDIGO developments and making them available to their communities). A preview testbed and a staged rollout procedure to introduce INDIGO components into various infrastructures were established. Dissemination activities were also carried out, which resulted in a software provider’s agreement with EGI aimed at incorporating the INDIGO-DataCloud products in the EGI Cloud Middleware Distribution (CMD).
• With regard to Resource Virtualization, the project subdivided its efforts into the three areas of Computing, Storage and Network virtualization. In the Computing area, Docker support for OpenNebula was introduced, Docker support for OpenStack was improved, and a tool for the automatic synchronization of Docker images from a central repository was developed. New services to optimize the use of resources in Cloud infrastructures, such as fairshare scheduling mechanisms and support for pre-emptible instances, were developed. As part of the comprehensive INDIGO Authentication and Authorization architecture, OpenStack support for OpenID-Connect was improved and a dynamic Token Translation System was developed. To support advanced PaaS mechanisms, additional TOSCA parsing support was provided in the TOSCA-parser for OpenStack. In the Storage area, CDMI extensions for Quality of Service support were proposed in collaboration with the Storage Networking Industry Association (SNIA), and a related working group was started in the RDA context. A CDMI web service was made available, together with plugins for many storage back-ends, cross-protocol support and monitoring endpoints. In the Networking area, an OCCI compliance library was developed and the OpenStack OCCI interface was extended; the project also actively participated in the specification of version 1.2 of the OCCI standard.
• In the PaaS (Platform as a Service) area, technology scouting for the definition and implementation of the PaaS platform components was carried out first. The INDIGO PaaS architecture was then defined and implemented, including many advanced and novel features. Among them are: improved capabilities in the transparent geographical exploitation of Cloud resources; standard interfaces to access PaaS services, with support for both de jure and de facto common standards; support for data requirements in Cloud allocations; support for the integrated use of resources coming from hybrid Cloud infrastructures; distributed data federation support, including data caching; deployment, monitoring and automated scalability of applications; support for the instantiation of dynamic and elastic clusters of resources; orchestration services based on TOSCA templates; and a comprehensive identity and access management service, which is part of the larger INDIGO authentication and authorization architecture.
• In the areas involving Science Gateways, Workflows and Toolkits, the project first performed technology scouting for high-level user-oriented services, and then moved on to develop components such as: the FutureGateway (a programmable scientific portal providing easy access both to the advanced PaaS features provided by the project and to existing applications); plug-ins for scientific workflow systems such as Kepler; Ophidia, an eScience framework for data mining and analytics that exploits parallel computing techniques and smart data distribution; a Token Translation client; and an open mobile toolkit that gives mobile programmers an easy starting point for implementing mobile applications based on the INDIGO FutureGateway.

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

As per the original Description of Work of the INDIGO-DataCloud proposal, the project is achieving significant advancements compared to the state of the art.

In particular, INDIGO is developing a comprehensive open source Cloud architecture, which provides many new functionalities previously unavailable in open source and, in several cases, also in proprietary Cloud offerings. These functionalities abstract from underlying IaaS technologies through the consistent use of both de jure and de facto standards. This allows interoperability with hybrid (public/private) infrastructures, or with e-infrastructures of different types (Grid, Cloud, HPC). The project also supports multiple existing authentication technologies (such as OAuth, SAML or OpenID-Connect), addresses the need for unified data access, and provides a flexible and scalable way to authorize or deny access to distributed Cloud resources. The INDIGO platform hides the complexity and differences of physical storage systems and works seamlessly in geographically distributed infrastructures. It offers optimized access to data through a template-based orchestration system, together with ways to automate the deployment, scalability and monitoring of complex services, whether long-running or workload-based. The project also introduced support for new services at the infrastructure level, for example extending Container support for popular open source Cloud frameworks, providing advanced resource scheduling mechanisms, and introducing QoS and data lifecycle support in storage systems. At the user interface level, the project is developing a completely programmable web framework, capable of interfacing with existing applications, mobile developments, complex workflows and big data analytics, and above all capable of supporting all the advanced data and compute capabilities of the INDIGO platform.

These advancements are made concrete in the INDIGO Service Catalogue (to be published following the first INDIGO software release in August 2016) and in scientific applications making use of INDIGO components. The INDIGO services are offered as a step toward the definition of a European Open Science Cloud (EOSC) and a European Data Infrastructure (EDI). The expected impact of the project is threefold: easy and efficient usage of both public and private compute and data resources; the development of cost-efficient, state-of-the-art scientific services and applications that are interoperable across diverse infrastructures; and, ultimately, faster and more effective production of results in many scientific domains.
