Community Research and Development Information Service - CORDIS

H2020

DICE Report Summary

Project ID: 644869
Funded under: H2020-EU.2.1.1.3.

Periodic Reporting for period 1 - DICE (Developing Data-Intensive Cloud Applications with Iterative Quality Enhancements)

Reporting period: 2015-02-01 to 2016-01-31

Summary of the context and overall objectives of the project

Recent years have seen rapid growth of interest in cloud applications built on top of Big Data technologies such as MapReduce/Hadoop, NoSQL databases, cloud-based storage, and stream processing systems. However, there is still a visible shortage of software engineering models, methods and tools for developing data-intensive software systems. Moreover, the rush to hit the Big Data market with new products leads companies to pay less attention to quality aspects, which are nonetheless critical to avoiding project failures. This issue is exacerbated by the fact that quality engineering of data-intensive software systems is still in its infancy, making it difficult today to analyse, predict and guarantee efficiency, reliability and safety properties for these applications.
DICE aims to address these challenges by defining a novel framework for quality-driven development of data-intensive applications. The DICE methodology will cover quality assessment, architecture enhancement, testing and agile delivery, relying on principles of the emerging DevOps paradigm. At a high level, the DICE paradigm aims at:
- Tackling skill shortages and learning curves in quality-driven development and Big Data technologies through open source development tools, models, and methods.
- Shortening the time to market for data-intensive applications that meet quality requirements, reducing costs for independent software vendors (ISVs) and increasing value for end-users.
- Reducing the number and the severity of quality incidents by iteratively learning the quality-levels of the application at runtime, feeding this information back to developers.

To achieve these goals, DICE proposes a model-driven engineering (MDE) framework for Big Data applications based on UML, and places this framework within a DevOps process. The DICE framework features a quality engineering tool chain for simulation and verification of efficiency (cost and performance), reliability, and safety properties of the application. Using a DevOps-inspired approach, the DICE framework allows rapid integration and delivery of novel prototypes. The DICE approach applies monitoring and feedback analysis techniques to the data collected during testing and prototyping in order to identify quality incidents and accelerate the iterative enhancement of the application across release cycles.

From a methodological standpoint, DICE focuses on delivering value to the application developer. The Eclipse-based DICE IDE will guide the developer through the DICE methodology. This IDE will initially offer the ability to specify the data-intensive application through a standard modelling language (UML) and the DICE extension of this language, called the DICE profile. For example, using the DICE profile, UML models can be annotated with configuration requirements for technologies such as Apache Storm, Apache Spark and Hadoop/MapReduce.

Designers will exploit the modelling capabilities of DICE to not only design the structure and behaviour of their applications, but also to inspect their quality characteristics during early-design stages and throughout monitoring feedback and iterative enhancement cycles. From these models, the tool-chain will guide the developer through the different phases of quality analysis (e.g., simulation and/or formal verification), deployment, testing and iterative enhancement (e.g., through monitoring data annotations in the DICE models).

Models used in DICE for model-driven development (MDD) span different abstraction layers. Stemming from model-driven architecture (MDA) principles, DICE considers three modelling abstraction layers, called DPIM, DTSM, and DDSM, which have the following characteristics:
- DICE Platform Independent Model (DPIM): This model corresponds to the OMG MDA PIM layer and describes the behaviour of the application as a directed acyclic graph that expresses the dependencies between computations and da

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

The technical work done in year 1 has been concerned with the following main technical aspects:
- Analysis of the state of the art in software engineering and technologies for data-intensive applications released as part of deliverable D1.1 - State of the art analysis (available alongside the other public deliverables on the DICE website, http://www.dice-h2020.eu/deliverables/).
- Specification, from the experience gathered in D1.1, of a set of technical requirements for the DICE WPs (D1.2). This work also involved an initial focus on a subset of technologies to prioritize (e.g., Hadoop/MapReduce, Storm, Spark, Cassandra) and the definition of initial requirements for the demonstrators, which are due for an update at M16. These requirements have been collected in a repository that has been made available in a companion document included in D1.2 - Requirement specification.
- The identification of a high-level architecture for the DICE framework together with integration patterns and identification of the concrete tools to be implemented across the project and their expected technical and research innovations (D1.3 - Architecture definition and integration plan - Initial version).
- Initial version of the DICE Models and DICE Profile that will be used by developers to create data-intensive applications through the DICE IDE (D2.1 - Design and quality abstractions - Initial version). An initial methodology has also been drafted, with the intention of continuously refining its definition until the planned release in year 3.
- Initial version of the tools for quality assessment in the DICE IDE, namely the DICE Simulation Tool, which predicts quality characteristics of software systems in the early design stages of a data-intensive application, with its initial interfacing to the DICE Models and to external simulation software (D3.2 - DICE simulation tools - Initial version), and the DICE Verification Tool, which verifies formal properties relating to safety and reachability, with its initial interfacing to the DICE Models (D3.5 - DICE verification tools - Initial version).
- Initial version of the DICE runtime environment, composed of the Monitoring platform and APIs (D4.1 - Monitoring and data warehousing tools - Initial version), which integrates novel tools for monitoring of Big Data technologies, and the DICE Delivery tools, including Continuous Integration, Deployment and Configuration Optimization capabilities (D5.1 - DICE delivery tools - Initial version).

Moreover, the following non-technical activities have been performed in year 1 (cf. deliverables D7.5, D8.4, D8.7):
- A sustained dissemination effort comprising 4 journal papers, 12 proceedings papers, 8 oral presentations, among others. Communications via website (www.dice-h2020.eu), social media channels on Facebook, Google+ and Twitter, and participation in more than 30 events.
- Collaboration with the SE4SA software engineering cluster and EU projects such as MODAClouds and CACTOS and initial planning for collaboration with MIKELANGELO. Collaboration with the SPEC RG DevOps working group, comprising joint organization of the QUDOS workshop (qudos2015.fortiss.org) at ESEC/FSE 2015, DICE contributions to the official SPEC RG white paper on DevOps, and joint preparation of a questionnaire for external industry stakeholders to better understand the nature of the DevOps market and its relation to quality engineering.
- Standardization activities as part of the TOSCA working group. Export of monitoring data into the OSLC standard to allow external DevOps tools to reuse our D-Mon monitoring platform.
- Definition of a research data management approach centred on the European portal Zenodo (https://zenodo.org). Definition of the internal collaboration platform. Implementation of quality management and ethics issues management practices by the coordinator.

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

DICE will deliver innovative development methods and tools to strengthen the competitiveness of small and medium European ISVs in the market of business-critical data-intensive applications. The barriers that DICE intends to break are the shortage of methods to express data-aware quality requirements in model-driven development and the inability to consider these requirements consistently throughout the tool-chain of quality analysis, testing, and deployment. Existing methodologies and tools provide these capabilities for traditional enterprise software systems and cloud-based applications, but for increasingly popular technologies such as Hadoop/MapReduce, Spark, Storm and Cassandra, it is not yet possible to adopt a quality-driven software engineering approach. DICE intends to deliver this capability, providing a quality-driven development environment for data-intensive applications.
In the design of data-intensive applications, existing software engineering approaches face a number of limitations, even if one considers only the basic specification of requirements. For example, it is possible with MDE to express entity-relationship models, basic dependencies between components and data, field types and values, and data semantics. However, new approaches are required to explicitly annotate in software models information such as:
- static characteristics of data, e.g., volumes, value, storage location, data access control;
- dynamic characteristics of data, e.g., read rates, write rates, update rates;
- dependencies in data processing, e.g., graph-based relationships in stream processing topologies;
- properties of the underpinning data processing technologies, e.g., configuration, topologies, physical resource requirements.
This is a major limitation for existing methodologies. For example, using a state-of-the-art MDE approach for cloud computing, but without the above annotations, developers would not be able to describe:
- individual dependencies between components and data streams, therefore it would be impossible for the QA tool chain to understand how a refactoring is going to affect latencies, costs and reliability for the data-intensive part of the application;
- relationships between compute and memory requirements of individual software components and the volumes and I/O rates of the data, which would make it difficult to predict quality at design time. Understanding these relationships is important to quantify the costs of an application.
In addition, the lack of an explicit annotation for data characteristics would make it difficult to integrate in the QA tool chain a feedback analysis and performance anomaly detection capability, since the QA tool chain would not be in a condition to synchronize the models with monitoring data collected from the runtime.
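As a purely illustrative sketch of the kinds of annotation listed above, the following Python fragment attaches static and dynamic data characteristics to data streams and records which components read and write them. It then uses those dependencies to find the components that a change to one component could affect. All class, field and function names here (DataStream, Component, affected_by, and the example pipeline) are hypothetical and are not taken from the DICE profile or tools.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataStream:
    """A data stream with hypothetical static and dynamic annotations."""
    name: str
    volume_gb: float               # static: data volume
    storage_location: str          # static: where the data lives
    read_rate_per_s: float = 0.0   # dynamic: read rate
    write_rate_per_s: float = 0.0  # dynamic: write rate


@dataclass
class Component:
    """A software component annotated with the streams it uses."""
    name: str
    reads: list[str] = field(default_factory=list)   # stream names consumed
    writes: list[str] = field(default_factory=list)  # stream names produced


def affected_by(changed: str, components: list[Component]) -> set[str]:
    """Return the components downstream of `changed` via shared streams."""
    by_name = {c.name: c for c in components}
    affected: set[str] = set()
    frontier = [changed]
    while frontier:
        current = by_name[frontier.pop()]
        for other in components:
            if other.name == changed or other.name in affected:
                continue
            # `other` depends on `current` if it reads a stream `current` writes.
            if set(current.writes) & set(other.reads):
                affected.add(other.name)
                frontier.append(other.name)
    return affected


# Hypothetical example pipeline (names and figures are made up).
streams = [
    DataStream("raw", volume_gb=500.0, storage_location="object-store",
               write_rate_per_s=2000.0, read_rate_per_s=4000.0),
    DataStream("parsed", volume_gb=120.0, storage_location="nosql-db",
               write_rate_per_s=1800.0, read_rate_per_s=1800.0),
]
pipeline = [
    Component("ingest", writes=["raw"]),
    Component("parse", reads=["raw"], writes=["parsed"]),
    Component("audit", reads=["raw"]),
    Component("store", reads=["parsed"]),
]

impacted = affected_by("ingest", pipeline)
```

With explicit dependencies of this kind, a quality analysis step could restrict its latency, cost and reliability predictions to the components in `impacted`, and could compare the annotated rates with rates observed by monitoring.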
Tackling these barriers will realize the breakthrough of accelerating the development of business-critical data-intensive applications, by fostering shorter development, deployment and testing cycles. This will increase the competitiveness of the European software engineering industry, especially of small and medium ISVs that cannot afford dedicated quality teams.
End-users of DICE are primarily software engineers and architects in small and medium ISVs with a basic knowledge of UML, but without advanced expertise in quality engineering. The goal is to enable these end-users to perform advanced quality engineering of data-intensive applications.
Moreover, DICE is designed to help both developers and operators. The tools that DICE provides will help lower existing barriers between Development and Operations teams by embracing DevOps practices to change and improve the way data-intensive software is created and tested. This means that the DICE ecosystem can help software vendors of any size to build and run Big Data applications taking into account business and technical needs and quality requirements. The DICE solution can be un

Related information

Record Number: 186642 / Last updated on: 2016-07-14