Developing Data-Intensive Cloud Applications with Iterative Quality Enhancements

Periodic Reporting for period 3 - DICE (Developing Data-Intensive Cloud Applications with Iterative Quality Enhancements)

Reporting period: 2017-02-01 to 2018-01-31

Recent years have seen rapid growth of interest in cloud applications built on top of Big Data technologies such as MapReduce/Hadoop, stream processing systems, and NoSQL databases. However, there is still a shortage of software engineering models, methods, and tools for developing data-intensive software systems that can harness these technologies while taking into account quality characteristics such as efficiency, reliability, and safety.

This shortcoming is significant, since the rush to hit the Big Data market with new products leads companies to pay less attention to quality aspects, which are nevertheless critical to avoiding project failures. The issue is exacerbated by the fact that quality engineering of data-intensive software systems is still in its infancy, making it difficult today to analyze, predict and guarantee quality-of-service for this class of applications.

DICE is the first open source framework that offers a quality-aware methodology to develop and operate Big Data applications. With DICE, software vendors and developers can efficiently prototype new data-intensive applications at low cost, quickly creating business cases and proofs-of-concept for Big Data technologies within their organisations. DICE encompasses quality assessment, architecture enhancement, testing and agile delivery, relying on principles of the emerging DevOps paradigm. At a high level, the DICE paradigm aims at:
- Tackling skill shortages and learning curves in quality-driven development and Big Data technologies through open source development tools, models, and methods.
- Shortening the time to market for data-intensive applications that meet quality requirements, reducing costs for independent software vendors (ISVs) and increasing value for end-users.
- Reducing the number and severity of quality incidents by iteratively learning the quality levels of the application at runtime and feeding this information back to the design environment.
A free WikiBook is available that describes the DICE methodology in more detail (http://www.dice-h2020.eu/book/).

Open source and commercial versions of the DICE framework can be obtained from the project website (http://www.dice-h2020.eu/). Experimental assessments of the DICE methodology using these tools have been carried out against three industrial pilots involving:
- Stream-processing systems for social media data analysis
- Batch processing for tax fraud detection
- Cloud-based management of real-time port operations
Results indicate substantial productivity gains thanks to DICE, particularly a reduction in deployment and configuration time for Big Data platforms compared to manual approaches. The DICE framework is also able to identify several violations and anti-patterns in the application designs, and it consistently reduces the manual time required for testing and system evaluation.
DICE offers a DevOps framework covering multiple aspects of the lifecycle of a Big Data application. A collection of 14 tools has been created and released as open source. The tools can guide users in defining new Big Data applications or in extending existing ones. A knowledge repository has been created to help end users explore the different features of the tools, as well as navigate through supporting tutorials and videos: https://github.com/dice-project/DICE-Knowledge-Repository/wiki/DICE-Knowledge-Repository

In particular, the open source release of the DICE framework is available free of charge and offers development and operations teams:
- An Eclipse-based IDE implementing the DICE DevOps methodology and guiding the user step-by-step through the use of cheatsheets
- A new UML profile to design data-intensive applications taking into account quality-of-service requirements and featuring privacy-by-design methods
- Quality analysis tools to simulate, verify, and optimize the application design and identify possible anti-patterns
- OASIS TOSCA-compliant deployment and orchestration on cloud VMs and containers
- Monitoring and anomaly detection tools based on the Elasticsearch-Logstash-Kibana stack
- Runtime methods for configuration optimization, testing and fault injection
- Native support for open-source Apache platforms such as Storm, Spark, Hadoop, and Cassandra.

The DICE framework is also available in commercial versions focused on real-time applications and batch processing system development.

The DICE tools have been presented to, and are actively downloaded by, a diverse group of stakeholders. Videos that illustrate cross-cutting benefits of the solution for different needs and use case scenarios are available on the DICE YouTube channel (https://www.youtube.com/channel/UC1EcaiuK-7Ztbj5n8n4MeFQ), together with tutorials on the DICE blog (http://www.dice-h2020.eu/blog/) and regular announcements on the DICE Twitter feed (https://twitter.com/diceh2020).
DICE delivers innovative development methods and tools to strengthen the competitiveness of small and medium European ISVs in the market of business-critical data-intensive applications. The barriers that DICE intends to break are the shortage of methods to express data-aware quality requirements in model-driven development and the inability to consistently consider these requirements throughout the tool-chain of quality analysis, testing, and deployment. Existing methodologies and tools provide these capabilities for traditional enterprise software systems and cloud-based applications, but for increasingly popular technologies such as Hadoop/MapReduce, Spark, Storm, and Cassandra, it is not yet possible to adopt a quality-driven software engineering approach. DICE intends to deliver this capability, providing a quality-driven development environment for data-intensive applications.
In the design of a data-intensive application, existing software engineering approaches face a number of limitations, even if one considers the basic specification of requirements. For example, it is possible with model-driven engineering (MDE) to express entity-relationship models, basic dependencies between components and data, field types and values, and data semantics. In the operations of a data-intensive application, DICE offers methods for deployment, monitoring, testing, configuration and anomaly detection for the aforementioned data-intensive technologies.
DICE Framework