Polyglot and Hybrid Persistence Architectures for Big Data Analytics

Periodic Reporting for period 2 - TYPHON (Polyglot and Hybrid Persistence Architectures for Big Data Analytics)

Reporting period: 2019-09-01 to 2020-12-31

Organisations are faced with the challenge of managing ever-growing volumes of data, which can vary significantly in terms of consistency and availability requirements. For example, e-commerce systems that use data to recommend products need to be highly available, as data is constantly retrieved and updated while users browse the system. Consistency of such data is not critical, so the loss of a small part of it can reasonably be traded for a significant improvement in availability. On the other hand, for other subsets of data in the same system, such as records of customer orders and payments, compromising data consistency to improve availability is not acceptable.

Relational databases were once considered the de facto technology for persisting and managing large volumes of data. This has changed recently with the emergence of Google, Twitter, Facebook, Amazon and others, which were faced with extremely large data sets and unprecedentedly high availability requirements. The challenges involved in scaling relational databases have led to the emergence of a new generation of purpose-specific databases grouped under the term NoSQL. These are designed with horizontal scalability as a primary concern and deliver increased availability and fault tolerance at the cost of temporary inconsistency and reduced durability of data.

To balance requirements for data consistency and availability, organisations are increasingly migrating towards hybrid data persistence architectures, comprising both relational and NoSQL databases, to manage different subsets of their data using ad-hoc architectures. At the same time, as the volume and the value of textual content constantly grow, built-in support for sophisticated text processing in data persistence architectures is becoming increasingly essential.

This introduces a number of challenges, including ensuring the coherency of the overall design, the assembly and configuration of the different components of the architecture, and the consistency of the overlapping data. In addition, to access the data, developers need to write application code against different types of persistence backends. Unlike relational databases, NoSQL databases do not conform to a common set of standards (e.g. SQL, ODBC, JDBC), and application code is specific to the NoSQL database used, making it difficult to migrate. Undisciplined development of such data persistence architectures also introduces data evolution and migration challenges and complicates the development and maintenance of real-time analytics and monitoring capabilities.

The project has completed development of each of the components for designing, developing, deploying, querying, evolving, analysing and monitoring big data applications that utilise hybrid data stores. The integration tasks have also been completed, and the integrated platform has been utilised for industrial deployments and evaluations by the four Use Case partners representing key European industries that rely on Big Data.

The industrial evaluations have demonstrated substantial technical improvements from use of the TYPHON technologies for hybrid data stores, as well as important business benefits. Project partners have undertaken dissemination actions despite the hindrances of the pandemic, and have completed the dissemination and exploitation planning for the project results. Each partner has specific plans that extend beyond completion of the project and support reducing the complexity, time and effort required for European developers to develop innovative Big Data applications that exploit hybrid data stores.

Each of the project partners has put in place plans for the long-term sustainability of the open source project technologies, to ensure that European big data application developers and platform providers benefit from the project innovations long after the project is completed.

TYPHON has provided an industry-validated methodology and integrated technologies for designing, developing, deploying, querying, evolving, analysing and monitoring architectures for scalable persistence of hybrid data. The key scientific and technology innovations that have been developed include:

+ Technologies and methodology for designing hybrid polystores taking into account the structure of the data, the availability, partitioning and consistency requirements of different subsets of the data and the available deployment resources.
+ Novel algorithms for transforming hybrid polystore design models into preconfigured optimised virtual machines which can be deployed on cloud infrastructure.
+ An extensible high-level language for querying and modifying data persisted in hybrid polystores, and facilities for translating high-level queries into efficient native queries.
+ A high-performance framework for publishing and processing data access and update events to facilitate real-time monitoring and predictive analytics.
+ Technologies and methodology for evolving the organisation and distribution of data in hybrid polystores, along with tools for monitoring use of polystores for more optimised evolution.
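
To make the idea of translating high-level queries into native queries more concrete, the following is a minimal illustrative sketch in Python. It is not the TYPHON query language or its implementation: the `Query` class, the `PLACEMENT` mapping, the entity names and the functions `to_sql`, `to_document_filter` and `translate` are all hypothetical names chosen for this example. It only assumes that each entity lives in exactly one backend and that queries are simple equality filters.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Query:
    """Hypothetical backend-neutral query: equality filters on one entity."""
    entity: str
    filters: dict[str, Any]


# Hypothetical design decision: which backend stores which entity.
PLACEMENT = {"Payment": "relational", "Review": "document"}


def to_sql(query: Query) -> tuple[str, list[Any]]:
    """Translate to a parameterised SQL SELECT for a relational backend.

    Entity and field names are assumed to come from a trusted design model;
    only the filter values are passed as parameters.
    """
    if not query.filters:
        return f"SELECT * FROM {query.entity}", []
    clauses = " AND ".join(f"{field} = ?" for field in query.filters)
    sql = f"SELECT * FROM {query.entity} WHERE {clauses}"
    return sql, list(query.filters.values())


def to_document_filter(query: Query) -> dict[str, Any]:
    """Translate to a MongoDB-style filter document for a document backend."""
    return dict(query.filters)


def translate(query: Query) -> tuple[str, Any]:
    """Pick the backend for the entity and return its native query form."""
    backend = PLACEMENT[query.entity]
    if backend == "relational":
        return backend, to_sql(query)
    return backend, to_document_filter(query)


if __name__ == "__main__":
    print(translate(Query("Payment", {"customer_id": 42})))
    print(translate(Query("Review", {"product_id": 7})))
```

In this sketch the application code states only what it wants (an entity and its filters), while the placement mapping decides which native form is produced. This is the kind of decoupling that would let data move between backends without rewriting application queries.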

The TYPHON technologies have been validated through four industrial Big Data applications, involving datasets with different volume, variety and velocity characteristics, from the key European sectors of Automotive, Aerospace, Banking, and Telecommunications.

The key industrial impacts targeted by the TYPHON project that have been achieved are the following:
+ Powerful Big Data processing tools and methods for hybrid data and demonstrations of their applicability in real-world settings.
+ Significant increases in data throughput and access speed for hybrid data architectures, measured against industry-validated benchmarks.
+ Definition of new standards fostering data sharing, exchange and interoperability for hybrid data architectures.

TYPHON innovations have been driven by Big Data requirements from four industrial user partners with Big Data applications for Smart Connected Vehicles, Earth Observation Data Management, Hybrid Bank Data Warehousing and Telecom Predictive Maintenance. Each of these partners has validated that the project impacts are fulfilled using hybrid data architectures and industrial persistence technologies.