
The Computational Database for Real World Awareness

Periodic Reporting for period 4 - CompDB (The Computational Database for Real World Awareness)

Reporting period: 2021-12-01 to 2022-05-31

Two major hardware trends have a significant impact on the architecture of database management systems (DBMSs): First, main memory sizes continue to grow significantly. Machines with 1 TB of main memory and more are readily available at a relatively low price. Second, the number of cores in a system continues to grow, from 60 or more today to hundreds in the near future. These trends offer radically new opportunities for both business and science. They promise information at your fingertips: large volumes of data can be analyzed and explored in depth online, in parallel to regular transaction processing. Currently, deep data exploration is performed outside of the database system, which necessitates huge data transfers. These transfers slow down processing to the point where real-time interactive exploration is impossible. The new hardware capabilities now make it possible to build a true computational database system that integrates deep exploration functionality at the source of the data. This will lead to a drastic shift in how users interact with data, as for the first time interactive data exploration becomes possible at a massive scale.
Within the project we developed a scalable query engine that can execute complex application logic as part of regular query processing. By integrating high-level, user-provided algorithms, we can offer much richer query functionality and enable interactive exploration of data.

We made significant progress on a new system architecture, addressing many challenges in efficient compilation and language integration. One of the goals of this project is the seamless integration of high-level data processing, specified in a programming language, with traditional database query support. This raises many technical challenges, including, somewhat surprisingly, compile time: when a complex algorithm is specified and then executed on a very efficient parallel execution engine, compilation can take longer than the actual execution. This turned out to be problematic for interactive use cases, so we developed a new compilation framework that adaptively compiles the different parts of the execution plan depending on usage. The code is initially compiled with a very cheap compiler that is optimized for compile time and uses a new linear-time register allocator; more expensive compilation modes are then used to improve the initial code only when the observed execution times and the cost model predict that the more expensive compilation will pay off. This allows for very efficient execution of “cheap” queries (i.e. queries that might be structurally complex but touch comparatively little data), while complex analytical queries still benefit from the full power of an optimizing compiler backend.
Extensive work on algebraic optimization led to an improved query optimization component, which is essential for handling large and complex analytical queries; previous approaches were unable to find solutions for large queries. The optimization framework we developed can handle all classes of queries, including queries with cross products and hyper-edges, which is important for handling arbitrary analytical queries.
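The adaptive compilation decision described above can be illustrated with a small cost-model sketch. This is not the project's implementation: the three modes, their compile costs and their speedup factors are hypothetical placeholders. The sketch extrapolates the throughput observed while running the cheap code to the remaining work, and picks a more expensive mode only if its compile cost plus the faster remaining execution is predicted to beat staying with the cheap code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Mode:
    name: str
    compile_cost: float  # seconds needed to compile the pipeline in this mode (hypothetical)
    speedup: float       # throughput relative to the cheap mode (hypothetical)


# Hypothetical tiers: cheap baseline (already compiled), unoptimized backend, optimizing backend.
CHEAP = Mode("cheap", compile_cost=0.0, speedup=1.0)
UNOPTIMIZED = Mode("unoptimized", compile_cost=0.005, speedup=2.0)
OPTIMIZED = Mode("optimized", compile_cost=0.050, speedup=4.0)


def choose_mode(processed: int, elapsed: float, remaining: int,
                modes: Sequence[Mode] = (CHEAP, UNOPTIMIZED, OPTIMIZED)) -> Mode:
    """Return the mode with the lowest predicted time to finish the pipeline.

    The throughput observed so far in the cheap mode is extrapolated to the
    remaining tuples; every other mode pays its compile cost up front.
    """
    if processed == 0 or elapsed <= 0:
        return modes[0]  # nothing observed yet: keep running the cheap code

    cheap_rate = processed / elapsed  # tuples per second in the cheap mode

    def predicted_time(mode: Mode) -> float:
        return mode.compile_cost + remaining / (cheap_rate * mode.speedup)

    return min(modes, key=predicted_time)


# Example: 1,000 tuples were processed in 1 ms by the cheap code.
print(choose_mode(1000, 0.001, 5_000).name)        # cheap
print(choose_mode(1000, 0.001, 100_000).name)      # unoptimized
print(choose_mode(1000, 0.001, 100_000_000).name)  # optimized
```

In an engine built this way, the check would be repeated while the pipeline runs, so a query that touches little data never pays for optimizing compilation, while a long-running one switches as soon as the predicted savings exceed the compile cost.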
We also integrated user-defined operators into the query execution workflow; these can serve as building blocks for executing high-level application logic.
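A user-defined operator can be pictured as a pipeline stage with a per-row hook and an end-of-input hook. The Python sketch below illustrates only that idea; the names `UserOperator`, `process`, `finish` and `run_pipeline` are hypothetical and do not reflect the system's actual interface, and this interpreted sketch ignores code generation entirely. It chains a stateless operator (tokenizing text) with a stateful one (counting words).

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple
Emit = Callable[[Row], None]


class UserOperator:
    """Hypothetical base class for a user-defined operator (UDO).

    process() is called once per input row; finish() is called once after the
    last input row, so stateful operators can emit their results. Both receive
    an `emit` callback that pushes a row to the next stage of the pipeline.
    """

    def process(self, row: Row, emit: Emit) -> None:
        raise NotImplementedError

    def finish(self, emit: Emit) -> None:
        pass  # stateless operators have nothing to flush


def run_pipeline(rows: Iterable[Row], operators: List[UserOperator]) -> List[Row]:
    """Push every input row through the operator chain and collect the output."""
    out: List[Row] = []

    def stage(op: UserOperator, downstream: Emit) -> Emit:
        # Bind an operator and its downstream target into one push function.
        return lambda row: op.process(row, downstream)

    # Wire the chain back to front: each operator emits into the next one.
    emit: Emit = out.append
    wiring = []
    for op in reversed(operators):
        wiring.append((op, emit))
        emit = stage(op, emit)

    for row in rows:
        emit(row)

    # Flush upstream operators first so their final rows still flow downstream.
    for op, downstream in reversed(wiring):
        op.finish(downstream)
    return out


class Tokenize(UserOperator):
    """Stateless UDO: turns a (doc_id, text) row into one (word,) row per word."""

    def process(self, row: Row, emit: Emit) -> None:
        _, text = row
        for word in text.split():
            emit((word.lower(),))


class CountWords(UserOperator):
    """Stateful UDO: counts (word,) rows and emits (word, count) rows at the end."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def process(self, row: Row, emit: Emit) -> None:
        self.counts[row[0]] = self.counts.get(row[0], 0) + 1

    def finish(self, emit: Emit) -> None:
        for word, n in sorted(self.counts.items()):
            emit((word, n))


print(run_pipeline([(1, "a b a"), (2, "B c")], [Tokenize(), CountWords()]))
# [('a', 2), ('b', 2), ('c', 1)]
```

The separate `finish` hook is what lets blocking, stateful user logic sit in the same pipeline as row-at-a-time operators.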
The compilation and optimization work significantly advanced the state of the art and was accordingly published in top venues (SIGMOD and ICDE), including a best paper award for the compilation work. Together with other compilation techniques, we are now able to assemble very complex analytical queries, including window functions and complex aggregations, from low-level primitives, leading to very fast execution plans.
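As an illustration of assembling a window function from low-level primitives, the sketch below computes `SUM(x) OVER (PARTITION BY k ORDER BY t ROWS UNBOUNDED PRECEDING)` from three generic steps: sort, split into partitions, and prefix sum. The function and column names are hypothetical, and this textbook decomposition is not necessarily the set of primitives used in the project.

```python
from itertools import accumulate, groupby
from operator import itemgetter
from typing import Any, Dict, List

Row = Dict[str, Any]


def running_sum_window(rows: List[Row], key: str, order: str, value: str) -> List[Row]:
    """SUM(value) OVER (PARTITION BY key ORDER BY order ROWS UNBOUNDED PRECEDING).

    Assembled from three generic primitives: sort, partition split, prefix sum.
    """
    # Primitive 1: sort by (partition key, order key); sorted() is stable.
    ordered = sorted(rows, key=lambda r: (r[key], r[order]))
    result: List[Row] = []
    # Primitive 2: split the sorted input wherever the partition key changes.
    for _, group in groupby(ordered, key=itemgetter(key)):
        part = list(group)
        # Primitive 3: prefix sum over the rows of one partition.
        sums = accumulate(r[value] for r in part)
        result.extend({**r, "running_sum": s} for r, s in zip(part, sums))
    return result


rows = [
    {"k": "a", "t": 2, "x": 5},
    {"k": "b", "t": 1, "x": 1},
    {"k": "a", "t": 1, "x": 3},
    {"k": "b", "t": 2, "x": 4},
]
for r in running_sum_window(rows, key="k", order="t", value="x"):
    print(r["k"], r["t"], r["running_sum"])
# a 1 3
# a 2 8
# b 1 1
# b 2 5
```

Other frame types and aggregates can reuse the same sort and partition steps and swap only the final per-partition primitive.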
Our work on complex analytical processing using user-defined logic offers a much richer and more powerful interface for expressing application logic, and has been accepted at PVLDB.
The overall system has been used for experiments by several other groups and has demonstrated excellent performance in many different application scenarios.
Figure: Adaptive compilation strategies across data sizes