Periodic Reporting for period 2 - DAPHNE (Integrated Data Analysis Pipelines for Large-Scale Data Management, HPC, and Machine Learning)
Reporting period: 2022-06-01 to 2023-11-30
Project Management / Dissemination (WP 1 and 10): Besides regular all-hands meetings, the first 36 months comprised project tracking according to the project plan and monthly communication of project news. The DAPHNE cloud, hosted by Know-Center, has been maintained to ensure smooth sharing of project files, materials, and documents. The specific project management tasks carried out over the project runtime are documented in the project and risk management plan (D1.1), the research data management plan (D1.2), the first annual report (D1.3), the second annual report (D1.4), and, up to project month 36 (November 2023), the third annual report (D1.5).
System Architecture and DSL (WP 2 and 3): Since February 2021, we have actively developed a prototype of the DAPHNE system, which was shared as a demonstrator in D3.2 and, as of March 31, 2022, has been migrated to a public OSS repository under the Apache 2 license. After elaborating on the refined system architecture in D2.1 and D2.2, the compiler design in D3.1, and the initial prototype in D3.2, we developed the extended DAPHNE compiler prototype. It is based on MLIR, a library of reusable compiler infrastructure, which facilitates the cost-effective development of our domain-specific language, the reuse of existing compiler infrastructure, and extensibility.
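To give an intuition for this pass-based compilation approach, the following minimal C++ sketch (illustrative only, not DAPHNE code; all names are hypothetical) shows a compiler organized as a pipeline of rewrite passes over an operator graph, where new passes can be added without touching existing ones; this is the kind of reuse and extensibility that building on MLIR provides:

```cpp
// Toy illustration: a pass pipeline rewrites an operator graph, here fusing
// consecutive element-wise ops into one node. All names are hypothetical.
#include <iostream>
#include <string>
#include <vector>

struct Op {                    // one operation in the toy IR
    std::string name;          // e.g. "add", "relu", "fused(add;relu)"
    bool elementWise;          // this rewrite applies only to element-wise ops
};

using Program = std::vector<Op>;

// A "pass" is any transformation Program -> Program; the pipeline below can
// be extended with further passes without modifying existing ones.
using Pass = Program (*)(const Program&);

Program fuseElementWise(const Program& in) {
    Program out;
    for (const Op& op : in) {
        if (op.elementWise && !out.empty() && out.back().elementWise)
            out.back().name = "fused(" + out.back().name + ";" + op.name + ")";
        else
            out.push_back(op);
    }
    return out;
}

int main() {
    Program prog = {{"matmul", false}, {"add", true}, {"relu", true}};
    std::vector<Pass> pipeline = {fuseElementWise};  // the pass pipeline
    for (Pass p : pipeline) prog = p(prog);
    for (const Op& op : prog) std::cout << op.name << "\n";
    // prints: matmul, fused(add;relu)
}
```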
Runtime and Scheduling (WP 4 and 5): Work in WP 4 and 5 combined knowledge sharing on selected techniques with in-depth discussions of runtime aspects of the prototype and its extensions. Initial efforts centered around the core data structures and kernels. We introduced a vectorized (tiled) execution engine that processes operator pipelines in a task-based manner on tiles of the inputs. The design is described in the system architecture (D2.1), the language abstractions (D3.1), the DSL runtime design (D4.1), and the scheduler design (D5.1). Beyond the local runtime, we also created an initial distributed runtime system, which uses hierarchical vectorized pipelines. Additional work investigated distribution primitives, collective operations (e.g., MPI), parameter servers, and distribution strategies. For hierarchical scheduling, we analyzed the requirements and explored various task scheduling strategies.
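As a rough intuition for vectorized (tiled) execution, the following self-contained C++ sketch (illustrative only, not DAPHNE code) applies a fused chain of element-wise operators to row tiles of the input, with one independent task per tile; a real runtime would use a worker pool and a task scheduling strategy rather than plain std::async:

```cpp
// Toy illustration: a fused operator pipeline is executed per tile, so
// intermediates stay small, and tiles are processed as independent tasks.
#include <algorithm>
#include <future>
#include <iostream>
#include <vector>

int main() {
    const size_t n = 1'000'000, tileSize = 100'000;
    std::vector<double> x(n, 2.0), y(n);

    // The fused pipeline: y = relu(x * 3 + 1), applied to one tile [lo, hi).
    auto pipeline = [&](size_t lo, size_t hi) {
        for (size_t i = lo; i < hi; ++i)
            y[i] = std::max(0.0, x[i] * 3.0 + 1.0);
    };

    // One task per tile; tasks are independent and can run on any worker.
    std::vector<std::future<void>> tasks;
    for (size_t lo = 0; lo < n; lo += tileSize)
        tasks.push_back(std::async(std::launch::async, pipeline,
                                   lo, std::min(lo + tileSize, n)));
    for (auto& t : tasks) t.get();

    std::cout << "y[0] = " << y[0] << "\n";  // prints 7
}
```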
Computational Storage and HW Accelerators (WP 6 and 7): Work packages 6 and 7 also have natural synergies. Besides knowledge sharing, the initial work of the first 18 months covered basic I/O support for selected data formats, an analysis of the design space and current technology trends (D6.1), as well as an initial integration of GPU and FPGA operations, related data placement primitives, and tailor-made device kernels for selected operations (e.g., FPGA quantization). The integration of GPU and FPGA accelerators is important for the performance of various end-to-end pipelines and serves as an example for integrating other HW accelerators. GPUs (and later FPGAs) also take part in the vectorized execution to jointly exploit heterogeneous HW.
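The following minimal C++ sketch (illustrative only, not DAPHNE code; all names are hypothetical) conveys the idea behind data placement primitives and device dispatch: an operand is tagged with the device its buffer currently resides on, and an operation first moves the data if necessary and then selects the matching device kernel:

```cpp
// Toy illustration: placement-aware dispatch to device kernels.
#include <iostream>
#include <vector>

enum class Device { CPU, GPU, FPGA };

struct Matrix {
    std::vector<double> data;
    Device placement;  // where the buffer currently lives
};

// Stand-ins for real device kernels; the GPU/FPGA variants would launch
// CUDA/OpenCL or FPGA kernels in an actual system.
void sumCPU(const Matrix&)  { std::cout << "CPU kernel\n"; }
void sumGPU(const Matrix&)  { std::cout << "GPU kernel\n"; }
void sumFPGA(const Matrix&) { std::cout << "FPGA kernel\n"; }

// Data placement primitive: move the buffer to the target device.
void transferTo(Matrix& m, Device d) {
    if (m.placement != d) { /* copy buffer to device memory */ m.placement = d; }
}

// Dispatch: ensure placement, then run the kernel for the requested device.
void sum(Matrix& m, Device target) {
    transferTo(m, target);
    switch (target) {
        case Device::CPU:  sumCPU(m);  break;
        case Device::GPU:  sumGPU(m);  break;
        case Device::FPGA: sumFPGA(m); break;
    }
}

int main() {
    Matrix m{std::vector<double>(1024, 1.0), Device::CPU};
    sum(m, Device::GPU);  // moves data to the GPU, then runs the GPU kernel
}
```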
Use Cases and Benchmarks (WP 8 and 9): Work packages 8 and 9 held regular meetings to discuss the individual use cases, the use case descriptions, and the ML pipeline implementations. A major outcome is the set of use case pipelines in D8.1, which serve as example use cases for the DAPHNE system and as real-world benchmarks. We further surveyed existing benchmarks in databases, data-parallel computation, HPC, and ML systems in D9.1. Additionally, HPI made major contributions to the development of the TPCx-AI benchmark (released in 09/2021), and several partners (HPI, UNIBAS, KNOW) conducted student projects for benchmarking IDA pipelines and additional TPCx-AI implementations. The focus of the third project year has been to bring the bottom-up developed DAPHNE system closer to the top-down developed use cases.