
Integrated Data Analysis Pipelines for Large-Scale Data Management, HPC, and Machine Learning

Deliverables

DSL runtime design

Report on the initial design of distribution primitives and existing framework integration

Initial System Architecture

Report on requirements of end-to-end data analysis pipelines and design of the initial system architecture

Scheduler design for pipelines and tasks

Report on the initial overall design of the scheduling components: scheduling of pipelines and workflows, as well as task and data placement

SotA survey of benchmarks from DM, HPC, and ML Sys

Report on the state of the art of benchmarks for database systems, data management, high-performance computing, and ML systems

Initial pipeline definition (all use cases)

Report on use case studies with technical details and the definition of initial pipelines that can be used for testing

Language Design Specification

Report on the language abstractions, APIs, and DSL, as well as the central internal representation

Design of HW accelerator integration

Report on the planned overall design of HW accelerator integration, details on accelerated operations and primitives, and their compiler and runtime support

Report on search space analysis and automatic capability configuration

Report on state-of-the-art techniques for computational storage and near-data processing and their potential side effects, as well as an overview of automatically determining the capabilities of a storage configuration

1st Annual Project Report

Public report describing the project progress until M12, achievements and impact, as well as a calculation of efforts and costs

Compiler Prototype

Software artifact of the initial compiler prototype


Publications

DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines

Author(s): Patrick Damme, Marius Birkenbach, Constantinos Bitsakos, Matthias Boehm, Philippe Bonnet, Florina Ciorba, Mark Dokter, Pawel Dowgiallo, Ahmed Eleliemy, Christian Faerber, Georgios Goumas, Dirk Habich, Niclas Hedam, Marlies Hofer, Wenjun Huang, Kevin Innerebner, Vasileios Karakostas, Roman Kern, Tomaž Kosar, Daniel Krems, Andreas Laber, Wolfgang Lehner, Eric Mier, Marcus Paradies, Bernhard Peischl
Published in: Conference on Innovative Data Systems Research, CIDR, 9.1.2022-12.1.2022, 2022
Publisher: Conference on Innovative Data Systems Research, CIDR

Not your Grandpa's SSD: The Era of Co-Designed Storage Devices

Author(s): Alberto Lerner, Philippe Bonnet
Published in: Proceedings of the 2021 International Conference on Management of Data, 2021
Publisher: ACM

A Survey of Big Data, High Performance Computing, and Machine Learning Benchmarks

Author(s): Nina Ihde, Paula Marten, Ahmed Eleliemy, Gabrielle Poerwawinata, Pedro Silva, Ilin Tolovski, Florina M. Ciorba, Tilmann Rabl
Published in: Proceedings of the Thirteenth TPC Technology Conference on Performance Evaluation & Benchmarking, 2021
Publisher: Springer

Parallelization of benchmarking using HPC: text summarization in natural language processing (NLP), glider piloting in deep-sea missions, and search algorithms in computational intelligence (CI)

Author(s): Aleš Zamuda
Published in: Proceedings of the Austrian-Slovenian HPC Meeting 2021 - ASHPC21, 2021, ISBN 978-961-6980-77-7
Publisher: University of Ljubljana

DeGNN: Improving Graph Neural Networks with Graph Decomposition

Author(s): Miao, Xupeng; Gürel, Nezihe Merve; Zhang, Wentao; Han, Zhichao; Li, Bo; Min, Wei; Rao, Susie; Ren, Hansheng; Shan, Yinan; Shao, Yingxia; Wang, Yujie; Wu, Fan; Xue, Hui; Yang, Yaming; Zhang, Zitao; Zhao, Yang; Zhang, Shuai; Wang, Yujing; Cui, Bin; Zhang, Ce
Published in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD '21), 2021
Publisher: ACM

Evaluating In-Memory Hash Joins on Persistent Memory

Author(s): Tobias Maltenberger, Till Lehmann, Lawrence Benson, Tilmann Rabl
Published in: 25th International Conference on Extending Database Technology (EDBT), 2022
Publisher: OpenProceedings.org
DOI: 10.48786/edbt.2022.23

Darwin: Scale-In Stream Processing

Author(s): Lawrence Benson, Tilmann Rabl
Published in: Conference on Innovative Data Systems Research, CIDR 22, 2022
Publisher: Conference on Innovative Data Systems Research, CIDR 22

Considering a Fear and Greed Index in Bitcoin Price Prediction Through Long Short-Term Memory

Author(s): Nataša Ošep Ferš, Aleš Zamuda
Published in: IEEE Slovenia Section, 2021
Publisher: IEEE

Maximizing Persistent Memory Bandwidth Utilization for OLAP Workloads

Author(s): Björn Daase, Lars Jonas Bollmeier, Lawrence Benson, Tilmann Rabl
Published in: Proceedings of the 2021 International Conference on Management of Data (SIGMOD 2021), 2021
Publisher: ACM

VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition

Author(s): Li, Yang; Shen, Yu; Zhang, Wentao; Jiang, Jiawei; Ding, Bolin; Li, Yaliang; Zhou, Jingren; Yang, Zhi; Wu, Wentao; Zhang, Ce; Cui, Bin
Published in: Proceedings of the VLDB Endowment, 14 (11), 2021
Publisher: PVLDB

Ease.ML: A Lifecycle Management System for Machine Learning

Author(s): Aguilar Melgar, Leonel; Dao, David; Gan, Shaoduo; Gürel, Nezihe M.; Hollenstein, Nora; Jiang, Jiawei; Karlaš, Bojan; Lemmin, Thomas; Li, Tian; Li, Yang; Rao, Susie; Rausch, Johannes; Renggli, Cedric; Rimanic, Luka; Weber, Maurice; Zhang, Shuai; Zhao, Zh
Published in: Proceedings of the Annual Conference on Innovative Data Systems Research (CIDR), 2021, 1, 2021
Publisher: CIDR 2021
DOI: 10.3929/ethz-b-000458916

Drop It In Like It’s Hot: An Analysis of Persistent Memory as a Drop-in Replacement for NVMe SSDs

Author(s): Maximilian Böther, Otto Kißig, Lawrence Benson, Tilmann Rabl
Published in: International Workshop on Data Management on New Hardware (DAMON’21), 2021
Publisher: ACM SIGMOD/PODS

Efficiently Managing Deep Learning Models in a Distributed Environment

Author(s): Nils Strassenburg, Ilin Tolovski, Tilmann Rabl
Published in: 25th International Conference on Extending Database Technology (EDBT), 2022
Publisher: OpenProceedings.org
DOI: 10.48786/edbt.2022.12

A Resourceful Coordination Approach for Multilevel Scheduling

Author(s): Eleliemy, Ahmed; Ciorba, Florina M.
Published in: International Conference on High Performance Computing & Simulation (HPCS), 2021
Publisher: HPCS

Viper: An Efficient Hybrid PMem-DRAM Key-Value Store

Author(s): Lawrence Benson, Hendrik Makait, Tilmann Rabl
Published in: 2021
Publisher: ACM

DocParser: Hierarchical Document Structure Parsing from Renderings

Author(s): Rausch, Johannes; Martinez, Octavio; Bissig, Fabian; Zhang, Ce; Feuerriegel, Stefan
Published in: Proceedings of the AAAI Conference on Artificial Intelligence, 35 (5), 2021, Page(s) 4328-4338, ISSN 2159-5399
Publisher: AAAI Press
DOI: 10.13039/501100000780

Don’t Compete, Let’s Cooperate: A Cooperative Scheduling Approach

Author(s): Ahmed Eleliemy, Florina M. Ciorba
Published in: Platform for Advancing Scientific Computing Conference, 2021
Publisher: PASC

CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks

Author(s): Li Peng, Rao Xi, Jennifer Blase, Xu Chu, Yue Zhang, Ce Zhang
Published in: DeGNN, 2020
Publisher: ETH Zurich, Institute for Computing Platforms
DOI: 10.13039/501100001711

Single- and Two-Level Dynamic Load Balancing of Scientific Applications

Author(s): Ahmed Eleliemy, Florina M. Ciorba
Published in: Platform for Advancing Scientific Computing Conference, 2021
Publisher: PASC

The urban morphology on our planet – Global perspectives from space

Author(s): Xiao Xiang Zhu, Chunping Qiu, Jingliang Hua, Yilei Shi, Yuanyuan Wang, Michael Schmitt, Hannes Taubenböck
Published in: Remote Sensing of Environment, 2021, ISSN 0034-4257
Publisher: Elsevier BV
DOI: 10.1016/j.rse.2021.112794

Micro-architectural analysis of in-memory OLTP: Revisited

Author(s): Utku Sirin, Pınar Tözün, Danica Porobic, Ahmad Yasin, Anastasia Ailamaki
Published in: The VLDB Journal, Volume 30, July 2021, ISSN 1066-8888
Publisher: Springer Verlag
DOI: 10.1007/s00778-021-00663-8

Better Database Cost/Performance via Batched I/O on Programmable SSD

Author(s): Jaeyoung Do, Ivan Luiz Picoli, David Lomet, Philippe Bonnet
Published in: The VLDB Journal, 18.2.2021, ISSN 1066-8888
Publisher: Springer Verlag
DOI: 10.1007/s00778-020-00648-z

LB4OMP: A Dynamic Load Balancing Library for Multithreaded Applications

Author(s): Jonas H. Müller Korndörfer, Ahmed Eleliemy, Ali Mohammed, Florina M. Ciorba
Published in: IEEE Transactions on Parallel and Distributed Systems, Volume 33, Issue 4, 2021, Page(s) 830 - 841, ISSN 1045-9219
Publisher: Institute of Electrical and Electronics Engineers
DOI: 10.1109/tpds.2021.3107775