
STREAMLINE

Deliverables

Annual Report, Quality Assurance and Evaluation Period 1

This deliverable will present a summary of the activities carried out in Y1, telling a coherent story of the work produced, referring to detailed accounts in the respective deliverables. It will additionally detail the Quality Management and Control policies of the project and explain how they were enforced in the work leading to and in the production of each of the deliverables of Y1. Finally, it will measure success through the evaluation of the measurable outcomes set out in each of the tasks described in this Description of Action, using appropriate key performance indicators.

Flink Real Time Stream Mining Library v1

Version 1 of the Flink Real Time Stream Mining Library, with evaluation measurements over use case partner data. It provides basic classification, regression and recommendation methods for combined batch and stream machine learning, based on linear models trained with stochastic gradient descent and using low-memory synopses for sublinear storage of long-term, updatable model components. Baseline measures are defined for WP2.
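The description names linear models trained with stochastic gradient descent as the core of v1. A minimal sketch of what such an online update looks like, assuming a squared-loss linear regressor that sees one record at a time; it is written in Python for brevity, and the class name, parameters and synthetic data are hypothetical, not the library's Flink API.

```python
import random


class OnlineLinearRegressor:
    """Hypothetical sketch: a linear model y ~ w.x + b updated one record
    at a time with stochastic gradient descent on the squared loss."""

    def __init__(self, n_features, learning_rate=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, y):
        # The gradient of 0.5 * (prediction - y)^2 w.r.t. w is (prediction - y) * x,
        # so each incoming record costs O(n_features) time and no history is kept.
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err
        return err


# Usage on a synthetic, noise-free stream generated from y = 2*x0 - 3*x1 + 1.
rng = random.Random(0)
model = OnlineLinearRegressor(n_features=2)
for _ in range(5000):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    model.update(x, 2 * x[0] - 3 * x[1] + 1)
print([round(w, 2) for w in model.w], round(model.b, 2))  # weights approach 2, -3; bias approaches 1
```

Because the model state is just the weight vector, the same update can be applied to records replayed from storage (batch) or arriving live (stream), which is the combination the deliverable targets.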

Design and Implementation v1

First iteration of the design and implementation carried out in T5.1, T5.2 and T5.3.

Annual Report, Quality Assurance and Evaluation Period 2

This deliverable will present a summary of the activities carried out in Y2, telling a coherent story of the work produced, referring to detailed accounts in the respective deliverables. It will additionally detail the Quality Management and Control policies of the project and explain how they were enforced in the work leading to and in the production of each of the deliverables of Y2. Finally, it will measure success of Y2 activities through the evaluation of the measurable outcomes set out in each of the tasks described in this Description of Action, using appropriate key performance indicators.

Combined Data at Rest and Data in Motion Analysis Platform v2

As with all versions of the platform, it will be evaluated using the use case partner data. Delivery plans for M22 (Y2): an advanced demonstration of our platform, i.e. V2, with a larger set of optimization features, operators for unified batch-stream processing, and limited fault tolerance and incremental computation support.

Flink Real Time Stream Mining Library v3

Version 3 of the Flink Real Time Stream Mining Library, with evaluation measurements over use case partner data: the final version of the online machine learning package, tested and evaluated over the WP4-5 business cases against the Y1 baselines.

Design and Implementation v3

Third iteration of the design and implementation carried out in T5.1, T5.2 and T5.3.

Project Plan Period 1

A detailed plan of the activities to be carried out during the first year.

Combined Data at Rest and Data in Motion Analysis Platform v3

As with all versions of the platform, it will be evaluated using the use case partner data. Delivery plans for M34 (Y3): a full-version platform V3 with all tasks implemented, tested and evaluated over WP4-5 business cases against Y1 baselines and competitor products.

Status report on dissemination activities Period 1

Detailed description of dissemination results achieved during Y1 of the project.

Dissemination Roadmap & Project Website

Define the expected project outputs and the dissemination and communication activities to be developed during the entire duration of the project. Launch the project website with basic information on the project (project goals, consortium composition, use case descriptions), which will then be updated continuously throughout the project.

Use case report for actionable knowledge extraction from text information

A report on extracting actionable knowledge through advanced text data mining, using various machine learning algorithms such as passive-aggressive learning.
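Passive-aggressive learning is an online method well suited to streams of text. A minimal sketch of the PA-I binary variant over bag-of-words features, assuming labels of +1/-1 and whitespace tokenization; the feature scheme, class name and toy examples are hypothetical illustrations, not the project's implementation.

```python
from collections import Counter


def featurize(text):
    # Bag-of-words counts plus a constant bias feature (an illustrative choice).
    feats = Counter(text.lower().split())
    feats["__bias__"] = 1
    return feats


class PassiveAggressiveClassifier:
    """Hypothetical sketch of the PA-I online binary classifier
    (labels +1 / -1) over sparse dictionary features."""

    def __init__(self, C=1.0):
        self.C = C    # cap on the step size ("aggressiveness")
        self.w = {}   # sparse weight vector: feature -> weight

    def score(self, feats):
        return sum(self.w.get(f, 0.0) * v for f, v in feats.items())

    def predict(self, feats):
        return 1 if self.score(feats) >= 0 else -1

    def update(self, feats, y):
        # Hinge loss: zero when the example is classified with margin >= 1.
        loss = max(0.0, 1.0 - y * self.score(feats))
        sq_norm = sum(v * v for v in feats.values())  # > 0 thanks to the bias
        if loss > 0.0:
            # Passive when the loss is zero; otherwise take the smallest step
            # that restores the margin, capped at C (the PA-I variant).
            tau = min(self.C, loss / sq_norm)
            for f, v in feats.items():
                self.w[f] = self.w.get(f, 0.0) + tau * y * v
        return loss


clf = PassiveAggressiveClassifier(C=1.0)
stream = [("great product", 1), ("terrible broken", -1)]
for text, label in stream:
    clf.update(featurize(text), label)
print(clf.predict(featurize("great product")), clf.predict(featurize("terrible broken")))  # prints: 1 -1
```

The sparse dictionary keeps memory proportional to the vocabulary actually seen, and each document is processed once, which is why this family of algorithms fits continuous text streams.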

Design and Implementation v2

Second iteration of the design and implementation carried out in T5.1, T5.2 and T5.3.

Combined Data at Rest and Data in Motion Analysis Platform v1

As with all versions, the platform will be evaluated using the use case partner data. Delivery plans for M10 (Y1): a specification document and a basic demo platform, i.e. V1, with (a) a subset of query optimization features (such as operator chaining) and (b) the primitive operators necessary for analyzing data at rest and data in motion together.

Status report on dissemination activities Period 2

Detailed description of dissemination results achieved during Period 2 of the project.

Project Plan Period 2

A detailed plan of the activities to be carried out during the second year.

Flink Real Time Stream Mining Library v2

Version 2 of the Flink Real Time Stream Mining Library, with evaluation measurements over use case partner data. It adds advanced methods depending on the use cases, potentially including gradient boosted trees, kernel methods, ALS (alternating least squares) for implicit and explicit feedback, tensor factorization, differential privacy and peer-to-peer recommenders.
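ALS alternates between two closed-form ridge regressions: with the item factors fixed, each user's factor vector is solved independently, and vice versa. A minimal sketch of the explicit-feedback variant, assuming ratings arrive as (user, item, value) triples and using NumPy; the function names, hyperparameters and toy data are hypothetical. The implicit-feedback variant, which weights every user-item pair by a confidence value, is not shown.

```python
import numpy as np


def als_explicit(ratings, n_users, n_items, k=2, lam=0.01, iters=50, seed=0):
    """Hypothetical sketch of explicit-feedback ALS.

    ratings: list of (user, item, value) triples for observed entries only.
    Returns user factors P (n_users x k) and item factors Q (n_items x k).
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    reg = lam * np.eye(k)
    for _ in range(iters):
        # With Q fixed, each user row is an independent ridge regression,
        # which is what makes the step easy to parallelize across workers.
        for u, obs in enumerate(by_user):
            if obs:
                idx = [i for i, _ in obs]
                r = np.array([v for _, v in obs], dtype=float)
                Qo = Q[idx]
                P[u] = np.linalg.solve(Qo.T @ Qo + reg, Qo.T @ r)
        # With P fixed, the same closed-form solve updates each item row.
        for i, obs in enumerate(by_item):
            if obs:
                idx = [u for u, _ in obs]
                r = np.array([v for _, v in obs], dtype=float)
                Po = P[idx]
                Q[i] = np.linalg.solve(Po.T @ Po + reg, Po.T @ r)
    return P, Q


def rmse(P, Q, ratings):
    errors = [(P[u] @ Q[i] - r) ** 2 for u, i, r in ratings]
    return float(np.sqrt(np.mean(errors)))


# Toy ratings generated from a rank-2 matrix (illustrative data only).
ratings = [(0, 0, 1), (0, 1, 2), (0, 2, 1),
           (1, 0, 2), (1, 1, 1), (1, 2, 1),
           (2, 0, 3), (2, 1, 3), (2, 2, 2),
           (3, 0, 4), (3, 1, 5), (3, 2, 3)]
P, Q = als_explicit(ratings, n_users=4, n_items=3)
print(round(rmse(P, Q, ratings), 3))
```

Each solve involves only a k x k system, so the per-iteration cost grows with the number of observed ratings rather than with the full user-item matrix.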

A high-level declarative language for ML

A programming model for expressing different use cases in our high-level language, and an easy-to-use declarative language for applying ML algorithms to massive datasets.

Field Trials and Evaluation v1

First iteration of the field trials and evaluation carried out in T5.4.

Flink deployment software

A deployment tool for automatic installation of Flink on a cluster. It consists of the Chef cookbooks on Karamel for the Flink stack.

Flink interactive environment

An interactive data analytics tool for Apache Flink that consists of (i) a REPL, or language shell, providing an interactive environment that takes user input, evaluates it, and quickly returns the result to the user, and (ii) a web-based environment, based on Zeppelin, that enables interactive data analyses.

Flink on Hops/Hadoop

Extension and revision of D3.1, addressing the integration of Apache Flink into the Hops/Hadoop ecosystem.

Field Trials and Evaluation v2

Second iteration of the field trials and evaluation carried out in T5.4.

Field Trials and Implementation v3

Third iteration of the field trials and evaluation carried out in T5.4.


Publications

Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing

Author(s): Philipp M. Grulich, René Saitenmacher, Jonas Traub, Sebastian Breß, Tilmann Rabl, Volker Markl
Published in: 21st International Conference on Extending Database Technology (EDBT), 2018
DOI: 10.5441/002/edbt.2018.51

Optimized on-demand data streaming from sensor nodes

Author(s): Jonas Traub, Sebastian Breß, Tilmann Rabl, Asterios Katsifodimos, Volker Markl
Published in: Proceedings of the 2017 Symposium on Cloud Computing - SoCC '17, 2017, Page(s) 586-597
DOI: 10.1145/3127479.3131621

STREAMLINE - Streamlined Analysis of Data at Rest and Data in Motion

Author(s): Philipp M. Grulich, Tilmann Rabl, Volker Markl, Csaba Sidló, Andras Benczur
Published in: 20th International Conference on Extending Database Technology (EDBT), 2017

I2: Interactive Real-Time Visualization for Streaming Data

Author(s): Jonas Traub, Nikolaas Steenbergen, Philipp M. Grulich, Tilmann Rabl, Volker Markl
Published in: 20th International Conference on Extending Database Technology (EDBT), 2017
DOI: 10.5441/002/edbt.2017.61

Bridging the gap - towards optimization across linear and relational algebra

Author(s): Andreas Kunft, Alexander Alexandrov, Asterios Katsifodimos, Volker Markl
Published in: Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond - BeyondMR '16, 2016, Page(s) 1-4
DOI: 10.1145/2926534.2926540

Emma in Action - Declarative Dataflows for Scalable Data Analysis

Author(s): Alexander Alexandrov, Andreas Salzmann, Georgi Krastev, Asterios Katsifodimos, Volker Markl
Published in: Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16, 2016, Page(s) 2073-2076
DOI: 10.1145/2882903.2899396

Benchmarking Distributed Stream Data Processing Systems

Author(s): Jeyhun Karimov, Tilmann Rabl, Asterios Katsifodimos, Roman Samarev, Henri Heiskanen, Volker Markl
Published in: 2018 IEEE 34th International Conference on Data Engineering (ICDE), 2018, Page(s) 1507-1518
DOI: 10.1109/ICDE.2018.00169

Efficient Window Aggregation with General Stream Slicing

Author(s): Jonas Traub, Philipp Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl
Published in: 22nd International Conference on Extending Database Technology (EDBT), 2019

Continuous Deployment of Machine Learning Pipelines

Author(s): Behrouz Derakhshan, Alireza Rezaei Mahdiraji, Tilmann Rabl, Volker Markl
Published in: 22nd International Conference on Extending Database Technology (EDBT), 2019

Tutorial on Open Source Online Learning Recommenders

Author(s): Róbert Pálovics, Domokos Kelen, András A. Benczúr
Published in: Proceedings of the Eleventh ACM Conference on Recommender Systems - RecSys '17, 2017, Page(s) 400-401
DOI: 10.1145/3109859.3109937

Alpenglow: Open Source Recommender Framework with Time-aware Learning and Evaluation

Author(s): Erzsébet Frigó, Róbert Pálovics, Domokos Kelen, Levente Kocsis, András A. Benczúr
Published in: RecSys 2017 poster, 2017

Online ranking prediction in non-stationary environments

Author(s): Erzsébet Frigó, Róbert Pálovics, Domokos Kelen, Levente Kocsis, András A. Benczúr
Published in: RecTemp 2017 – workshop on reasoning on temporal aspects in user modeling in conjunction with RecSys 2017, 2017

Tracing Distributed Data Stream Processing Systems

Author(s): Zoltan Zvara, Peter G.N. Szabo, Gabor Hermann, Andras Benczur
Published in: 2017 IEEE 2nd International Workshops on Foundations and Applications of Self* Systems (FAS*W), 2017, Page(s) 235-242
DOI: 10.1109/fas-w.2017.153

Efficient K-NN for Playlist Continuation

Author(s): Domokos M. Kelen, Dániel Berecz, Ferenc Béres, András A. Benczúr
Published in: Proceedings of the ACM Recommender Systems Challenge 2018 on - RecSys Challenge '18, 2018, Page(s) 1-4
DOI: 10.1145/3267471.3267477

Cutty - Aggregate Sharing for User-Defined Windows

Author(s): Paris Carbone, Jonas Traub, Asterios Katsifodimos, Seif Haridi, Volker Markl
Published in: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management - CIKM '16, 2016, Page(s) 1201-1210
DOI: 10.1145/2983323.2983807

Benchmarking Data Flow Systems for Scalable Machine Learning

Author(s): Christoph Boden, Andrea Spina, Tilmann Rabl, Volker Markl
Published in: Proceedings of the 4th Algorithms and Systems on MapReduce and Beyond - BeyondMR'17, 2017, Page(s) 1-10
DOI: 10.1145/3070607.3070612

Query Centric Partitioning and Allocation for Partially Replicated Database Systems

Author(s): Tilmann Rabl, Hans-Arno Jacobsen
Published in: Proceedings of the 2017 ACM International Conference on Management of Data - SIGMOD '17, 2017, Page(s) 315-330
DOI: 10.1145/3035918.3064052

From BigBench to TPCx-BB: Standardization of a Big Data Benchmark

Author(s): Paul Cao, Bhaskar Gowda, Seetha Lakshmi, Chinmayi Narasimhadevara, Patrick Nguyen, John Poelman, Meikel Poess, Tilmann Rabl
Published in: Performance Evaluation and Benchmarking. Traditional - Big Data - Interest of Things, Issue 10080, 2017, Page(s) 24-44
DOI: 10.1007/978-3-319-54334-5_3

A survey of state management in big data processing systems

Author(s): Quoc-Cuong To, Juan Soto, Volker Markl
Published in: The VLDB Journal, Issue 27/6, 2018, Page(s) 847-872, ISSN 1066-8888
DOI: 10.1007/s00778-018-0514-9

Blockjoin: efficient matrix partitioning through joins

Author(s): Andreas Kunft, Asterios Katsifodimos, Sebastian Schelter, Tilmann Rabl, Volker Markl
Published in: Proceedings of the VLDB Endowment - Proceedings of the 43rd International Conference on Very Large Data Bases, Issue 10/13, 2017, Page(s) 2061-2072, ISSN 2150-8097

Temporal walk based centrality metric for graph streams

Author(s): Ferenc Béres, Róbert Pálovics, Anna Oláh, András A. Benczúr
Published in: Applied Network Science, Issue 3/1, 2018, ISSN 2364-8228
DOI: 10.1007/s41109-018-0080-5

Towards Streamlined Big Data Analytics

Author(s): András A. Benczúr, Róbert Pálovics, Márton Balassi, Volker Markl, Tilmann Rabl, Juan Soto, Björn Hovstadius, Jim Dowling, Seif Haridi
Published in: ERCIM News, Issue 107, 2016, Page(s) 31-32

Online Machine Learning in Big Data Streams

Author(s): András A. Benczúr, Levente Kocsis, Róbert Pálovics
Published in: 2018