Sustainable Data Lakes for Extreme-Scale Analytics

Deliverables

Initial version of the HIN mining engine

Prototype implementation of the HIN mining library, including functionalities for similarity search and browsing, entity resolution and entity ranking.

Initial version of the visual analytics engine

First prototype of the visual analytics layer, including basic functionalities for interactive visual analytics over spatial, temporal and network data.

Final version of the visual analytics engine

Final prototype of the visual analytics engine, including all functionalities for HIN exploration and analysis via scalable and interactive visualizations.

Final version of the HIN mining engine

Final prototype of the Heterogeneous Information Network (HIN) mining library.

Query engine over virtualized data

Includes the query planning and execution operators for natively supporting queries over virtualized, heterogeneous data.

Final evaluation report

Final overall report and assessment of the SmartDataLake system, including the feedback from the pilot testing and validation.

Interactive visual analytics model

Model specification driving the interactive visualizations for HIN exploration, analysis and mining, including the visual interfaces and interactions for feature space exploration, model selection and parameter tuning.

System architecture

The initial architecture of the SmartDataLake platform, its individual components, and their interfaces.

Automated storage tiering

Cost model and algorithms for analytics over cold storage and for automated data placement in different storage tiers.

Monitoring and assessing changes in evolving HINs

Techniques and algorithms for monitoring the temporal evolution of HINs, and for detecting changes and assessing their impact.

Similarity search, entity resolution and ranking

Includes the attribute-based and link-based similarity measures, techniques and algorithms for search and browsing over multi-typed entities and relations, as well as the algorithms for entity resolution and ranking.

Scalable visualization techniques for spatial, temporal and graph data

Techniques and algorithms for efficient and scalable visualizations of spatiotemporal data and information networks.

Data synopses for approximate analytics

Algorithms for approximate query answering and analytics based on adaptive data synopses.

Link prediction and community detection in HINs

Algorithms for predicting new relations and discovering entity groups and communities, combining attribute-based and link-based features.

Publications

GGDs: Graph Generating Dependencies

Author(s): Shimomura, Larissa C.; Fletcher, George; Yakovets, Nikolay
Published in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, Page(s) 2217-2220
Publisher: ACM
DOI: 10.1145/3340531.3412149

Link Prediction in Bibliographic Networks

Author(s): Chronis, Pantelis; Skoutas, Dimitrios; Athanasiou, Spiros; Skiadopoulos, Spiros
Published in: 1st International Workshop on Assessing Impact and Merit in Science (AIMinScience), 2020
Publisher: Springer
DOI: 10.1007/978-3-030-55814-7_28

Accelerating Complex Analytics using Speculation

Author(s): Panagiotis Sioulas, Viktor Sanca, Ioannis Mytilinis, Anastasia Ailamaki
Published in: International Conference on Innovative Data Systems Research (CIDR), 2021
Publisher: www.cidrdb.org

Speculative Execution of Similarity Queries: Real-Time Parameter Optimization through Visual Exploration

Author(s): Thilo Spinner, Udo Schlegel, Martin Schall, Fabian Sperrle, Rita Sevastjanova, Beatrice Gobbo, Julius Rauscher, Mennatallah El-Assady, Daniel A. Keim
Published in: 1st International Workshop on Data Analytics and Machine Learning Made Simple, 2021
Publisher: CEUR Workshop Proceedings (CEUR-WS.org)

Twin Subsequence Search in Time Series

Author(s): Georgios Chatzigeorgakidis, Dimitrios Skoutas, Kostas Patroumpas, Themis Palpanas, Spiros Athanasiou, Spiros Skiadopoulos
Published in: 24th International Conference on Extending Database Technology (EDBT), 2021, Page(s) 475-480
Publisher: OpenProceedings.org

Hardware-Conscious Sliding Window Aggregation on GPUs

Author(s): Georgios Michas, Periklis Chrysogelos, Ioannis Mytilinis, Anastasia Ailamaki
Published in: International Workshop on Data Management on New Hardware (DaMoN), 2021, Page(s) 13:1-13:5
Publisher: ACM
DOI: 10.1145/3465998.3466014

A Visual Explorer for Geolocated Time Series

Author(s): Chatzigeorgakidis, Georgios; Patroumpas, Kostas; Skoutas, Dimitrios; Athanasiou, Spiros
Published in: 28th International Conference on Advances in Geographic Information Systems (SIGSPATIAL), 2020, Page(s) 413-416
Publisher: ACM
DOI: 10.1145/3397536.3422345

Boosting Efficiency of External Pipelines by Blurring Application Boundaries

Author(s): Anna Herlihy, Periklis Chrysogelos, Anastasia Ailamaki
Published in: International Conference on Innovative Data Systems Research (CIDR), 2022
Publisher: CIDR

VeTo-web: A Recommendation Tool for the Expansion of Sets of Scholars

Author(s): Serafeim Chatzopoulos, Thanasis Vergoulis, Theodore Dalamagas, Christos Tryfonopoulos
Published in: ACM/IEEE Joint Conference on Digital Libraries, 2021
Publisher: IEEE
DOI: 10.5281/zenodo.5548163

Easy Spark

Author(s): Ylaise van den Wildenberg, Wouter W. L. Nuijten, Odysseas Papapetrou
Published in: 1st International Workshop on Data Analytics and Machine Learning Made Simple, 2021
Publisher: CEUR-WS.org

SciNeM: A Scalable Data Science Tool for Heterogeneous Network Mining

Author(s): Serafeim Chatzopoulos, Thanasis Vergoulis, Panagiotis Deligiannis, Dimitrios Skoutas, Theodore Dalamagas, Christos Tryfonopoulos
Published in: 24th International Conference on Extending Database Technology (EDBT), 2021, Page(s) 654-657
Publisher: OpenProceedings.org

Multi-Attribute Similarity Search for Interactive Data Exploration

Author(s): Kostas Patroumpas, Alexandros Zeakis, Dimitrios Skoutas, Roberto Santoro
Published in: 1st International Workshop on Data Analytics and Machine Learning Made Simple, 2021
Publisher: CEUR Workshop Proceedings (CEUR-WS.org)

SPHINX: A System for Metapath-based Entity Exploration in Heterogeneous Information Networks

Author(s): Serafeim Chatzopoulos; Kostas Patroumpas; Alexandros Zeakis; Thanasis Vergoulis; Dimitrios Skoutas
Published in: International Conference on Very Large Data Bases (VLDB), 2020, Page(s) 2913-2916
Publisher: VLDB Endowment
DOI: 10.14778/3415478.3415507

Towards Proximity Graph Auto-configuration: An Approach Based on Meta-learning

Author(s): Rafael Seidi Oyamada, Larissa Shimomura, Sylvio Barbon Junior, Daniel Kaster
Published in: European Conference on Advances in Databases and Information Systems (ADBIS), 2020, Page(s) 93-107
Publisher: Springer

Storage Management in Smart Data Lake

Author(s): Haoqiong Bian, Bikash Chandra, Ioannis Mytilinis, Anastasia Ailamaki
Published in: 1st International Workshop on Data Analytics and Machine Learning Made Simple, 2021
Publisher: CEUR Workshop Proceedings (CEUR-WS.org)

A Parallel and Distributed Approach for Diversified Top-k Best Region Search

Author(s): Hamid Shahrivari; Matthaios Olma; Odysseas Papapetrou; Dimitrios Skoutas; Anastasia Ailamaki
Published in: Proceedings of the 23rd International Conference on Extending Database Technology (EDBT), 2020, Page(s) 265-276
Publisher: OpenProceedings
DOI: 10.5441/002/edbt.2020.24

Local Similarity Search on Geolocated Time Series Using Hybrid Indexing

Author(s): Georgios Chatzigeorgakidis, Dimitrios Skoutas, Kostas Patroumpas, Themis Palpanas, Spiros Athanasiou, Spiros Skiadopoulos
Published in: Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2019, Page(s) 179-188, ISBN 978-1-4503-6909-1
Publisher: ACM
DOI: 10.1145/3347146.3359349

Scalable temporal clique enumeration

Author(s): Kaijie Zhu, George Fletcher, Nikolay Yakovets, Odysseas Papapetrou, Yuqing Wu
Published in: Proceedings of the 16th International Symposium on Spatial and Temporal Databases, 2019, Page(s) 120-129, ISBN 978-1-4503-6280-1
Publisher: ACM
DOI: 10.1145/3340964.3340987

Local Pair and Bundle Discovery over Co-Evolving Time Series

Author(s): Georgios Chatzigeorgakidis, Dimitrios Skoutas, Kostas Patroumpas, Themis Palpanas, Spiros Athanasiou, Spiros Skiadopoulos
Published in: Proceedings of the 16th International Symposium on Spatial and Temporal Databases, 2019, Page(s) 160-169, ISBN 978-1-4503-6280-1
Publisher: ACM
DOI: 10.1145/3340964.3340982

Automatic Clustering by Detecting Significant Density Dips in Multiple Dimensions

Author(s): Pantelis Chronis, Spiros Athanasiou, Spiros Skiadopoulos
Published in: 2019 IEEE International Conference on Data Mining (ICDM), 2019, Page(s) 91-100, ISBN 978-1-7281-4604-1
Publisher: IEEE
DOI: 10.1109/icdm.2019.00019

Taster: Self-Tuning, Elastic and Online Approximate Query Processing

Author(s): Matthaios Olma, Odysseas Papapetrou, Raja Appuswamy, Anastasia Ailamaki
Published in: 2019 IEEE 35th International Conference on Data Engineering (ICDE), 2019, Page(s) 482-493, ISBN 978-1-5386-7474-1
Publisher: IEEE
DOI: 10.1109/icde.2019.00050

GPU-accelerated data management under the test of time

Author(s): Aunn Raza; Periklis Chrysogelos; Panagiotis Sioulas; Vladimir Indjic; Angelos Christos Anadiotis; Anastasia Ailamaki
Published in: Conference on Innovative Data Systems Research (CIDR), 1, 2020
Publisher: www.cidrdb.org
DOI: 10.5281/zenodo.3827490

Similarity search over enriched geospatial data

Author(s): Kostas Patroumpas, Dimitrios Skoutas
Published in: Proceedings of the Sixth International ACM SIGMOD Workshop on Managing and Mining Enriched Geo-Spatial Data, 2020, Page(s) 1-6, ISBN 978-1-4503-8035-5
Publisher: ACM
DOI: 10.1145/3403896.3403967

JedAI^3: beyond batch, blocking-based Entity Resolution

Author(s): George Papadakis; Leonidas Tsekouras; Manos Thanos; Nikiforos Pittaras; Giovanni Simonini; Dimitrios Skoutas; Paul Isaris; George Giannakopoulos; Themis Palpanas; Manolis Koubarakis
Published in: Proceedings of the 23rd International Conference on Extending Database Technology (EDBT), 2020, Page(s) 603-606
Publisher: OpenProceedings
DOI: 10.5441/002/edbt.2020.74

Adaptive HTAP through Elastic Resource Scheduling

Author(s): Aunn Raza, Periklis Chrysogelos, Angelos Christos Anadiotis, Anastasia Ailamaki
Published in: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, 2020, Page(s) 2043-2054, ISBN 978-1-4503-6735-6
Publisher: ACM
DOI: 10.1145/3318464.3389783

Discovering Mixture-Based Best Regions of Arbitrary Shapes

Author(s): Skoutas, Dimitrios; Sacharidis, Dimitris; Patroumpas, Kostas
Published in: 29th International Conference on Advances in Geographic Information Systems (SIGSPATIAL), 2021, Page(s) 468-479
Publisher: ACM
DOI: 10.1145/3474717.3484215

A system design for elastically scaling transaction processing engines in virtualized servers

Author(s): Angelos-Christos G. Anadiotis, Raja Appuswamy, Anastasia Ailamaki, Ilan Bronshtein, Hillel Avni, David Dominguez-Sal, Shay Goikhman, Eliezer Levy
Published in: Proceedings of the VLDB Endowment, 2020, Page(s) 3085-3098
Publisher: VLDB Endowment
DOI: 10.14778/3415478.3415536

explAIner: A Visual Analytics Framework for Interactive and Explainable Machine Learning

Author(s): Thilo Spinner, Udo Schlegel, Hanna Schäfer, Mennatallah El-Assady
Published in: IEEE Transactions on Visualization and Computer Graphics, 2019, Page(s) 1-1, ISSN 1077-2626
Publisher: Institute of Electrical and Electronics Engineers
DOI: 10.1109/TVCG.2019.2934629

Visual Exploration of Geolocated Time Series with Hybrid Indexing

Author(s): Georgios Chatzigeorgakidis, Kostas Patroumpas, Dimitrios Skoutas, Spiros Athanasiou, Spiros Skiadopoulos
Published in: Big Data Research, 15, 2019, Page(s) 12-28, ISSN 2214-5796
Publisher: Elsevier Inc.
DOI: 10.1016/j.bdr.2019.02.001

Uncertainty-Aware Principal Component Analysis

Author(s): Jochen Görtler, Thilo Spinner, Dirk Streeb, Daniel Weiskopf, Oliver Deussen
Published in: IEEE Transactions on Visualization and Computer Graphics, 26/1, 2020, Page(s) 822-831, ISSN 1077-2626
Publisher: Institute of Electrical and Electronics Engineers
DOI: 10.1109/tvcg.2019.2934812

Blocking and Filtering Techniques for Entity Resolution

Author(s): George Papadakis, Dimitrios Skoutas, Emmanouil Thanos, Themis Palpanas
Published in: ACM Computing Surveys, 53/2, 2020, Page(s) 1-42, ISSN 0360-0300
Publisher: Association for Computing Machinery, Inc.
DOI: 10.1145/3377455

v‐plots: Designing Hybrid Charts for the Comparative Analysis of Data Distributions

Author(s): Michael Blumenschein, Luka J. Debbeler, Nadine C. Lages, Britta Renner, Daniel A. Keim, Mennatallah El‐Assady
Published in: Computer Graphics Forum, 39/3, 2020, Page(s) 565-577, ISSN 0167-7055
Publisher: Blackwell Publishing Inc.
DOI: 10.1111/cgf.14002

Datasets

GDELT Articles Graph

Author(s): Pantelis Chronis
Published in: Zenodo

OSM Businesses & Organizations

Author(s): Pantelis Chronis
Published in: Zenodo

CorpWatch Companies Graph

Author(s): Pantelis Chronis
Published in: Zenodo

DBLP Publications Network

Author(s): Chronis, Pantelis
Published in: Zenodo

A Parallel and Distributed Approach for Diversified Top-k Best Region Search

Author(s): Shahrivari, Hamid; Olma, Matthaios; Papapetrou, Odysseas; Skoutas, Dimitrios; Ailamaki, Anastasia
Published in: OpenProceedings.org

Wikidata Companies Graph

Author(s): Chronis, Pantelis
Published in: Zenodo

JedAI^3: beyond batch, blocking-based Entity Resolution

Author(s): Papadakis, George; Tsekouras, Leonidas; Thanos, Emmanouil; Pittaras, Nikiforos; Simonini, Giovanni; Skoutas, Dimitrios; Isaris, Paul; Giannakopoulos, George; Palpanas, Themis; Koubarakis, Manolis
Published in: OpenProceedings.org