Sustainable Data Lakes for Extreme-Scale Analytics

Objective

Data lakes are raw data ecosystems, where large amounts of diverse data are retained and coexist. They facilitate self-service analytics for flexible, fast, ad hoc decision making. SmartDataLake enables extreme-scale analytics over sustainable big data lakes. It provides an adaptive, scalable and elastic data lake management system that offers: (a) data virtualization for abstracting and optimizing access and queries over heterogeneous data, (b) data synopses for approximate query answering and analytics to enable interactive response times, and (c) automated placement of data in different storage tiers based on data characteristics and access patterns to reduce costs. The data lake’s contents are modelled and organised as a heterogeneous information network, containing multiple types of entities and relations. Efficient and scalable algorithms are provided for: (a) similarity search and exploration for discovering relevant information, (b) entity resolution and ranking for identifying and selecting important and representative entities across sources, (c) link prediction and clustering for unveiling hidden associations and patterns among entities, and (d) change detection and incremental update of analysis results to enable faster analysis of new data. Finally, interactive and scalable visual analytics are provided to include and empower the data scientist in the knowledge extraction loop. This includes functionalities for: (a) visually exploring and tuning the space of features, models and parameters, and (b) enabling large-scale visualizations of spatial, temporal and network data. The results of the project are evaluated in real-world use cases from the business intelligence domain, including scenarios for portfolio recommendation, production planning and pricing, and investment decision making. SmartDataLake will foster innovation and enable European SMEs to capitalize on the value of their own data lakes.
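As a rough illustration of item (b) in the first list above, data synopses for approximate query answering, the sketch below keeps a fixed-size uniform sample of a numeric column and uses it to estimate a SUM without scanning all rows. It uses plain reservoir sampling, chosen here only because it is a simple and well-known synopsis. It is not taken from SmartDataLake, and the function names `build_reservoir` and `approx_sum` are hypothetical.

```python
import random


def build_reservoir(stream, k, seed=0):
    """Keep a uniform random sample of at most k items from a stream.

    Classic reservoir sampling: the first k items fill the sample, and
    item i (0-based) then replaces a random slot with probability k / (i + 1).
    """
    rng = random.Random(seed)
    sample = []
    for i, value in enumerate(stream):
        if i < k:
            sample.append(value)
        else:
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = value
    return sample


def approx_sum(sample, total_rows):
    """Estimate SUM over the full column by scaling the sample mean."""
    if not sample:
        return 0.0
    return total_rows * sum(sample) / len(sample)


if __name__ == "__main__":
    column = [float(x % 97) for x in range(1_000_000)]  # made-up data
    synopsis = build_reservoir(column, k=1_000)
    print("approximate:", approx_sum(synopsis, len(column)))
    print("exact:      ", sum(column))
```

The synopsis is small enough to stay in memory, so repeated aggregate queries can be answered from it at interactive speed. The price is a sampling error that shrinks as `k` grows.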

Field of science

  • /social sciences/economics and business/economics/sustainable economy
  • /natural sciences/computer and information sciences/data science/big data
  • /social sciences/economics and business/economics/production economics
  • /natural sciences/computer and information sciences/data science/business intelligence

Call for proposal

H2020-ICT-2018-2

Funding Scheme

RIA - Research and Innovation action

Coordinator

ATHINA-EREVNITIKO KENTRO KAINOTOMIAS STIS TECHNOLOGIES TIS PLIROFORIAS, TON EPIKOINONION KAI TIS GNOSIS
Address
Artemidos 6 Kai Epidavrou
151 25 Maroussi
Greece
Activity type
Research Organisations
EU contribution
€ 853 125

Participants (7)

ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE
Switzerland
EU contribution
€ 760 000
Address
Batiment Ce 3316 Station 1
1015 Lausanne
Activity type
Higher or Secondary Education Establishments

TECHNISCHE UNIVERSITEIT EINDHOVEN
Netherlands
EU contribution
€ 569 637,50
Address
Groene Loper 3
5612 AE Eindhoven
Activity type
Higher or Secondary Education Establishments

UNIVERSITAT KONSTANZ
Germany
EU contribution
€ 425 000
Address
Universitatsstrasse 10
78464 Konstanz
Activity type
Higher or Secondary Education Establishments

RAW LABS SA
Switzerland
EU contribution
€ 411 750
Address
EPFL Innovation Park, Batiment 1
1015 Lausanne
Activity type
Private for-profit entities (excluding Higher or Secondary Education Establishments)

SPAZIODATI SRL
Italy
EU contribution
€ 302 500
Address
Via Adriano Olivetti 13
38122 Trento
Activity type
Private for-profit entities (excluding Higher or Secondary Education Establishments)

SPRING TECHNO GMBH & CO KG
Germany
EU contribution
€ 300 937,50
Address
Hermann Kohlstrasse 7
28199 Bremen
Activity type
Private for-profit entities (excluding Higher or Secondary Education Establishments)

SYNYO GmbH
Austria
EU contribution
€ 322 500
Address
Otto-Bauer-Gasse 5/14
1060 Wien
Activity type
Private for-profit entities (excluding Higher or Secondary Education Establishments)