Platform-aware LArge-scale Time-Series prOcessiNg

Project Information

PLATON

Grant agreement ID: 101031688

DOI

10.3030/101031688

Closed project

EC signature date 21 April 2021

Start date 1 October 2021

End date 30 September 2022

Funded under

EXCELLENT SCIENCE - Marie Skłodowska-Curie Actions

Total cost

€ 92 353.92

EU contribution

€ 92 353.92

Coordinated by

UNIVERSITE PARIS CITE
France

Periodic Reporting for period 1 - PLATON (Platform-aware LArge-scale Time-Series prOcessiNg)

Reporting period: 2021-10-01 to 2022-09-30

PLATON brought together a highly experienced researcher in the theory of concurrent and distributed computing and a host group with world-leading expertise in data series management, indexing, and analysis. The aim was to overcome the difficulties of large-scale data series processing by meeting its performance and scalability goals. PLATON built a powerful index for large-scale data series processing that handles datasets orders of magnitude larger than those on which state-of-the-art indexes have been tested. This holistic indexing solution is a novel distributed data series processing framework that achieves good speedup and high scalability by exploiting the full computational capacity of modern clusters of multi-core servers. PLATON also provided fault-tolerant solutions by developing the first non-blocking index: because it avoids locks, every thread can make progress regardless of the speed or failure of other threads. An experimental analysis covering a wide range of configurations and several real and synthetic datasets demonstrates that the software achieves all of its challenging goals. The software has been used to process large-scale collections of real data series, with important applications in seismology, astrophysics, neuroscience, and other scientific fields.

1 Overview of the Results
• PLATON developed a framework providing new algorithms and techniques for highly efficient data series processing in a multi-node setting. This encompassed the design and implementation of low-cost, data-aware data partitioning and mapping techniques for answering queries on large collections of data series on heterogeneous computing platforms, as well as low-cost load balancing and communication solutions for multi-node query processing, which resulted in much better performance and high scalability in large-scale data series processing.
The new multi-node index facilitates processing of datasets at least an order of magnitude larger than those on which state-of-the-art indexes have been tested. It is a holistic data series indexing solution that uses all the computational resources provided by heterogeneous computing platforms and is orders of magnitude faster and more scalable than current state-of-the-art approaches.
• PLATON developed a new multi-threaded index and query processing scheme, called FreSh, for large data series collections. FreSh is highly fault-tolerant at no additional cost compared to existing parallel solutions, which use locks and therefore provide no fault tolerance (i.e. they are blocking).
• PLATON achieved enhanced performance by combining the power of general-purpose CPUs with accelerators such as Graphics Processing Units (GPUs). The main outcome in this direction is SING, a parallel index that is up to 5x faster than state-of-the-art parallel solutions that rely only on multi-threading.
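
The load balancing mentioned in the first result above can be pictured with a small sketch. The report does not specify the scheduling policy PLATON uses, so the code below substitutes a generic greedy heuristic (largest estimated cost first, always onto the least-loaded node). The function names `assign_queries` and `node_loads`, and the idea that a per-query cost estimate is available, are assumptions made purely for illustration.

```python
import heapq


def assign_queries(costs, num_nodes):
    """Greedy assignment of queries to nodes (illustrative stand-in policy).

    costs: hypothetical estimated cost of each query, indexed by query id.
    Returns one list of query ids per node.
    """
    # Min-heap of (current load, node id): the least-loaded node is on top.
    heap = [(0.0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_nodes)]
    # Place the most expensive queries first so that cheap ones fill the gaps.
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    for qid in order:
        load, node = heapq.heappop(heap)
        assignment[node].append(qid)
        heapq.heappush(heap, (load + costs[qid], node))
    return assignment


def node_loads(costs, assignment):
    """Total estimated cost assigned to each node."""
    return [sum(costs[q] for q in queries) for queries in assignment]
```

Only query identifiers are moved between nodes in this sketch, never the data series themselves, which is in the spirit of the low-cost balancing described above.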

A detailed description of the PLATON research outcomes can be found in the PLATON technical deliverables (D4.1, D3.1, D3.2) and the related papers reported in them.

3 Dissemination
Press Releases: 2
Papers Submitted for Publication to Conferences: 1
Papers Submitted for Publication to Journals: 2
Papers in Conferences and Workshops: 5 (1 best paper award in top conference)
Papers in Journals: 1
Articles in Magazines: 2
Technical Reports: 8
Presentations in Conferences: 4
Posters: 2
Tutorials and Talks: 8
Talks to the public: 2
Participation in Conferences, Workshops and Summits: 8
See https://platon.mi.parisdescartes.fr/publications.html for details. See also deliverables of WP5 and WP6 of PLATON.

4 Exploitation
The results of PLATON have been integrated into various educational materials and seminars, which inform students (at BSc, MSc, PhD, and postdoctoral level) as well as researchers about state-of-the-art methods for scalable data series management and analytics. P. Fatourou has started exploring how to commercialize the resulting technology. She has attended several entrepreneurship seminars and management training activities. She has secured 2 new French grants and 1 grant in Greece.

See D1.2 for more details.

1. We described Odyssey, a novel distributed data series processing framework that achieves good speedup and high scalability in data series processing by taking advantage of the full computational capacity of modern clusters of multi-core servers. Odyssey addresses several challenges in designing an efficient and highly scalable distributed data series index, including efficient scheduling and load balancing without paying the prohibitive cost of moving data around. It also supports a flexible partial replication scheme, which lets Odyssey navigate a fundamental trade-off between data scalability and good query answering performance. Through a wide range of configurations and using several real and synthetic datasets, our experimental analysis demonstrates that Odyssey achieves its challenging goals.

2. We developed FreSh, the first lock-free data series index, which, surprisingly, matches the performance of state-of-the-art lock-based in-memory indexes. To develop FreSh, we studied in depth the design decisions of current state-of-the-art data series indexes and the principles governing their performance. We distilled this knowledge into a theoretical framework that enables the modular development and analysis of data series indexes. Experiments with several synthetic and real datasets show that FreSh, although lock-free, performs as well as the state-of-the-art blocking in-memory data series index.

3. We proposed SING, a data series index designed for CPU+GPU (Graphics Processing Unit) coprocessing. SING is an in-memory index that uses the GPU's parallelization opportunities and combines them with the power of SIMD and multi-core processing, in order to accelerate similarity search. We conducted an experimental evaluation with several synthetic and real datasets, which shows that SING is up to 5x faster than the state-of-the-art parallel in-memory approach.
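
To make concrete what the similarity search accelerated by these indexes involves, here is a minimal single-threaded sketch of exact nearest-neighbor search with summary-based pruning. It is not the Odyssey, FreSh, or SING algorithm: the report does not describe their internals, so the sketch assumes Euclidean distance and uses Piecewise Aggregate Approximation (PAA) as a stand-in summary. All function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa(series, segments):
    """Mean of each equal-width segment; len(series) must be divisible by segments."""
    width = len(series) // segments
    return [sum(series[i * width:(i + 1) * width]) / width for i in range(segments)]


def lower_bound(q_paa, c_paa, length):
    """PAA distance scaled by sqrt(segment width); never exceeds the true distance."""
    width = length // len(q_paa)
    return math.sqrt(width) * euclidean(q_paa, c_paa)


def nearest_neighbor(query, collection, segments=4):
    """Exact 1-NN search that skips candidates whose lower bound cannot win."""
    q_paa = paa(query, segments)
    # Assumption: a real index would precompute and store these summaries.
    summaries = [paa(c, segments) for c in collection]
    best_idx, best_dist, pruned = -1, math.inf, 0
    for idx, candidate in enumerate(collection):
        if lower_bound(q_paa, summaries[idx], len(query)) >= best_dist:
            pruned += 1  # the full distance computation is avoided
            continue
        dist = euclidean(query, candidate)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist, pruned
```

Because the lower bound never exceeds the true distance, pruning cannot discard the true nearest neighbor, so the answer stays exact. The cheap bound checks and the full distance computations are the kind of work that multi-core, SIMD, and GPU parallelism can speed up.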

Potential impacts

Data series are one of the most common types of data and are present in virtually every scientific and social domain: they appear as audio sequences, shape and image data, and financial, telecommunications, environmental monitoring, and scientific data, and they have many diverse applications (in health care, earth sciences, astronomy, biology, economics, etc.). The software developed in PLATON can have a high impact in all of these scientific domains and sectors.

The PLATON index has been designed in a modular manner so that it can be easily adjusted to run on top of heterogeneous High-Performance Computing (HPC) environments.
All software developed in the context of PLATON will be made available as open access after the publication of our research results. Documentation has been prepared that explains how to use the PLATON software.
