Periodic Reporting for period 1 - PLATON (Platform-aware LArge-scale Time-Series prOcessiNg)
Reporting period: 2021-10-01 to 2022-09-30
• PLATON developed a framework of new algorithms and techniques for highly efficient data series processing in a multi-node setting. This included designing and implementing low-cost, data-aware partitioning and mapping techniques for answering queries on large data series collections on heterogeneous computing platforms, together with low-cost load-balancing and communication solutions for multi-node query processing. Together these yielded substantially better performance and high scalability for large-scale data series processing.
The new multi-node index can process datasets at least an order of magnitude larger than those on which current state-of-the-art indexes have been tested. It is a holistic data series indexing solution that utilizes all the computational resources of heterogeneous computing platforms, and it is orders of magnitude faster and more scalable than current state-of-the-art approaches.
• PLATON developed a new multi-threaded index and query processing scheme, called FreSh, for large data series collections. FreSh is highly fault-tolerant, and it achieves this at no performance cost compared to existing parallel solutions, which rely on locks and therefore provide no fault tolerance (i.e., they are blocking).
• PLATON achieved enhanced performance by combining general-purpose CPUs with accelerators such as Graphics Processing Units (GPUs). The main outcome in this direction is SING, a parallel index that delivers up to 5x better performance than state-of-the-art parallel solutions that use only multi-threading.
A detailed description of the PLATON research outcomes can be found in the PLATON technical deliverables (D4.1, D3.1, D3.2) and in the related papers they report.
3 Dissemination
Press Releases: 2
Papers Submitted for Publication to Conferences: 1
Papers Submitted for Publication to Journals: 2
Papers in Conferences and Workshops: 5 (1 best paper award in a top conference)
Papers in Journals: 1
Articles in Magazines: 2
Technical Reports: 8
Presentations in Conferences: 4
Posters: 2
Tutorials and Talks: 8
Talks to the public: 2
Participation in Conferences, Workshops and Summits: 8
See https://platon.mi.parisdescartes.fr/publications.html for details. See also the deliverables of WP5 and WP6 of PLATON.
4 Exploitation
The results of PLATON have been integrated into various educational materials and seminars, used to inform students (at the BSc, MSc, PhD, and postdoctoral levels) as well as researchers about state-of-the-art methods for scalable data series management and analytics. P. Fatourou has begun exploring paths to commercializing the resulting technology. She has attended several entrepreneurship seminars and management training activities. She has secured 2 new French grants and 1 grant in Greece.
See D1.2 for more details.
2. We developed FreSh, the first lock-free data series index that, surprisingly, exhibits the same performance as the state-of-the-art lock-based in-memory indexes. To develop FreSh, we studied in depth the design decisions of current state-of-the-art data series indexes and the principles governing their performance. We distilled this knowledge into a theoretical framework that enables the development and analysis of data series indexes in a modular way. Experiments on several synthetic and real datasets show that FreSh, albeit lock-free, performs as well as the state-of-the-art blocking in-memory data series index.
3. We proposed SING, a data series index designed for CPU+GPU (Graphics Processing Unit) coprocessing. SING is an in-memory index that exploits the GPU's parallelism in combination with SIMD and multi-core CPU processing to accelerate similarity search. An experimental evaluation with several synthetic and real datasets shows that SING is up to 5x faster than the state-of-the-art parallel in-memory approach.
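This summary does not describe how FreSh or SING work internally. As a rough illustration of the similarity search task these indexes accelerate, the following minimal Python sketch finds the exact nearest neighbor of a query series under Euclidean distance and skips candidates using a summary-based lower bound. The summarization used here (Piecewise Aggregate Approximation, PAA) is a standard technique from the data series literature and is an assumption of this sketch, not a statement about the PLATON indexes. All function names are hypothetical, and the code is single-threaded, with no GPU, SIMD, or lock-free machinery.

```python
import math
import random


def paa(series, segments):
    """Summarize a series by the mean of each equal-length segment.

    Assumes len(series) is divisible by segments.
    """
    seg_len = len(series) // segments
    return [
        sum(series[i * seg_len:(i + 1) * seg_len]) / seg_len
        for i in range(segments)
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lower_bound(paa_a, paa_b, seg_len):
    """Distance on the summaries, scaled so it never exceeds the true distance."""
    return math.sqrt(seg_len) * euclidean(paa_a, paa_b)


def nearest_neighbor(query, collection, segments=8):
    """Exact 1-NN search that skips series whose lower bound is too large.

    Returns (index of nearest series, its distance, number of series skipped).
    """
    seg_len = len(query) // segments
    query_summary = paa(query, segments)
    # A real index would build and organize these summaries once, up front.
    summaries = [paa(s, segments) for s in collection]

    best_idx, best_dist, pruned = -1, math.inf, 0
    for idx, (series, summary) in enumerate(zip(collection, summaries)):
        if lower_bound(query_summary, summary, seg_len) >= best_dist:
            pruned += 1  # cannot beat the best answer found so far
            continue
        dist = euclidean(query, series)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist, pruned


def random_walk(length, rng):
    """Generate a synthetic series as a cumulative sum of Gaussian steps."""
    value, out = 0.0, []
    for _ in range(length):
        value += rng.gauss(0.0, 1.0)
        out.append(value)
    return out


if __name__ == "__main__":
    rng = random.Random(42)
    data = [random_walk(64, rng) for _ in range(1000)]
    query = random_walk(64, rng)
    idx, dist, pruned = nearest_neighbor(query, data)
    print(f"nearest series: {idx}, distance: {dist:.3f}, skipped: {pruned}")
```

Because the lower bound never exceeds the true distance, skipping a candidate cannot change the answer. The search stays exact and only does less work.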
Potential impacts
Data series are one of the most common types of data, and are present in virtually every scientific and social domain: they appear as audio sequences, shape and image data, financial, telecommunications, environmental monitoring and scientific data, and they have many diverse applications (in health care, earth sciences, astronomy, biology, economics, etc.). The software developed in PLATON can have high impact in all the above scientific domains and sectors.
The PLATON index has a modular design, so it can be easily adapted to run on heterogeneous High-Performance Computing (HPC) environments.
All software developed in the context of PLATON will be released as open access once the corresponding research results have been published. Documentation has been prepared explaining how to use the PLATON software.