Periodic Reporting for period 3 - TORI (In-situ Topological Reduction of Scientific 3D Data)
Reporting period: 2023-10-01 to 2025-03-31
TORI researches, designs, implements, and distributes new algorithms for the interactive analysis of collections of large-scale datasets, based on concise representations of the data derived from mathematical topology.
These concise representations greatly reduce the memory footprint required to describe a dataset. At the same time, they robustly capture the intrinsic structural patterns hidden in the data. This enables efficient measurement and comparison of large-scale datasets through their concise topological descriptors.
The results of the TORI research project will enable the advanced analysis of collections of datasets of unprecedented size in many fields of science, whether the data come from acquisition devices or from simulations.
To achieve these goals, TORI focuses on two aspects. First, it targets computability by designing new topological analysis algorithms capable of processing datasets of unprecedented size. This is achieved by redesigning existing algorithms from scratch to improve their performance and make them compatible with high-performance hardware (shared-memory and distributed-memory parallelism). Second, TORI introduces novel statistical tools that exploit reduced topological descriptors for the analysis of collections of large-scale datasets. These tools support both the analysis of the main trends in a collection and detailed analyses of its variability.
Specifically, TORI introduced the first parallel algorithm for data pre-simplification, which paves the way for the interactive multi-scale analysis of large-scale data. More recently, TORI contributed a new algorithm for the high-performance computation of the Morse-Smale segmentation, an advanced analysis technique widely used in many fields of science. TORI also contributed a parallel algorithm for the high-performance computation of a topological descriptor called the "Persistence Diagram", a central object in Topological Data Analysis. This contribution comes with an extensive benchmark that demonstrates clear performance gains over pre-existing approaches. Finally, TORI explored efficient algorithms for approximating persistence diagrams with theoretical guarantees.
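To make the notion of a persistence diagram concrete, the following minimal Python sketch computes the 0-dimensional persistence diagram of a 1D signal under the sublevel-set filtration, using the classical sequential union-find sweep and the elder rule. It is a textbook illustration only, not TORI's parallel algorithm nor the implementation shipped in TTK. The function name `sublevel_persistence_1d` is hypothetical, and breaking value ties by index and pairing the global minimum with infinity are conventions assumed by this sketch.

```python
import math


def sublevel_persistence_1d(values):
    """0-dimensional sublevel-set persistence pairs of a 1D scalar field.

    Vertex i is adjacent to vertices i - 1 and i + 1. Ties in value are broken
    by index, zero-persistence pairs are discarded, and the component of the
    global minimum is paired with infinity (conventions assumed here).
    """
    n = len(values)

    def key(i):
        # Total order on vertices: by value, then by index.
        return (values[i], i)

    parent = [None] * n  # None means "not yet in the sublevel set"

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for v in sorted(range(n), key=key):  # sweep vertices by increasing value
        parent[v] = v  # v enters the sublevel set as its own component
        for u in (v - 1, v + 1):
            if 0 <= u < n and parent[u] is not None:
                ru, rv = find(u), find(v)
                if ru == rv:
                    continue
                # Elder rule: the component with the later-born minimum dies at v.
                young, old = (ru, rv) if key(ru) > key(rv) else (rv, ru)
                parent[young] = old  # roots always remain component minima
                if values[young] < values[v]:
                    pairs.append((values[young], values[v]))
    if n:
        # The component of the global minimum never dies.
        pairs.append((values[min(range(n), key=key)], math.inf))
    return sorted(pairs)
```

For example, `sublevel_persistence_1d([0, 3, 1, 4, 2, 5])` returns `[(0, inf), (1, 3), (2, 4)]`: six samples are summarized by three (birth, death) pairs, one per local minimum, which illustrates on a toy scale the data reduction that topological descriptors provide.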
In terms of collection analysis, TORI contributed several key results.
These include topology-driven approaches for dimensionality reduction. TORI also developed a complete framework for the statistical analysis of a collection of datasets, based on their representation by an advanced topological descriptor called the "Merge Tree".
Specifically, TORI introduced an efficiently computable metric for comparing merge trees, with acceptable stability in practice. This foundational result enabled the development of advanced algorithms for computing geodesics, Fréchet means and, more recently, principal geodesic analysis. These tools provide global analysis capabilities for studying trends and variability across a collection, with applications to lossy compression and dimensionality reduction.
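The merge-tree metric introduced by TORI is not reproduced here. As a simpler, well-known stand-in, the sketch below compares persistence diagrams with the 2-Wasserstein distance, in which each point is matched either to a point of the other diagram or to the diagonal, and then selects the "medoid" of a small collection (the input diagram minimizing the sum of squared distances to all others) as a discrete proxy for a Fréchet mean. The L∞ ground metric, the brute-force search over all matchings (feasible only for a handful of points), and the names `wasserstein2` and `frechet_medoid` are assumptions of this sketch; practical implementations rely on optimized assignment or auction algorithms instead.

```python
import itertools
import math


def _linf(p, q):
    # L-infinity ground distance between two (birth, death) points.
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def _to_diagonal(p):
    # L-infinity distance from (birth, death) to the diagonal birth == death.
    return (p[1] - p[0]) / 2


def wasserstein2(d1, d2):
    """Exact 2-Wasserstein distance between two small, finite diagrams.

    Brute force over all matchings of the augmented point sets: each point of
    one diagram is matched to a point of the other or to the diagonal.
    """
    m, n = len(d1), len(d2)
    size = m + n
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < m and j < n:
                cost[i][j] = _linf(d1[i], d2[j]) ** 2  # point-to-point
            elif i < m:
                cost[i][j] = _to_diagonal(d1[i]) ** 2  # d1 point to diagonal
            elif j < n:
                cost[i][j] = _to_diagonal(d2[j]) ** 2  # d2 point to diagonal
            # diagonal-to-diagonal slots keep cost 0
    best = min(sum(cost[i][perm[i]] for i in range(size))
               for perm in itertools.permutations(range(size)))
    return math.sqrt(best)


def frechet_medoid(diagrams):
    """Index of the input diagram minimizing the sum of squared distances.

    A discrete stand-in for a Frechet mean: the minimizer is searched only
    among the input diagrams, not over the whole space of diagrams.
    """
    return min(range(len(diagrams)),
               key=lambda k: sum(wasserstein2(diagrams[k], d) ** 2
                                 for d in diagrams))
```

With the three diagrams `[(1, 3)]`, `[(1, 3), (2, 4)]` and `[(1, 3), (2, 4), (0, 10)]`, `frechet_medoid` returns index 1: the middle diagram has the smallest summed squared distance to the collection (26, against 27 and 51 for the other two).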
In collaboration with domain experts, TORI applied the above contributions to several fields of science, such as quantum chemistry and fluid mechanics, to describe subtle structural patterns and to compare them across large collections of datasets.
All the algorithms developed by TORI have been integrated into the open-source library "the Topology ToolKit" (TTK, https://topology-tool-kit.github.io/), a leading package for Topological Data Analysis. Mini-symposia have been organized at IEEE VIS, the top visualization conference, to disseminate these results. Online tutorials (https://topology-tool-kit.github.io/examples/) have been produced to reproduce the data analysis examples provided in TORI's publications.
For the remainder of the project, TORI will focus on adapting key topological algorithms to a distributed-memory setting, a necessary step to address datasets of unprecedented size (e.g. several terabytes per dataset). Moreover, TORI will continue developing novel analysis tools for collections of topological descriptors, with applications to dimensionality reduction. Finally, the last axis of the project will extend TORI's results to time-varying topological descriptors, which will enable fundamentally new analysis capabilities.