Nonlinear Data and Signal Analysis with Diffusion Operators

Periodic Reporting for period 1 - DIFFOP (Nonlinear Data and Signal Analysis with Diffusion Operators)

Reporting period: 2019-02-01 to 2020-07-31

Extensive collection and storage of massive data sets have become routine in many disciplines and in everyday life. With such large amounts of intricate data, even sample arithmetic and basic comparisons become problematic, raising new challenges for traditional data analysis objectives such as filtering and prediction. Furthermore, the availability of such data constantly pushes the boundaries of data analysis into newly emerging domains, ranging from neuronal and social network analysis to multimodal sensor fusion. The combination of evolved data and new domains drives a fundamental change in the field of data analysis. Indeed, many classical model-based techniques have become obsolete because their models do not capture the richness of the collected data. Today, one notable avenue of research is the development of nonlinear techniques that go from data directly to representations, without deriving models in closed form. The vast majority of existing data-driven methods operate directly on the data, a hard task in itself when the data are large and elaborate. The goal of this research is to develop a fundamentally new methodology for high-dimensional data analysis with diffusion operators, making use of recent transformative results in manifold and geometry learning. More concretely, shifting the focus from processing the data samples themselves to considering structured data through the lens of diffusion operators introduces powerful new “handles” on the data that capture their complexity efficiently. We study the basic theory behind this nonlinear analysis, develop new operators for this purpose, and devise efficient data-driven algorithms. In addition, we explore how our approach can be leveraged to devise efficient solutions to a broad range of open real-world data analysis problems, involving intrinsic representations, sensor fusion, time-series analysis, network connectivity inference, and domain adaptation.
Multimodal data analysis
One of the long-standing challenges in signal processing and data analysis is the fusion of information acquired by multiple, multimodal sensors. The problem of information fusion has become particularly central in the wake of recent technological advances, which have led to extensive collection and storage of multimodal data. In our research, we address the problem from a manifold learning/geometric analysis standpoint.

In “O. Katz, R. R. Lederman, R. Talmon, Spectral Flow on the Manifold of SPD Matrices for Multimodal Data Processing”, we combine manifold learning with the Riemannian geometry of symmetric and positive-definite (SPD) matrices. In particular, we study the way the spectra of kernels change along geodesic paths on the manifold of SPD matrices. We show that this change enables us, in a purely unsupervised manner, to derive a compact, yet informative, description of the relations between the multimodal measurements. Based on this result, we present new algorithms for extracting the common latent components and for identifying common and measurement-specific components.
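
As a rough illustration of tracking kernel spectra along a geodesic, the sketch below assumes the affine-invariant metric on SPD matrices, under which the geodesic from K1 to K2 is K1^(1/2) (K1^(-1/2) K2 K1^(-1/2))^t K1^(1/2) for t in [0, 1]. The function names and the random stand-in matrices are hypothetical and are not taken from the cited paper; in practice K1 and K2 would be kernels built from two sets of aligned measurements.

```python
import numpy as np


def spd_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power."""
    vals, vecs = np.linalg.eigh((mat + mat.T) / 2)  # symmetrize against round-off
    return (vecs * vals**power) @ vecs.T


def spd_geodesic(k1, k2, t):
    """Point at parameter t on the affine-invariant geodesic from k1 (t=0) to k2 (t=1)."""
    k1_half = spd_power(k1, 0.5)
    k1_inv_half = spd_power(k1, -0.5)
    return k1_half @ spd_power(k1_inv_half @ k2 @ k1_inv_half, t) @ k1_half


def spectral_flow(k1, k2, ts):
    """Eigenvalues of the geodesic point, in descending order, for each t in ts."""
    rows = []
    for t in ts:
        g = spd_geodesic(k1, k2, t)
        rows.append(np.linalg.eigvalsh((g + g.T) / 2)[::-1])
    return np.array(rows)


def random_spd(n, rng):
    """Hypothetical stand-in for a kernel: a well-conditioned random SPD matrix."""
    a = rng.standard_normal((n, n))
    return a @ a.T / n + np.eye(n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k1, k2 = random_spd(6, rng), random_spd(6, rng)
    flow = spectral_flow(k1, k2, np.linspace(0.0, 1.0, 11))
    print(flow.shape)  # one row of eigenvalues per value of t
```

Plotting each column of the returned array against t gives a simple picture of how individual eigenvalues evolve between the two kernels.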

In addition, in “O. Yair, F. Dietrich, R. Mulayoff, R. Talmon, and I. G. Kevrekidis, Spectral Discovery of Jointly Smooth Features for Multimodal Data”, we present a spectral method for deriving functions that are jointly smooth on multiple observed manifolds. We demonstrate the proposed method on sleep stage identification, and we show how the proposed method can be leveraged for finding minimal realizations of parameter spaces of nonlinear dynamical systems.
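
One plausible way to realize this idea, given here as an assumption rather than as the exact procedure of the cited paper, is to collect the leading eigenvectors of each modality's kernel, stack them side by side, and take a singular value decomposition. Left singular vectors whose singular values approach the square root of the number of kernels lie almost inside every leading eigenspace and are therefore smooth with respect to all kernels at once. The names `top_eigenvectors` and `jointly_smooth_functions` and the parameters `d` and `m` are hypothetical.

```python
import numpy as np


def top_eigenvectors(kernel, d):
    """Return the d eigenvectors of a symmetric kernel with the largest eigenvalues."""
    _, vecs = np.linalg.eigh((kernel + kernel.T) / 2)
    return vecs[:, ::-1][:, :d]


def jointly_smooth_functions(kernels, d, m):
    """Return m candidate jointly smooth functions and their singular values.

    A singular value close to sqrt(len(kernels)) indicates a function that lies
    (almost) in the leading d-dimensional eigenspace of every kernel.
    """
    # Stack the leading eigenvectors of all kernels: shape (n_samples, d * n_kernels).
    w = np.hstack([top_eigenvectors(k, d) for k in kernels])
    u, s, _ = np.linalg.svd(w, full_matrices=False)
    return u[:, :m], s[:m]
```

The kernels must be built on the same, aligned set of samples so that row i of every kernel refers to the same observation.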

The Riemannian geometry of the manifold of diffusion operators
We study the manifold of diffusion operators, on which we can define geometric, differential, and probabilistic structures. This research direction entails a fresh approach to manifold learning, departing from the traditional use of spectral decomposition of diffusion operators for embedding.

Often, diffusion operators are not strictly positive definite, but rather positive semi-definite. Unlike the manifold of SPD matrices, which has been studied for many years and has a solid theory, the manifold of symmetric positive semi-definite (SPSD) matrices lacks many key results. In “O. Yair, A. Lahav, and R. Talmon, Symmetric Positive Semi-definite Riemannian Geometry with Application to Domain Adaptation”, we present new results on the Riemannian geometry of SPSD matrices, including approximations of the logarithmic and exponential maps and Parallel Transport (PT), as well as a canonical representation for a set of SPSD matrices. Based on these results, we propose an algorithm for Domain Adaptation (DA) and demonstrate its performance in two applications: fusion of hyper-spectral images and motion recognition.
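
For orientation, the sketch below shows the well-established SPD counterparts of these operations under the affine-invariant metric: the logarithmic map, the exponential map, and parallel transport along the geodesic between two base points. It does not implement the SPSD approximations of the cited paper, and the function names are hypothetical.

```python
import numpy as np


def _apply_to_spectrum(mat, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh((mat + mat.T) / 2)
    return (vecs * fun(vals)) @ vecs.T


def _half_powers(p):
    """Return p^(1/2) and p^(-1/2) for an SPD matrix p."""
    return (_apply_to_spectrum(p, np.sqrt),
            _apply_to_spectrum(p, lambda v: 1.0 / np.sqrt(v)))


def log_map(p, q):
    """Tangent vector at p that points toward q."""
    p_half, p_inv_half = _half_powers(p)
    return p_half @ _apply_to_spectrum(p_inv_half @ q @ p_inv_half, np.log) @ p_half


def exp_map(p, s):
    """SPD matrix reached from p along the symmetric tangent vector s."""
    p_half, p_inv_half = _half_powers(p)
    return p_half @ _apply_to_spectrum(p_inv_half @ s @ p_inv_half, np.exp) @ p_half


def parallel_transport(a, b, s):
    """Transport the symmetric matrix s from the tangent space at a to that at b."""
    a_half, a_inv_half = _half_powers(a)
    # e equals (b a^{-1})^{1/2}, computed through symmetric factors only.
    e = a_half @ _apply_to_spectrum(a_inv_half @ b @ a_inv_half, np.sqrt) @ a_inv_half
    return e @ s @ e.T
```

In a domain adaptation setting, a transport of this kind can be used to move features computed around the mean of one domain to the mean of another before comparing them.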

In “Lustig, E., Yair, O., Talmon, R. and Segev, M., 2020. Identifying Topological Phase Transitions in Experiments Using Manifold Learning. Physical Review Letters, 125(12), p.127401”, we demonstrate the identification of topological phase transitions from experimental data using diffusion maps, a nonlocal unsupervised machine learning method. We analyze experimental data from an optical system undergoing a topological phase transition and demonstrate that this approach can identify topological phase transitions even when the data originate from a small part of the system and do not include edge states.
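
A minimal sketch of the standard diffusion maps construction is given below: a Gaussian kernel on pairwise distances, normalization to a row-stochastic matrix, and an embedding built from its leading nontrivial eigenvectors. It is a generic textbook version, not the processing pipeline applied to the experimental data, and the function name and the parameters `epsilon`, `n_components`, and `t` are assumptions.

```python
import numpy as np


def diffusion_maps(x, epsilon, n_components=2, t=1):
    """Return (embedding, eigenvalues) for samples x of shape (n_samples, n_features)."""
    # Gaussian kernel on pairwise squared Euclidean distances.
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    kernel = np.exp(-sq_dists / epsilon)

    # P = D^{-1} K is row-stochastic. Its symmetric conjugate D^{-1/2} K D^{-1/2}
    # has the same eigenvalues and can be handled by a symmetric eigensolver.
    degrees = kernel.sum(axis=1)
    sym = kernel / np.sqrt(np.outer(degrees, degrees))
    vals, vecs = np.linalg.eigh(sym)
    vals, vecs = vals[::-1], vecs[:, ::-1]  # descending order

    # Right eigenvectors of P; the first one is constant and is discarded.
    psi = vecs / np.sqrt(degrees)[:, None]
    embedding = psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
    return embedding, vals


if __name__ == "__main__":
    # Toy data: points along a half circle embedded in three dimensions.
    s = np.linspace(0.0, np.pi, 100)
    x = np.column_stack([np.cos(s), np.sin(s), np.zeros_like(s)])
    emb, vals = diffusion_maps(x, epsilon=0.1)
    print(emb.shape)
```

The kernel scale `epsilon` controls the neighborhood size and usually needs tuning, for example relative to the median pairwise squared distance.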

In “Y.-W. E. Lin, T. Shnitzer, R. Talmon, F. Villarroel-Espindola, S. Desai, K. Schalper, and Y. Kluger, Graph of graphs analysis for multiplexed data with application to imaging mass cytometry”, we propose a two-step graph-based analysis for high-dimensional multiplexed datasets that characterizes regions of interest (ROIs) and their inter-relationships. We apply our method to imaging mass cytometry (IMC) and show that it leads to state-of-the-art prediction of sensitivity to treatment of lung cancer patients.
The results described in detail above go beyond the state of the art in several aspects:
a. We developed new key results in the Riemannian geometry of symmetric and positive semi-definite (SPSD) matrices. These results facilitate a useful framework where SPSD matrices serve as data features.
b. We established new connections between manifold learning, spectral analysis and the Riemannian geometry of symmetric and positive-definite (SPD) matrices. We presented new theoretical results as well as new numerical schemes.
c. We devised new algorithms for domain adaptation and demonstrated state-of-the-art results on several benchmarks.
d. We showed accurate identification of topological phase transitions in matter based on experimental data in a purely unsupervised, model-free fashion.
e. We showed accurate prediction of sensitivity to treatment of lung cancer patients from imaging mass cytometry.

These results serve as a convenient stepping-stone for the remainder of the project, allowing us to proceed as planned in three main research avenues:
a. Studying further the Riemannian geometry of SPD and SPSD matrices. This will give rise to a new mathematical toolbox and will make SPD and SPSD matrices powerful data features.
b. Exploring the interplay between this Riemannian geometry and new compositions of diffusion operators. We expect this to lead to a new class of manifold learning methods that support the analysis and learning of multiple manifolds.
c. Applying the new toolbox we develop to real-world problems.