Periodic Reporting for period 4 - FACTORY (New paradigms for latent factor estimation)
Reporting period: 2021-03-01 to 2022-08-31
Analysing and processing such data often entails unveiling latent structures in the form of "patterns" that explain the data. In the above examples, these patterns may reveal 1) the musical preferences of subgroups of users, 2) the topics addressed in specific documents, 3) the individual notes or chords played in the music signal. These patterns can be discovered by computing an approximate decomposition of the data matrix into the product of two smaller matrices, so-called "factors". The first factor contains the recurring patterns characteristic of the data. The second factor describes the proportions in which each data sample is made up of these patterns. Computing this decomposition (or "factorisation") involves a mathematical procedure called optimisation: designing an algorithm that minimises a numerical measure of fit between the collected data and its factorised approximation.
In nonnegative matrix factorisation (NMF), the data matrix is nonnegative and the latent factors are constrained to be nonnegative as well. NMF is typically cast as an optimisation problem in which a measure of fit between the data and its approximation is minimised with respect to the latent factors. A common choice is the beta-divergence, a general family that includes the quadratic loss and the Kullback-Leibler and Itakura-Saito divergences as important special cases. We developed new algorithms (using the framework of majorisation-minimisation) for NMF with the beta-divergence and for several variants (convolutive NMF, sparse NMF, archetypal NMF, multi-resolution NMF). In particular, we designed algorithms for applications of NMF in imaging settings, namely dynamic PET imaging and remote sensing, where the goal is to "unmix" the contributions of the latent sources/components that explain the data. For PET imaging, we developed a generic algorithm for a large class of measures of fit that can better describe the acquisition model, resulting in more accurate identification of pathological areas in a neuro-imaging application.
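To make the optimisation problem above concrete, here is a minimal sketch of NMF with the beta-divergence, using the multiplicative updates commonly derived by majorisation-minimisation. It is a textbook illustration, not the project's own code: the function names (`beta_divergence`, `mm_exponent`, `beta_nmf`), the random initialisation and the small constant `EPS` are assumptions made for this example.

```python
import numpy as np

EPS = 1e-12  # assumed small constant that guards against division by zero and log(0)


def beta_divergence(V, V_hat, beta):
    """Sum of element-wise beta-divergences between V and its approximation V_hat."""
    V = np.maximum(V, EPS)
    V_hat = np.maximum(V_hat, EPS)
    if beta == 0:  # Itakura-Saito divergence
        ratio = V / V_hat
        return np.sum(ratio - np.log(ratio) - 1.0)
    if beta == 1:  # generalised Kullback-Leibler divergence
        return np.sum(V * np.log(V / V_hat) - V + V_hat)
    # General case; beta == 2 gives half the squared Euclidean distance.
    return np.sum(
        (V**beta + (beta - 1.0) * V_hat**beta - beta * V * V_hat ** (beta - 1.0))
        / (beta * (beta - 1.0))
    )


def mm_exponent(beta):
    """Exponent applied to the multiplicative update so that the fit never worsens."""
    if beta < 1:
        return 1.0 / (2.0 - beta)
    if beta > 2:
        return 1.0 / (beta - 1.0)
    return 1.0


def beta_nmf(V, rank, beta=1.0, n_iter=200, seed=0):
    """Approximate a nonnegative matrix V (F x N) by W @ H with W, H nonnegative."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    W = rng.random((n_features, rank)) + EPS  # patterns (first factor)
    H = rng.random((rank, n_samples)) + EPS   # activations (second factor)
    gamma = mm_exponent(beta)
    losses = []
    for _ in range(n_iter):
        # Update the activations with the patterns held fixed.
        V_hat = W @ H + EPS
        numer = W.T @ (V * V_hat ** (beta - 2.0))
        denom = W.T @ V_hat ** (beta - 1.0) + EPS
        H *= (numer / denom) ** gamma
        # Update the patterns with the activations held fixed.
        V_hat = W @ H + EPS
        numer = (V * V_hat ** (beta - 2.0)) @ H.T
        denom = V_hat ** (beta - 1.0) @ H.T + EPS
        W *= (numer / denom) ** gamma
        losses.append(beta_divergence(V, W @ H, beta))
    return W, H, losses
```

Calling `beta_nmf(V, rank=4, beta=0.0)` selects the Itakura-Saito divergence, `beta=1.0` the Kullback-Leibler divergence and `beta=2.0` the quadratic loss. The returned `losses` list can be inspected to check that the measure of fit does not increase from one iteration to the next.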
• Transform learning for matrix factorisation
In many settings, the data matrix is a collection of features computed from raw data. For example, in signal processing, the spectrogram is a time-frequency transform of a temporal sequence. In text analysis, a so-called tf-idf transform is usually applied to the raw word counts in order to homogenise the data coefficients. The role of these transforms is to produce salient features of the raw data that are, in particular, amenable to factorisation (i.e. the factorisation of these features makes sense). These features are computed with off-the-shelf transforms, which may limit performance. We proposed a new paradigm in which an optimal factorising transform is learnt together with the factors in a single step. This led to very good results in audio signal processing settings, where we showed that the usual Fourier transform can be efficiently replaced by an adaptive short-time transform.
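As an illustration of the kind of off-the-shelf transform mentioned above, the sketch below computes one common variant of tf-idf from raw word counts. The layout (words as rows, documents as columns), the normalisation by document length and the name `tf_idf` are assumptions made for this example. Other variants exist, and this is not the learnt transform proposed in the project.

```python
import numpy as np


def tf_idf(counts):
    """Apply a basic tf-idf weighting to a (n_words, n_documents) matrix of raw counts."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    # Term frequency: word counts normalised by the length of each document.
    doc_lengths = counts.sum(axis=0, keepdims=True)
    tf = counts / np.maximum(doc_lengths, 1.0)
    # Inverse document frequency: words found in few documents get a larger weight.
    doc_freq = np.count_nonzero(counts, axis=1)
    idf = np.log(n_docs / np.maximum(doc_freq, 1))
    return tf * idf[:, None]


# Hypothetical toy corpus: 3 words (rows) counted in 3 documents (columns).
counts = np.array([[3, 2, 4],
                   [1, 0, 0],
                   [0, 2, 0]])
features = tf_idf(counts)
```

The first word occurs in every document, so its inverse document frequency is zero and its row of `features` vanishes. The result remains nonnegative and can therefore be passed to an NMF routine.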
• Temporal matrix factorisation
Sometimes the data samples are strongly correlated. This occurs for example when dealing with spectrograms: since the short-term spectra are computed over short periods of time, two adjacent spectra will "look alike". This correlation should be taken into account in the factorisation to produce more accurate decompositions. We studied new probabilistic temporal models that smooth the activation coefficients of the individual patterns to take this correlation into account.
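The notion of smooth activations can be illustrated with a simple deterministic measure: the sum of squared differences between the activations of adjacent frames. This is a stand-in chosen for brevity, not one of the probabilistic models studied in the project. The function name and the toy data are assumptions.

```python
import numpy as np


def temporal_smoothness_penalty(H):
    """Sum of squared differences between the activations of adjacent frames.

    H has shape (n_patterns, n_frames); column n holds the activations of frame n.
    """
    H = np.asarray(H, dtype=float)
    return float(np.sum(np.diff(H, axis=1) ** 2))


# Slowly varying activations (hypothetical toy data) ...
frames = np.linspace(0.0, 1.0, 50)
H_smooth = np.vstack([frames, 1.0 - frames])
# ... and the same values in a random frame order, which destroys the temporal structure.
rng = np.random.default_rng(0)
H_shuffled = H_smooth[:, rng.permutation(H_smooth.shape[1])]

print(temporal_smoothness_penalty(H_smooth))
print(temporal_smoothness_penalty(H_shuffled))
```

The smooth activations yield the smaller value. Adding such a term to the measure of fit is one simple way to favour decompositions in which adjacent frames receive similar activations.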
• Binary and integer-valued matrix factorisation
Depending on the setting, the coefficients of the data matrix can be either continuous (real or complex-valued) or discrete (integer-valued, 1/0). The latter case concerns for example count data (song play counts, word occurrences) or binary data (song played/unplayed, feature absent/present). It has been somewhat less studied than the former, and we developed and studied matrix factorisation techniques for binary and integer-valued data. In particular, we proposed new probabilistic models that account for the "over-dispersion" that is characteristic of some datasets, such as song play counts (some users are heavier listeners than others, some songs are much more popular than others). Regarding binary matrix factorisation, we proposed and studied probabilistic models that improve the interpretability of the estimated factors (using so-called "mean-parametrisation") as compared to more traditional approaches (that rely on a "link function"). We also proposed an NMF-based variant of the Bradley-Terry model for ranking tennis players based on pairwise-comparison match data.
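For readers unfamiliar with the Bradley-Terry model, the sketch below fits the standard version, in which player i beats player j with probability skill_i / (skill_i + skill_j), using the classical majorisation-minimisation iteration. It is not the NMF-based variant proposed in the project. The function name, the win matrix and the number of iterations are assumptions, and the code assumes that every player has won and lost at least one match.

```python
import numpy as np


def bradley_terry_mm(wins, n_iter=500):
    """Estimate Bradley-Terry skills from wins[i, j] = times player i beat player j."""
    wins = np.asarray(wins, dtype=float)
    n_players = wins.shape[0]
    games = wins + wins.T          # games[i, j]: matches played between i and j
    total_wins = wins.sum(axis=1)  # matches won by each player
    skill = np.ones(n_players)
    for _ in range(n_iter):
        pair_sums = skill[:, None] + skill[None, :]
        skill = total_wins / (games / pair_sums).sum(axis=1)
        skill /= skill.sum()       # the overall scale is arbitrary, so fix it
    return skill


# Hypothetical results: each pair of players met 10 times.
wins = np.array([[0, 7, 8],
                 [3, 0, 6],
                 [2, 4, 0]])
skill = bradley_terry_mm(wins)
# Estimated probability that player 0 beats player 1.
p_01 = skill[0] / (skill[0] + skill[1])
```

Sorting the players by decreasing `skill` gives the ranking. In this balanced toy example it coincides with the ranking by number of wins.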
• Multimodal data processing with joint matrix factorisation
Sometimes data is available in several "modalities". If you take songs for example, you may have access to the audio together with the text lyrics. If these songs are part of a music streaming service, they may be available with ratings or play counts. All these sources share mutual information: lyrics are correlated with the type of music, and songs may tell you about the people who listen to them (and people who listen to the same songs are likely to get along). One way to model the information shared between the various data modalities, represented in matrix form, is to consider co-factorisation: the shared information is modelled by shared factors. We used this paradigm to improve song recommendation scores (by coupling play counts with music tags or acoustic descriptors) and for audio-guided visual stream synthesis as part of an art-science project.
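The idea of shared factors can be sketched as follows: two nonnegative matrices describing the same items (for instance play counts and tags for the same songs) are factorised jointly, with one factor `H` common to both. The sketch uses the quadratic loss with equal weights on both modalities and standard multiplicative updates. These choices, the function name `joint_nmf` and the matrix layout (items as columns) are assumptions, not the models used in the project.

```python
import numpy as np

EPS = 1e-12  # assumed small constant that guards against division by zero


def joint_nmf(V1, V2, rank, n_iter=300, seed=0):
    """Jointly approximate V1 by W1 @ H and V2 by W2 @ H with a shared factor H."""
    if V1.shape[1] != V2.shape[1]:
        raise ValueError("V1 and V2 must describe the same items (same number of columns)")
    rng = np.random.default_rng(seed)
    W1 = rng.random((V1.shape[0], rank))  # factor specific to the first modality
    W2 = rng.random((V2.shape[0], rank))  # factor specific to the second modality
    H = rng.random((rank, V1.shape[1]))   # factor shared by both modalities
    losses = []
    for _ in range(n_iter):
        # Modality-specific factors: ordinary multiplicative updates.
        HHt = H @ H.T
        W1 *= (V1 @ H.T) / (W1 @ HHt + EPS)
        W2 *= (V2 @ H.T) / (W2 @ HHt + EPS)
        # Shared factor: both modalities contribute to the update.
        numer = W1.T @ V1 + W2.T @ V2
        denom = (W1.T @ W1 + W2.T @ W2) @ H + EPS
        H *= numer / denom
        losses.append(np.sum((V1 - W1 @ H) ** 2) + np.sum((V2 - W2 @ H) ** 2))
    return W1, W2, H, losses
```

With equal weights and the quadratic loss, this is equivalent to stacking `V1` and `V2` vertically and running a single NMF. Writing the modalities separately makes it easy to weight them differently or to use a different measure of fit for each.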
• Other topics
In the second half of the project, we also explored topics related to matrix factorisation such as safe screening (acceleration of sparse linear regression algorithms), phase reconstruction (signal reconstruction from phase-less spectrograms), optimisation for training of deep neural networks, and optimal transport.