Final Report Summary - SUBSPARSE (Geometric and combinatorial foundations for emerging information and inference systems)
We have closely carried out the research agenda described in the proposal, with important implications for data science and computation.
To provide a summary overview of our results, we use the original proposal outline below and highlight selected results for each project objective. Our full set of publications can be found at http://lions.epfl.ch/publications
1. Combinatorial approaches for sparse representation
It is fair to state that we have succeeded on all tasks for this objective, which aimed to combine the strengths of dictionary design and dictionary learning towards optimal sparse representations that enable provable information extraction from dimensionality-reduced data.
Indeed, we have derived and tested our submodular dictionary selection idea, providing the first rigorous guarantees for dictionary selection from a given candidate set. Whereas the existing literature considers continuous dictionary learning approaches, the discrete nature of our formulation enables us to rapidly adapt continuous solutions to new scenarios. The results were published at the premier machine learning conference ICML 2011, and follow-up results appeared in the Journal of Selected Topics in Signal Processing in 2012.
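To give a flavor of the discrete viewpoint, the sketch below shows a generic greedy dictionary selection loop: from a fixed candidate set, atoms are added one by one according to their marginal improvement in average reconstruction quality. This is only an illustrative toy (random data, plain least-squares fits), not the exact objective or guarantees of the ICML 2011 paper.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_signals, n_candidates, budget = 20, 50, 60, 5

    Y = rng.standard_normal((d, n_signals))              # toy signals to represent
    C = rng.standard_normal((d, n_candidates))
    C /= np.linalg.norm(C, axis=0)                       # candidate atoms, unit norm

    def residual_energy(atoms):
        """Total residual energy when projecting the signals onto span(atoms)."""
        if not atoms:
            return float(np.sum(Y ** 2))
        D = C[:, atoms]
        coeffs, *_ = np.linalg.lstsq(D, Y, rcond=None)
        return float(np.sum((Y - D @ coeffs) ** 2))

    selected = []
    for _ in range(budget):
        # greedy step: pick the candidate atom with the largest marginal gain
        gains = {j: residual_energy(selected) - residual_energy(selected + [j])
                 for j in range(n_candidates) if j not in selected}
        selected.append(max(gains, key=gains.get))

    print("selected atoms:", sorted(selected))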
To establish the new theoretical framework, we unified two fundamentally different concepts: sparsity and submodularity. During the course of the project, submodularity (a discrete analogue of convexity) has made a remarkable impact on sparse representations (cf. the work by Francis Bach on structure-inducing submodular norms). Our insights from the dictionary selection work led to our recent paper (http://arxiv.org/pdf/1411.1990.pdf), currently under review, which goes beyond submodularity. In this work, we observed that many interesting structured sparsity models can be naturally represented by linear inequalities on the support of the unknown parameters, where the constraint matrix has a totally unimodular structure. For such structured models, tight convex relaxations can be obtained in polynomial time via linear programming. Our modeling framework unifies the prevalent structured sparsity norms in the literature, introduces interesting new ones, and renders their tightness and tractability arguments transparent. A video presentation is (or will soon be) available at the 2014 NIPS Workshop on Discrete and Combinatorial Problems in Machine Learning.
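As a hedged illustration of the totally unimodular (TU) viewpoint (our own toy model, not code from the paper), the sketch below encodes a small support model whose constraints form an interval (consecutive-ones) matrix, a classical TU structure, so the linear programming relaxation over [0,1]^p already returns a 0/1 support.

    import numpy as np
    from scipy.optimize import linprog

    gains = np.array([5.0, 1.0, 4.0, 3.0, 2.0, 6.0])      # value of activating each coefficient
    p = gains.size

    # 0/1 support indicators s: at most 2 active coefficients per contiguous group,
    # at most 3 active overall. Every row has consecutive ones => totally unimodular.
    A_ub = np.array([[1, 1, 1, 0, 0, 0],                   # group {1, 2, 3}
                     [0, 0, 0, 1, 1, 1],                   # group {4, 5, 6}
                     [1, 1, 1, 1, 1, 1]], dtype=float)     # global budget
    b_ub = np.array([2.0, 2.0, 3.0])

    # Maximize gains . s  <=>  minimize -gains . s over the LP relaxation s in [0, 1]^p.
    res = linprog(-gains, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * p, method="highs")
    print("selected support:", np.round(res.x).astype(int))   # integral thanks to TU structure
    print("objective value :", -res.fun)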
2. Bayesian experimental design based on submodularity and sparsity
During our project, we have developed provable learning methods for Bayesian experimental design based on submodularity and a natural generalization of sparsity to matrices (i.e. low-rankness). We have also generalized the dictionary learning problem into a non-linear function learning problem, which seems to have great relevance to recent advances in deep learning.
Many Bayesian experimental design problems require optimizing unknown functions defined over a high-dimensional space from noisy samples that are expensive to obtain. We address this notoriously hard challenge under the assumptions that the function varies only along some low-dimensional subspace and is smooth (i.e. it has a low norm in a Reproducing Kernel Hilbert Space). In particular, we present the SI-BO algorithm, which leverages recent low-rank matrix recovery techniques to learn the underlying subspace of the unknown function and applies Gaussian Process Upper Confidence Bound (GP-UCB) sampling to optimize the function. We carefully calibrate a submodular exploration-exploitation tradeoff by allocating the sampling budget between subspace estimation and function optimization, and obtain the first subexponential cumulative regret bounds and convergence rates for Bayesian optimization in high dimensions under noisy observations. This result was published at the premier machine learning conference NIPS in 2013.
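The sketch below illustrates the GP-UCB sampling rule on the low-dimensional coordinates; it is a generic toy (scikit-learn Gaussian process, random candidate grid, an illustrative beta_t schedule), not the SI-BO implementation.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(0)
    f = lambda z: -np.sum((z - 0.3) ** 2, axis=-1)        # toy objective on the learned subspace

    Z = rng.uniform(-1, 1, size=(5, 2))                   # a few initial noisy evaluations
    y = f(Z) + 0.01 * rng.standard_normal(5)

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-2)
    for t in range(20):
        gp.fit(Z, y)
        cand = rng.uniform(-1, 1, size=(500, 2))               # candidate points in the subspace
        mu, sigma = gp.predict(cand, return_std=True)
        beta = 2.0 * np.log((t + 1) ** 2 * np.pi ** 2 / 0.6)   # illustrative beta_t schedule
        z_next = cand[np.argmax(mu + np.sqrt(beta) * sigma)]   # UCB acquisition step
        Z = np.vstack([Z, z_next])
        y = np.append(y, f(z_next) + 0.01 * rng.standard_normal())

    print("best value found:", y.max())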
The result above relies on our generalization of dictionary learning to a "function learning" problem via low-rank models, which was published at NIPS in 2012, with a journal version in Applied and Computational Harmonic Analysis (ACHA) in 2014. Our theoretical developments leverage recent techniques from low-rank matrix recovery, which enable us to derive an estimator of the function along with sample complexity bounds. We also characterize the noise robustness of the scheme and provide empirical evidence that the high-dimensional scaling of our sample complexity bounds is quite accurate.
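In our notation (a hedged summary of the setting, not verbatim from the paper), the function learning model takes the form

    f(x) = g(Ax), \qquad A \in \mathbb{R}^{k \times d}, \quad k \ll d,

so that f varies only along the k-dimensional subspace spanned by the rows of A. Since \nabla f(x) = A^\top \nabla g(Ax), finite-difference (point query) estimates of the gradients stack into a matrix of rank at most k, which is what lets low-rank matrix recovery estimate the subspace and, subsequently, the link function g.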
We have also hit home runs in some of the subtopics of this project. For instance, building on the concept of compressible priors that we introduced at NIPS 2009, we published a paper in the Transactions on Information Theory in 2012 that provides a comprehensive theory of the compressibility and incompressibility of independent and identically distributed realizations of probability distributions (a toy numerical illustration of this notion follows below). Regarding the application of the expectation propagation algorithm, we made a connection with the approximate message passing framework and introduced a new algorithm that applies to virtually all low-dimensional modeling problems, from dictionary learning to matrix completion, resulting in two publications in the Transactions on Signal Processing in 2014.
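As a toy numerical check of the compressibility notion (our own illustration, not the paper's experiments), the snippet below compares the fraction of total energy carried by the largest 5% of coefficients for i.i.d. heavy-tailed versus Gaussian draws:

    import numpy as np

    rng = np.random.default_rng(0)
    n, keep = 100_000, 5_000                            # keep the top 5% of coefficients

    for name, x in [("student-t (df=2)", rng.standard_t(df=2, size=n)),
                    ("gaussian        ", rng.standard_normal(n))]:
        mags = np.sort(np.abs(x))[::-1]                 # sorted magnitudes, decreasing
        frac = np.sum(mags[:keep] ** 2) / np.sum(mags ** 2)
        print(f"{name}: top 5% of coefficients carry {frac:.2%} of the energy")

Heavy-tailed draws concentrate most of their energy in a few large coefficients and are therefore well approximated by sparse vectors, whereas Gaussian draws are not.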
3. Sparse representations in structured submodularity
In this objective, we investigated the role of sparsity in submodular optimization, which has key relevance to discrete optimization problems.
Our MLSP 2014 work considers a Bayesian approach where the signal structure can be represented by a mixture model with a submodular prior. Our CAMSAP 2013 work demonstrates that convexifications of submodular set functions can fundamentally change the nature of submodular regularization based on the Ising model; a minimal illustration of this convexification idea is sketched below. Surprisingly, the analysis of our sparsity framework with sparse matrices addressed an open algorithmic problem in theoretical computer science, published at the premier discrete algorithms conference SODA 2014, which relates to finding polynomial-time algorithms for structured sparse recovery with expander matrices. See further http://sublinear.info/index.php?title=Open_Problems:50
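To make the convexification idea concrete (a minimal sketch of our own, assuming a chain-structured Ising-type prior rather than the exact models of the CAMSAP 2013 paper), the Lovász extension, the standard convex extension of a submodular set function, is evaluated below for the cut function of a chain and coincides with the discrete total variation:

    import numpy as np

    def lovasz_extension(F, x):
        """Greedy (Edmonds) formula: sort coordinates decreasingly and telescope F."""
        order = np.argsort(-x)
        value, prev, S = 0.0, 0.0, set()
        for i in order:
            S.add(int(i))
            FS = F(S)
            value += x[i] * (FS - prev)
            prev = FS
        return value

    def chain_cut(S, n=5):
        """Number of chain edges (i, i+1) cut by the set S: an Ising-type boundary penalty."""
        return sum((i in S) != (i + 1 in S) for i in range(n - 1))

    x = np.array([0.2, 0.9, 0.4, 0.4, -0.1])
    print(lovasz_extension(chain_cut, x))     # Lovász extension of the cut function (1.7)
    print(np.sum(np.abs(np.diff(x))))         # discrete total variation: matches (1.7)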
Also see:
http://lions.epfl.ch/talks
http://lions.epfl.ch/mathematics_of_data