Hybrid Learning Systems utilizing Sum-Product Networks

Periodic Reporting for period 1 - HYBSPN (Hybrid Learning Systems utilizing Sum-Product Networks)

Reporting period: 2018-03-01 to 2019-12-31

Within Computer Science, Machine Learning (ML) and Artificial Intelligence (AI) are certainly among the most disruptive disciplines of the 21st century. Traditionally, the main focus of ML has been the so-called discriminative approach, whose goal is to predict outputs (e.g. a class label, 'face' vs. 'no-face') from inputs (e.g. an image). Opposed to the discriminative approach is the so-called generative approach, which instead aims to capture the underlying data-generating process (e.g. how the pixels in an image relate to each other and to the class label). The generative approach promises to overcome several key challenges of today's AI systems, such as "catastrophic forgetting" (the phenomenon that ML models completely forget previously learned tasks when trained on new tasks), overconfidence (predictions with highly exaggerated confidence), and a lack of robustness to noise and outliers. Furthermore, generative approaches enable many techniques to improve human trust in AI systems, in particular techniques to improve interpretability, explainability, and fairness.

In this project, we addressed an important fundamental problem in generative modeling, namely the notorious hardness of inference. Formally, any generative model strives to capture the true data-generating probability distribution, which allows us to rigorously represent data dependencies and uncertainty in a universal, unifying, and consistent framework: probability theory. Furthermore, probability theory provides us with tools to derive new insights from our models, to reason under uncertainty, and to derive optimal decisions. These tools, generally referred to as probabilistic inference, are formal and well-defined mathematical operations that are amenable to automation. Unfortunately, most of these inference routines are NP-hard for most generative models (meaning that, in all likelihood, these problems cannot be solved efficiently on current computers).

The remedy taken in this project is to use so-called tractable models, i.e. a class of generative models in which inference can be done exactly and efficiently. One of the most prominent types of tractable model is the sum-product network (SPN), a special type of artificial neural network. There exists, however, a natural tension between tractability and expressiveness: when restricting the model class to tractable models, we naturally lose representational power, which means that a tractable model might not capture the data-generating process as faithfully as an unrestricted model. The general picture before this project was that a practitioner faced a forced choice: i) use unrestricted models and accept the downsides of approximate inference, or ii) use tractable models and accept the downside of restricted model power. Approaches that combined these complementary advantages were scarce at best.
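To make the notion of tractability concrete, the following minimal Python sketch builds a toy sum-product network over two binary variables and answers joint, marginal, and conditional queries, each with a single bottom-up pass. The class names (Leaf, Product, Sum), the variables A and B, and all numerical parameters are hypothetical and chosen only for illustration; this is not the implementation used in the project.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

# Maps variable name -> observed value; a missing or None entry is summed out.
Evidence = Dict[str, Optional[int]]


@dataclass
class Leaf:
    """Bernoulli leaf over a single binary variable (illustrative)."""
    var: str
    p_one: float  # P(var = 1)

    def value(self, evidence: Evidence) -> float:
        x = evidence.get(self.var)
        if x is None:
            # Marginalizing a normalized leaf yields 1.
            return 1.0
        return self.p_one if x == 1 else 1.0 - self.p_one


@dataclass
class Product:
    """Product node: children are assumed to cover disjoint variables."""
    children: List["Node"]

    def value(self, evidence: Evidence) -> float:
        result = 1.0
        for child in self.children:
            result *= child.value(evidence)
        return result


@dataclass
class Sum:
    """Sum node: weighted mixture of children over the same variables."""
    weighted_children: List[Tuple[float, "Node"]]

    def value(self, evidence: Evidence) -> float:
        return sum(w * child.value(evidence) for w, child in self.weighted_children)


Node = Union[Leaf, Product, Sum]

# Toy SPN: a mixture of two fully factorized distributions over A and B.
spn = Sum([
    (0.3, Product([Leaf("A", 0.9), Leaf("B", 0.2)])),
    (0.7, Product([Leaf("A", 0.1), Leaf("B", 0.6)])),
])

p_joint = spn.value({"A": 1, "B": 1})   # P(A=1, B=1)
p_a = spn.value({"A": 1})               # P(A=1), with B summed out exactly
z = spn.value({})                       # normalization constant
p_b_given_a = p_joint / p_a             # P(B=1 | A=1)
```

Marginalizing a variable amounts to setting its leaves to 1, so each query costs one pass over the network, in contrast to naively summing over all joint states, whose number grows exponentially with the number of variables.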

In this project, we explored hybrid approaches, combining tractable models, in particular SPNs, with other approaches from the ML toolbox. The main objective of the project was to explore strategies to combine SPNs with other ML techniques in a meaningful manner, and to demonstrate the benefit of these hybrid learning systems by establishing new state-of-the-art results on several ML/AI tasks.
In this project, we explored multiple approaches to construct hybrid learning systems involving SPNs. The most important approaches are briefly sketched in the following.

1. Randomized SPN Structures [appeared at UAI’19]
SPNs can be seen as a special kind of neural network with sparse and cluttered structures. This makes SPNs difficult to work with in practice and complicates their implementation in modern machine learning frameworks. In this work, we demonstrated that SPNs scale well when randomized (yet valid) structures are used, and that random SPNs deliver faithful generative models and are competitive with deep neural networks.
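One simple way to obtain a randomized yet valid structure is to split the set of variables recursively at random, so that the children of every product node cover disjoint variable sets by construction. The Python sketch below illustrates this idea; the function names, the balanced two-way split, and the chosen depth and number of replicas are assumptions for illustration and may differ from the construction used in the UAI'19 paper.

```python
import random
from typing import List, Tuple, Union

# A region tree is either a leaf (list of variable indices) or a pair of subtrees.
RegionTree = Union[List[int], Tuple["RegionTree", "RegionTree"]]


def random_region_split(scope: List[int], depth: int, rng: random.Random) -> RegionTree:
    """Recursively split a set of variables into two random, balanced halves."""
    if depth == 0 or len(scope) < 2:
        return sorted(scope)
    shuffled = scope[:]          # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return (
        random_region_split(shuffled[:mid], depth - 1, rng),
        random_region_split(shuffled[mid:], depth - 1, rng),
    )


def leaf_regions(tree: RegionTree) -> List[List[int]]:
    """Collect the leaf regions of a region tree from left to right."""
    if isinstance(tree, list):
        return [tree]
    left, right = tree
    return leaf_regions(left) + leaf_regions(right)


rng = random.Random(0)           # fixed seed for reproducibility
variables = list(range(8))

# Several independent random splits of the same variable set (hypothetical setting).
replicas = [random_region_split(variables, depth=2, rng=rng) for _ in range(3)]
```

Because every split partitions its parent region, the leaf regions of each replica are disjoint and together cover all variables, which is what keeps the resulting product nodes valid regardless of the random choices.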

2. Combining SPNs and Variational Autoencoders [appeared at ICML’19]
The Variational Autoencoder (VAE) model is related to SPNs, and is a flexible generative model based on deep neural networks. However, while they are flexible models, inference in VAEs is intractable and needs to be approximated. In this work, we combined the two model classes in a natural way, by defining SPNs over local VAE modules. We demonstrated that exact SPN inference and approximate VAE inference can be naturally combined, yielding a novel principle of hybrid inference models. In experiments, our proposed model consistently outperformed both pure SPNs and pure VAEs.
This hybrid generative model is illustrated in the attached image: the overall structure is an SPN model, which probabilistically selects a combination of VAE experts, each of which generates part of the modeled data.

3. Bayesian Computer Vision using SPNs [appeared at ICML’19]
Computer Vision (CV) and Computer Graphics (CG) can be seen as inverse processes: while CG starts from a scene description and produces a synthetic image as output, CV starts from an image and aims to retrieve some meaningful scene description. Therefore, CV can be done by performing Bayesian inference over a CG model. In this work, we demonstrated that this technique benefits from leveraging tractable models, in particular SPNs. The main innovation of this approach is an elegant and sound combination of approximate and exact inference techniques, which had not been explored before. In our experiments, we demonstrated that our system converges much faster than previously proposed models and produces more stable inference results.

4. Bayesian SPNs [appeared at NeurIPS’19]
Learning the network structure is an important topic in SPNs, but has so far been treated in a rather ad hoc manner. In this paper, we proposed a well-principled Bayesian approach to learning both the structure and the parameters of SPNs. Due to the Bayesian treatment, our learning algorithm naturally protects against so-called overfitting (i.e. adapting the model too closely to the training data).
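As a small illustration of why a Bayesian treatment guards against overfitting, consider only the weights of a single sum node. The Python sketch below compares a maximum-likelihood estimate with the posterior mean under a symmetric Dirichlet prior. It is a deliberately simplified example: the prior, the function names, and the counts are assumptions, and it does not reproduce the structure and parameter learning algorithm of the paper.

```python
from typing import List


def ml_weights(counts: List[int]) -> List[float]:
    """Maximum-likelihood weights: relative frequencies of the child assignments."""
    total = sum(counts)
    return [c / total for c in counts]


def dirichlet_posterior_mean(counts: List[int], alpha: float = 1.0) -> List[float]:
    """Posterior mean of the weights under a symmetric Dirichlet(alpha) prior."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


# Hypothetical counts: all five training samples were assigned to the first child.
counts = [5, 0, 0]

ml = ml_weights(counts)                               # puts zero mass on unseen children
bayes = dirichlet_posterior_mean(counts, alpha=1.0)   # keeps some mass on every child
```

With so few samples, the maximum-likelihood estimate rules out the unseen children entirely, whereas the posterior mean stays closer to the prior and only concentrates as more data arrive.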

5. Combining SPNs and Gaussian Processes [to appear at AISTATS’20]
In this paper, we combined SPNs and Gaussian Processes (GPs) into a novel Bayesian regression model with tractable inference. When used as a GP approximation, our model delivered a better model fit than previous approximations, while having the same running time.

Besides these core contributions, this project has yielded several publications and pre-prints in the areas of tractable models and hardware-efficient machine learning; see the full publication list.
By exploring novel hybrid models that lie between the fully tractable and the intractable regimes, this project has broken new ground in an exciting, challenging, and important topic in ML and AI. In our publications, we demonstrated that combining tractable models like SPNs with other (often intractable) models is sensible and can improve the state of the art. Future work will systematically explore these new insights and develop a novel theory of hybrid inference. The goal will be a novel framework of hybrid generative models with certifiable inference, in order to improve robustness and trust in AI systems.