High-level Prior Models for Computer Vision

Periodic Reporting for period 3 - HOMOVIS (High-level Prior Models for Computer Vision)

Reporting period: 2018-06-01 to 2019-11-30

"In recent years, computer vision as a technology has found its way into smart phones, autonomous cars, medical image processing, industrial inspection and surveillance tasks. Due to the every increasing amount of available visual data, it is expected that computer vision will play an even more important role for the society. The space of ""natural"" images is extremely large. For example considering the space of images of only 64x64 pixels and 256 different gray values, one can generate already 10 to the power of 10000 different images, a number which is by far larger than the (estimated) number of atoms in observable the universe. In this space, natural (or plausible) images are lying only on a certain thin manifold with a much smaller dimension. However, the structure of the manifold turns out to be extremely complicated, because it has to reflect many different transformations of the images such as translations, rotations, brightness changes etc.

A major aim of computer vision and image processing is to develop mathematical models that describe these manifolds, for example for image reconstruction, image classification or stereo. While most existing mathematical models are limited to local pixel interactions and hence only have a sense of edges, the aim of this research project is to develop high-level prior models that should eventually be able to describe the manifolds of objects such as "faces", "dogs" or "cars". Natural images have (among others) two important invariances: they are invariant with respect to translations (objects can occur at different positions in an image) and with respect to rotations (objects can occur at different orientations). These two invariances are among the main concepts of the variational models proposed in this project.

Inspired by the findings of the Nobel prize winners Hubel and Wiesel on the structure of the visual cortex, we propose in this project to represent images in the so-called roto-translation space, which decomposes the local image gradient into its magnitude and its orientation. The roto-translation space serves as a (simplified) mathematical model of the organization pattern of the cells in the visual cortex. A major advantage of this representation is that one can easily get a sense of curvature, and hence of the continuity of object boundaries, which is known to be a very strong prior of the human visual system. The ultimate goal of the project is to develop high-level prior models in the roto-translation space that go beyond the notion of continuity of object boundaries and hence provide a better understanding of the structure of natural images.
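A minimal sketch of such a lifting is given below. It assumes a simple finite-difference gradient and a uniform binning of the orientation; both are illustrative choices, and `lift_to_roto_translation` and `n_orientations` are hypothetical names. Each pixel deposits its gradient magnitude in the orientation bin of its gradient direction, which gives a 3D array indexed by position and orientation.

```python
import numpy as np

def lift_to_roto_translation(image, n_orientations=16):
    """Lift a 2D image to a (rows, cols, orientation) array of gradient magnitudes."""
    gy, gx = np.gradient(np.asarray(image, dtype=float))
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)  # angle in [-pi, pi]

    # Map each angle to one of n_orientations uniform bins (wrapping around)
    bins = np.floor((orientation + np.pi) / (2 * np.pi) * n_orientations)
    bins = bins.astype(int) % n_orientations

    lifted = np.zeros(magnitude.shape + (n_orientations,))
    rows, cols = np.indices(magnitude.shape)
    lifted[rows, cols, bins] = magnitude
    return lifted

# Example: a disk, whose boundary normal turns once around the circle
yy, xx = np.mgrid[-16:16, -16:16]
disk = (xx ** 2 + yy ** 2 <= 100).astype(float)
lifted = lift_to_roto_translation(disk, n_orientations=8)
print(lifted.shape)  # (32, 32, 8)
```

Along the boundary of the disk the gradient direction turns with the position, so the lifted boundary winds through the orientation axis instead of staying in a single slice.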

"Here, we give a short chronological list of works performed since the start of the project:

* Together with A. Chambolle, we were invited to write a review paper for the journal Acta Numerica, which was a great honor for us. The paper gives a broad overview of continuous optimization approaches for image processing and computer vision. It was published at the beginning of 2016.

* In the paper "Total Variation on a Tree", we have worked on efficient solvers for total variation minimization. The idea is to combine fast non-iterative solvers based on dynamic programming on tree-like graphs (e.g. chains) with continuous first-order primal-dual algorithms. The resulting algorithms appear to be extremely efficient and can be applied to a variety of convex and non-convex total-variation based imaging problems.

* In the paper "Inertial Proximal Alternating Linearized Minimization (iPALM) for Nonconvex and Nonsmooth Problems", we have developed an inertial algorithm for non-smooth and non-convex optimization. The algorithm is particularly suited for dictionary learning problems, hence we expect that it will also become important when learning high-level prior models in the roto-translation space.

* In the paper "Acceleration of PDHG on partially strongly convex functions", we have developed a principle to accelerate first-order primal-dual algorithms for partially strongly convex functions. The work provides a theoretical explanation of the fact that existing primal-dual algorithms often perform much better on partially strongly convex functions than their theoretical worst-case complexity suggests.

* In the paper "Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration", we have proposed to learn the parameters of a highly parametrized reaction-diffusion equation for solving inverse problems in imaging. At the core of the method, we perform simple gradient descent steps on a variational energy which is allowed to change from iteration to iteration. The resulting learned diffusion scheme is very flexible, highly efficient and achieves state-of-the-art performance on a number of image processing problems. In the subsequent work "Variational networks: Connecting variational methods and deep learning", we have generalized the notion of trainable diffusion equations towards variational networks, which can be seen as a form of incremental proximal gradient descent on a variational energy. This allows us to connect variational networks with modern CNN architectures such as residual networks. In a more application-driven follow-up paper entitled "Learning a variational network for reconstruction of accelerated MRI data", we have used the reaction-diffusion equations for learning to reconstruct MRI images from undersampled MRI data. It turns out that our learned reaction-diffusion equations lead to significantly better reconstruction results while keeping the computational complexity at a minimum.

* In the paper "End-to-end training of hybrid CNN-CRF models for stereo", we have proposed an end-to-end learning framework for stereo computation. The main idea is to learn convolutional neural network (CNN) features for image matching, which in turn serve as unary weights in a conditional random field (CRF) model. The difficulty of the approach is to propagate the gradients of the loss function through the CRF model, which is realized by a method similar to the structured output support vector machine (SSVM). The learned model is efficient, interpretable and achieves state-of-the-art accuracy on a number of standard benchmark datasets. In the work "Scalable full flow with learned binary descriptors", we have extended the approach towards motion estimation. In order to speed up the process, we have additionally incorporated a binarization strategy which significantly speeds up the feature matching.

* In the paper “Total Roto-Translational Variation”, we exploit

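The non-iterative building block mentioned in the "Total Variation on a Tree" item can be illustrated on the simplest tree, a chain. The sketch below is a simplified stand-in, not the solver from the paper: it restricts the unknowns to a finite set of labels (an assumption made here for brevity) and minimizes a quadratic data term plus a total-variation term exactly with one forward and one backward pass of dynamic programming. The names `tv_energy` and `tv_chain_dp` are hypothetical.

```python
import numpy as np

def tv_energy(u, f, lam):
    """0.5 * sum (u_i - f_i)^2 + lam * sum |u_{i+1} - u_i|."""
    return 0.5 * np.sum((u - f) ** 2) + lam * np.sum(np.abs(np.diff(u)))

def tv_chain_dp(f, labels, lam):
    """Exact minimizer of tv_energy over u_i in `labels`, by dynamic programming."""
    f = np.asarray(f, dtype=float)
    labels = np.asarray(labels, dtype=float)
    unary = 0.5 * (labels[None, :] - f[:, None]) ** 2            # shape (n, L)
    pairwise = lam * np.abs(labels[:, None] - labels[None, :])   # shape (L, L)
    n, n_labels = unary.shape

    # Forward pass: cost[k] is the best energy of the prefix ending in label k
    cost = unary[0].copy()
    backptr = np.zeros((n, n_labels), dtype=int)
    for i in range(1, n):
        total = cost[:, None] + pairwise        # total[k_prev, k]
        backptr[i] = np.argmin(total, axis=0)
        cost = total.min(axis=0) + unary[i]

    # Backward pass: follow the stored argmins from the best final label
    idx = np.empty(n, dtype=int)
    idx[-1] = np.argmin(cost)
    for i in range(n - 1, 0, -1):
        idx[i - 1] = backptr[i, idx[i]]
    return labels[idx]

# Example: a noisy step signal
f = np.array([0.1, -0.1, 0.0, 1.1, 0.9, 1.0])
labels = np.linspace(0.0, 1.0, 11)
u = tv_chain_dp(f, labels, lam=0.3)
print(u, tv_energy(u, f, 0.3))
```

The cost of this pass grows linearly with the chain length and quadratically with the number of labels, and no iteration to convergence is needed.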
In continuous optimization, we have proposed several algorithms that go significantly beyond the state of the art:

* In the case of total variation minimization, our proposed hybrid algorithms based on dynamic programming and continuous optimization appear to be the fastest currently available algorithms for such problems. In particular, if only approximate solutions are necessary, the proposed algorithms are of high interest because they already deliver good solutions after a very small number of iterations.

* For non-smooth and non-convex optimization, we have proposed an inertial variant of the proximal alternating linearized minimization (PALM) method. We have proven convergence of the sequence of iterates in the case of semi-algebraic functions. The inertial PALM algorithm appears to be significantly faster than the original PALM algorithm and shows a certain ability to overcome spurious stationary points. The corresponding paper appears to be among the top 20 most downloaded papers of the SIAM Journal on Imaging Sciences.

* It has been known for some time that first-order primal-dual algorithms perform significantly better than their theoretical worst-case performance on problems which are only partially strongly convex in either the primal or the dual variable. Examples include total generalized variation minimization and image reconstruction involving a linear operator, e.g. image deconvolution or MRI reconstruction. We have shown that first-order primal-dual algorithms can be partially accelerated, so that fast convergence is obtained at least for the block of variables corresponding to the strongly convex part of the problem.
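For reference, the baseline that the last item builds on can be sketched as follows. This is the standard accelerated first-order primal-dual iteration applied to 1D total variation denoising, where the primal objective is strongly convex in all variables; it is not the partially accelerated scheme described above. The test problem, the step sizes and the names `D`, `Dt` and `tv_denoise_pdhg` are assumptions chosen for illustration.

```python
import numpy as np

def D(u):
    """Forward differences: (D u)[j] = u[j + 1] - u[j]."""
    return np.diff(u)

def Dt(p):
    """Adjoint of D, so that <D u, p> equals <u, Dt p>."""
    return -np.diff(np.concatenate(([0.0], p, [0.0])))

def tv_denoise_pdhg(f, lam, n_iter=500):
    """Approximately minimize 0.5 * ||u - f||^2 + lam * ||D u||_1."""
    f = np.asarray(f, dtype=float)
    u = f.copy()
    u_bar = u.copy()
    p = np.zeros(f.size - 1)
    tau = sigma = 0.5   # tau * sigma * ||D||^2 <= 1, since ||D||^2 <= 4
    gamma = 1.0         # strong convexity constant of the data term
    for _ in range(n_iter):
        # Dual step: projection onto the box [-lam, lam]
        p = np.clip(p + sigma * D(u_bar), -lam, lam)
        # Primal step: proximal map of the quadratic data term
        u_old = u
        u = (u - tau * Dt(p) + tau * f) / (1.0 + tau)
        # Acceleration: shrink tau, grow sigma, extrapolate with theta
        theta = 1.0 / np.sqrt(1.0 + 2.0 * gamma * tau)
        tau, sigma = theta * tau, sigma / theta
        u_bar = u + theta * (u - u_old)
    return u

# Example: denoise a noisy step signal
rng = np.random.default_rng(0)
noisy = np.repeat([0.0, 1.0], 20) + 0.1 * rng.standard_normal(40)
denoised = tv_denoise_pdhg(noisy, lam=0.5)
```

Only the step-size update distinguishes this from the unaccelerated iteration; setting `gamma = 0.0` keeps `tau`, `sigma` and `theta` constant.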

On the variational modeling side, our most striking results are the following:

* We have proposed a new convex relaxation of curvature-minimizing variational models that represents the gradient of an image in the so-called roto-translation space, which is the 3D product space of the 2D image domain and the domain of the orientation of the image gradient. By means of this representation, we have found a general framework to represent the curvature of the level lines of an image as a convex energy in terms of a vector-valued measure. For numerical simulation, we found a discrete approximation of the continuous model that can be solved by means of a first-order primal-dual algorithm.

* We have proposed variational networks, which can be regarded as a time-dynamic integro-differential equation whose structure is inspired by variational models. In its simplest form, a variational network can be seen as gradient descent on a variational model, where the parameters of the model (filters, potential functions) are allowed to change in each iteration. Hence, variational networks combine two important advantages: on the one hand, they offer a very strong model and, on the other hand, they are extremely efficient. Moreover, variational networks show strong connections to state-of-the-art deep learning architectures such as residual networks. We have applied variational networks to a number of imaging problems including image restoration, image super-resolution, JPEG deblocking, and MRI and CT reconstruction. In all cases, we have been able to produce results that go significantly beyond the state of the art.
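In its simplest form, the gradient-descent view of a variational network described above can be sketched as follows. This is an untrained toy version for 1D denoising: the circular filters, the fixed potential derivative `phi_prime` and the per-step parameters are placeholder assumptions, whereas in the project these quantities are learned from data. All function names are hypothetical.

```python
import numpy as np

def apply_filter(u, kernel):
    """Circular correlation: (K u)[i] = sum_j kernel[j] * u[i + j]."""
    return sum(k * np.roll(u, -j) for j, k in enumerate(kernel))

def apply_filter_adjoint(v, kernel):
    """Adjoint of apply_filter: (K^T v)[m] = sum_j kernel[j] * v[m - j]."""
    return sum(k * np.roll(v, j) for j, k in enumerate(kernel))

def phi_prime(x):
    """Derivative of 0.5 * log(1 + x^2); a fixed stand-in for a learned function."""
    return x / (1.0 + x ** 2)

def vn_step(u, f, kernels, lam, step):
    """One gradient step on sum_k phi(K_k u) + 0.5 * lam * ||u - f||^2."""
    grad = lam * (u - f)
    for kernel in kernels:
        grad = grad + apply_filter_adjoint(phi_prime(apply_filter(u, kernel)), kernel)
    return u - step * grad

def variational_network(f, layers):
    """Apply the steps in sequence; each layer carries its own parameters."""
    f = np.asarray(f, dtype=float)
    u = f.copy()
    for params in layers:
        u = vn_step(u, f, **params)
    return u

# Two steps with different (hand-picked, not learned) parameters
layers = [
    {"kernels": [[-1.0, 1.0]], "lam": 1.0, "step": 0.2},
    {"kernels": [[-1.0, 1.0], [1.0, -2.0, 1.0]], "lam": 0.5, "step": 0.1},
]
rng = np.random.default_rng(0)
noisy = np.repeat([0.0, 1.0], 16) + 0.1 * rng.standard_normal(32)
restored = variational_network(noisy, layers)
```

Because each step adds an update to its input, the loop in `variational_network` has the same skip-connection structure as a residual network, which is the connection the text refers to.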
Shape completion using a curvature minimizing variational model.
MRI reconstruction using a variational network.
The boundary of a disk appears as a helix in the roto-translation space.
Structure of one step of the variational network (VN).
Image inpainting using a curvature minimizing variational model.