## Periodic Reporting for period 4 - HOMOVIS (High-level Prior Models for Computer Vision)

Reporting period: 2019-12-01 to 2020-12-31

"In recent years, computer vision as a technology has found its way into smart phones, autonomous cars, medical image processing, industrial inspection and surveillance tasks. Due to the every increasing amount of available visual data, it is expected that computer vision will play an even more important role for the society. The space of ""natural"" images is extremely large. For example considering the space of images of only 64x64 pixels and 256 different gray values, one can generate already 10 to the power of 10000 different images, a number which is by far larger than the (estimated) number of atoms in observable the universe. In this space, natural (or plausible) images are lying only on a certain thin manifold with a much smaller dimension. However, the structure of the manifold turns out to be extremely complicated, because it has to reflect many different transformations of the images such as translations, rotations, brightness changes etc.

A major aim of computer vision and image processing is to develop mathematical models to describe these manifolds, for example for image reconstruction, image classification, stereo, etc. While most existing mathematical models are limited to local pixel interactions and hence only have a sense of edges, the aim of this research project is to develop high-level prior models that should eventually be able to describe the manifolds of objects such as "faces", "dogs", or "cars". Natural images have (among others) two important invariances: they are invariant with respect to translations (objects can occur at different positions in images) and with respect to rotations (objects might occur at different orientations). These two invariances are among the main concepts of the variational models proposed in this project.

"* Together, with A. Chambolle, we have been invited to write a review paper in the Acta Numerica Journal, which was a great honor for us. The paper contains a broad overview of continuous optimization approaches for image processing and computer vision. The paper was published in the beginning of 2016.

* Motivated by the large-scale optimization problems emerging from this project, we have proposed several novel optimization algorithms, e.g. methods combining dynamic programming with continuous optimization and novel accelerated non-linear proximal algorithms for non-smooth and non-convex optimization.

* In the paper "Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration", we proposed to learn the parameters of a highly parametrized reaction-diffusion equation for solving inverse problems in imaging. In subsequent work, "Variational networks: Connecting variational methods and deep learning", we generalized the notion of trainable diffusion equations to variational networks. In a more application-driven follow-up paper, "Learning a variational network for reconstruction of accelerated MRI data", we used the reaction-diffusion equations to learn to reconstruct MRI images from undersampled MRI data.

* In the paper ""End-to-end training of hybrid CNN-CRF models for stereo"" we have proposed an end-to-end learning framework for stereo computation. The main idea is to learn convolutional neural network (CNN) features for image matching which in turn serve as unary weights in a conditional random field (CRF) model. In more recent follow up work we proposed a more efficient and more flexible variant based on belief propagation.

* In the paper "Total Roto-Translational Variation", we exploit the roto-translation space to find a convex representation of curvature-minimizing variational models. The roto-translation space is the 3D product space of the 2D image domain and the domain of orientations of the image gradient. One of the striking advantages of the roto-translation space is that it allows representing curvature-minimizing variational models as convex energies.

In continuous optimization, we have proposed several algorithms that go significantly beyond the state of the art:

* In the case of total variation minimization, our hybrid algorithms based on dynamic programming and continuous optimization are currently among the fastest algorithms for such problems. In particular, if only approximate solutions are needed, the proposed algorithms are of high interest because they deliver good solutions already after a very small number of iterations.

* For non-smooth and non-convex optimization, we have proposed an inertial variant of the proximal alternating linearized minimization (PALM) method and have proven convergence of the sequence of iterates in the case of semi-algebraic functions. The inertial PALM algorithm is significantly faster than the original PALM algorithm and shows a certain ability to overcome spurious stationary points. The corresponding paper appears to be among the top-20 most downloaded papers of the SIAM Journal on Imaging Sciences.
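To illustrate the inertial-PALM idea, here is a minimal NumPy sketch on a toy non-convex problem, nonnegative matrix factorization: each block is updated by a projected gradient step at an inertially extrapolated point. The extrapolation parameter and step sizes below are heuristic illustration choices, not the constants from the convergence theory.

```python
import numpy as np

def ipalm_nmf(M, r, iters=300, alpha=0.3, seed=0):
    # Inertial-PALM-style sketch for nonnegative matrix factorization:
    #   min_{X>=0, Y>=0} 0.5 * ||M - X @ Y||_F^2
    # Blocks X and Y are updated alternately by a projected gradient
    # step evaluated at an inertially extrapolated point.
    rng = np.random.default_rng(seed)
    m, n = M.shape
    X = rng.random((m, r)); Y = rng.random((r, n))
    X_prev, Y_prev = X.copy(), Y.copy()
    for _ in range(iters):
        # --- X block ---
        Xe = X + alpha * (X - X_prev)               # inertial extrapolation
        Lx = np.linalg.norm(Y @ Y.T, 2) + 1e-12     # block Lipschitz constant
        G = (Xe @ Y - M) @ Y.T                      # gradient in X at Xe
        X_prev, X = X, np.maximum(0.0, Xe - G / Lx)  # projected step
        # --- Y block ---
        Ye = Y + alpha * (Y - Y_prev)
        Ly = np.linalg.norm(X.T @ X, 2) + 1e-12
        G = X.T @ (X @ Ye - M)
        Y_prev, Y = Y, np.maximum(0.0, Ye - G / Ly)
    return X, Y
```

On an exactly low-rank nonnegative matrix, a few hundred iterations typically recover an accurate factorization.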

* It has been known for some time that first-order primal-dual algorithms perform significantly better than their theoretical worst-case bounds on problems that are strongly convex only in part of the primal or dual variables. Examples include total generalized variation minimization and image reconstruction involving a linear operator, e.g. image deconvolution or MRI reconstruction. We have shown that first-order primal-dual algorithms can be partially accelerated, yielding fast convergence at least in the block of variables corresponding to the strongly convex part of the problem.
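To make the acceleration concrete, here is a minimal NumPy sketch of a first-order primal-dual scheme with the classical strong-convexity-based step-size update, applied to total variation (ROF) denoising. The discretization, parameters and iteration count are illustrative, not the exact setup of the project's papers.

```python
import numpy as np

def grad(u):
    # Forward-difference gradient with Neumann boundary conditions.
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    # Discrete divergence, the negative adjoint of grad above.
    d = np.zeros_like(px)
    d[0, :] = px[0, :]; d[1:-1, :] = px[1:-1, :] - px[:-2, :]; d[-1, :] = -px[-2, :]
    d[:, 0] += py[:, 0]; d[:, 1:-1] += py[:, 1:-1] - py[:, :-2]; d[:, -1] += -py[:, -2]
    return d

def tv_denoise_pd(f, lam=8.0, iters=200):
    # Accelerated primal-dual scheme for
    #   min_u ||grad u||_1 + (lam/2) * ||u - f||^2.
    # The data term is lam-strongly convex, which permits the
    # step-size update tau_{n+1} = theta * tau_n, sigma_{n+1} = sigma_n / theta.
    u = f.copy(); ubar = u.copy()
    px = np.zeros_like(f); py = np.zeros_like(f)
    L2 = 8.0                         # upper bound on ||grad||^2
    tau = 0.02; sigma = 1.0 / (L2 * tau)
    for _ in range(iters):
        gx, gy = grad(ubar)          # dual ascent step
        px += sigma * gx; py += sigma * gy
        norm = np.maximum(1.0, np.sqrt(px**2 + py**2))
        px /= norm; py /= norm       # projection onto the unit ball
        u_old = u                    # primal proximal step on the data term
        u = (u + tau * div(px, py) + tau * lam * f) / (1.0 + tau * lam)
        theta = 1.0 / np.sqrt(1.0 + 2.0 * lam * tau)
        tau *= theta; sigma /= theta
        ubar = u + theta * (u - u_old)
    return u
```

The product tau * sigma stays constant under the update, so the stability condition of the plain scheme is preserved while the primal step size shrinks at the accelerated rate.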

On the variational modeling side, we have achieved the following most striking results:

* We have proposed a new convex relaxation of curvature-minimizing variational models that represents the gradient of an image in the so-called roto-translation space, the 3D product space of the 2D image domain and the domain of orientations of the image gradient. By means of this representation, we have found a general framework for expressing the curvature of the level lines of an image as a convex energy in terms of a vector-valued measure. For numerical simulation, we derived a discrete approximation of the continuous model that can be solved by a first-order primal-dual algorithm.
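As a reminder of the kind of energy being convexified, a classical curvature-penalizing functional of Euler-elastica type on the level lines of an image u can be written as follows (a standard textbook formulation; the parameters a, b, p are generic model weights, not values from the paper):

```latex
% Curvature-penalizing energy of Euler-elastica type; kappa(x) is the
% curvature of the level line of u passing through the point x.
E(u) = \int_\Omega \bigl(a + b\,|\kappa(x)|^p\bigr)\,|\nabla u(x)|\,\mathrm{d}x,
\qquad
\kappa = \operatorname{div}\!\left(\frac{\nabla u}{|\nabla u|}\right).
```

Such energies are non-convex in u; the lifting to the roto-translation space, where the extra angular coordinate records the orientation of the gradient, is what allows rewriting them as convex functionals of a vector-valued measure.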

* We have proposed variational networks, which can be seen as gradient descent on a variational model where the parameters of the model (filters, potential functions) are allowed to change in each iteration. Hence, variational networks combine two important advantages: on the one hand, they offer a very strong model; on the other hand, they are extremely efficient. Moreover, variational networks show strong connections to state-of-the-art deep-learning architectures such as residual networks.

* We have successfully combined efficient algorithms for solving image labeling problems with deep learning, resulting in end-to-end learnable algorithms that yield state-of-the-art results.

* Recently, we have proposed algorithms for learning highly accurate discretizations of the total variation and we have shown consistency of the learned discretization in the framework of Gamma-convergence.
