
Inpainting Tools for Video Post-production: Variational theory and fast algorithms

Final Report Summary - INPAINTING (Inpainting Tools for Video Post-production: Variational theory and fast algorithms)

To remove an object from a scene in a video, one typically needs to inpaint the disoccluded background in one frame and then propagate it throughout the video, which in turn requires estimating the motion first. It may also be necessary to know which objects lie in front of and behind the removed object, since objects in front do not need to be modified. Nowadays, VFX artists in 2D and 3D post-production houses perform these tasks aided by software tools. Still, VFX remains a manual, time-consuming and costly process. The automation of the aforementioned tasks, namely (relative) depth estimation, motion estimation, image inpainting and propagation in video, involves fundamental open problems of video processing and computer vision. In this project we address these problems from both a theoretical and a practical perspective. We propose and analyze new mathematical models and frameworks, and develop practical algorithms with the insight provided by the theory. The main contributions are:
1. Computing similarities between image regions plays an essential role in many computer vision problems. The size and shape of the regions are critical parameters for the performance of the final application. We first propose a theoretical framework in which we perform a multiscale analysis of similarities (i.e. across different region sizes). Then, we introduce a novel “affine covariant metric” that adapts the shape of the neighborhood to the local image structure.
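As a rough illustration only (this is a plain Gaussian-weighted comparison, not the affine covariant metric developed in the project), the sketch below evaluates a weighted SSD between two patches at several neighborhood sizes; the `radii` values and the `sigma = r / 2` rule are arbitrary choices for the example:

```python
import numpy as np

def gaussian_weights(radius, sigma):
    """Normalized isotropic Gaussian weights over a (2r+1)x(2r+1) patch."""
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    w = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return w / w.sum()

def patch_similarity(img, p, q, radius, sigma):
    """Gaussian-weighted SSD between the patches centred at p and q."""
    w = gaussian_weights(radius, sigma)
    pp = img[p[0]-radius:p[0]+radius+1, p[1]-radius:p[1]+radius+1]
    qq = img[q[0]-radius:q[0]+radius+1, q[1]-radius:q[1]+radius+1]
    return float(np.sum(w * (pp - qq) ** 2))

def multiscale_similarity(img, p, q, radii=(2, 4, 8)):
    """Evaluate the similarity at several neighborhood sizes (a crude
    stand-in for the multiscale analysis described in the text)."""
    return [patch_similarity(img, p, q, r, sigma=r / 2.0) for r in radii]
```

The point of the multiscale evaluation is that two locations may look similar at one scale and different at another, so the neighborhood size genuinely changes the answer.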
2. We applied these ideas to the image inpainting problem, where the goal is to fill in a region of the image where the information is unknown (e.g. in the case of object removal). State-of-the-art algorithms copy pieces of the known part of the image and assemble them coherently to cover the unknown region. These algorithms cannot deal with cases in which the useful information is transformed (rotated, zoomed, etc.). Using the affine covariant metric we propose an inpainting algorithm that correctly handles affine/perspective transformations.
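A minimal, greedy sketch of the exemplar-based idea (not the project's affine-covariant algorithm): unknown pixels are filled one by one with the centre of the best-matching fully-known patch under a masked SSD. The scanline fill order and all names here are choices made for this example:

```python
import numpy as np

def inpaint_exemplar(img, mask, radius=1):
    """Greedy exemplar-based fill; mask == True marks unknown pixels.
    Each hole pixel is filled with the centre of the best-matching
    known patch, comparing only the already-known pixels of the query."""
    img = img.astype(float).copy()
    known = ~mask
    h, w = img.shape
    # candidate patch centres whose patches lie fully in the known region
    cands = [(i, j) for i in range(radius, h - radius)
                    for j in range(radius, w - radius)
                    if known[i-radius:i+radius+1, j-radius:j+radius+1].all()]
    for i in range(radius, h - radius):
        for j in range(radius, w - radius):
            if known[i, j]:
                continue
            q = img[i-radius:i+radius+1, j-radius:j+radius+1]
            v = known[i-radius:i+radius+1, j-radius:j+radius+1]
            best, best_d = None, np.inf
            for (ci, cj) in cands:
                c = img[ci-radius:ci+radius+1, cj-radius:cj+radius+1]
                d = np.sum(v * (q - c) ** 2)   # masked SSD
                if d < best_d:
                    best, best_d = (ci, cj), d
            img[i, j] = img[best]
            known[i, j] = True
    return img
```

Because the candidate patches are only compared at their original orientation and scale, this sketch exhibits exactly the limitation discussed above: it cannot reuse information that appears rotated or zoomed elsewhere in the image.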
3. To modify the texture of an object in a video (add a logo, hide a marker, etc.), the edit is first performed in one frame and then automatically propagated throughout the video in a seamless manner. Most current approaches handle only simple motions, for instance by assuming that the object is rigid and planar. We introduced a variational model that propagates a texture without any restrictive assumptions on the motion, while also handling occlusions of the edited texture and some illumination changes. We also propose fast numerical methods to solve the proposed model.
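The propagation step can be caricatured as warping the edited layer with a motion field. The sketch below uses plain backward bilinear warping and a given occlusion mask, which is far simpler than the variational model described above; the flow convention (`flow[..., 0]` = vertical, `flow[..., 1]` = horizontal displacement) is an assumption of this example:

```python
import numpy as np

def warp_edit(edit, flow, occluded=None):
    """Backward-warp an edit layer into the next frame:
    out(x) = edit(x + flow(x)), bilinearly interpolated.
    Target pixels flagged in `occluded` are left at zero."""
    h, w = edit.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sy = np.clip(ys + flow[..., 0], 0, h - 1)   # source row coordinates
    sx = np.clip(xs + flow[..., 1], 0, w - 1)   # source column coordinates
    y0 = np.floor(sy).astype(int); x0 = np.floor(sx).astype(int)
    y1 = np.minimum(y0 + 1, h - 1); x1 = np.minimum(x0 + 1, w - 1)
    fy, fx = sy - y0, sx - x0
    out = ((1 - fy) * (1 - fx) * edit[y0, x0]
           + (1 - fy) * fx * edit[y0, x1]
           + fy * (1 - fx) * edit[y1, x0]
           + fy * fx * edit[y1, x1])
    if occluded is not None:
        out[occluded] = 0.0
    return out
```

Chaining such warps frame to frame accumulates interpolation error and breaks at occlusions, which is one motivation for formulating the propagation variationally instead.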
4. The optical flow is the apparent motion between two consecutive frames of a video sequence. Its computation is a fundamental, long-standing and challenging problem. During this project we made two contributions in this context: a) We improved the numerical algorithm (decreasing the computation time) of a previously developed optical flow method which jointly estimates the optical flow and the occlusion map. b) We proposed a new optical flow algorithm which properly handles rotational movements, imposes good smoothness conditions on the flow field and preserves discontinuities. Real scenes usually contain rotational movements, sometimes at an infinitesimal level, so the proposed algorithm helps to obtain more accurate and realistic optical flows.
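For reference, the classical Horn and Schunck method below illustrates the variational formulation of optical flow (a brightness-constancy data term plus a smoothness term weighted by `alpha`); it is not the occlusion-aware or rotation-aware method developed in the project:

```python
import numpy as np

def neighbor_avg(f):
    """4-neighbour average with replicated borders."""
    p = np.pad(f, 1, mode="edge")
    return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])

def horn_schunck(I1, I2, alpha=1.0, n_iter=100):
    """Classic Horn-Schunck: minimise the brightness-constancy error plus
    alpha^2 times a quadratic smoothness term, via Jacobi iterations on
    the Euler-Lagrange equations."""
    I1 = I1.astype(float); I2 = I2.astype(float)
    Ix = np.gradient(I1, axis=1)   # spatial derivatives
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1                   # temporal derivative
    u = np.zeros_like(I1); v = np.zeros_like(I1)
    for _ in range(n_iter):
        ub, vb = neighbor_avg(u), neighbor_avg(v)
        t = (Ix * ub + Iy * vb + It) / (alpha**2 + Ix**2 + Iy**2)
        u = ub - Ix * t
        v = vb - Iy * t
    return u, v
```

The quadratic smoothness term here blurs motion boundaries and has no notion of occlusion or rotation, which is precisely what the methods mentioned above improve on.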
5. Depth maps are an important source of information. Not only are they useful for generating 3D content, but they also greatly simplify the post-production of 2D content. There are several ways to estimate a depth image (e.g. time-of-flight sensors), but in most cases the resulting depth image has a much lower resolution than the color image. We have proposed a method to increase the resolution of the depth map using the color image as a guide. Another important aspect of our work is relative depth estimation from a single image using perceptual laws based on low-level cues such as convexity and closure.
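Guided depth upsampling is often sketched as a joint bilateral filter. The version below is a generic textbook scheme, not the method proposed in the project: each high-resolution pixel averages nearby low-resolution depth samples, weighted by spatial distance and by similarity in the high-resolution color guide (the `sigma_s` and `sigma_r` values are arbitrary for this example):

```python
import numpy as np

def joint_bilateral_upsample(depth_lo, color_hi, factor, sigma_s=2.0, sigma_r=10.0):
    """Upsample a low-res depth map guided by a high-res (grayscale) image."""
    H, W = color_hi.shape
    hl, wl = depth_lo.shape
    rad = int(2 * sigma_s)           # search window in low-res samples
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            num = den = 0.0
            cy, cx = y / factor, x / factor   # position on the low-res grid
            for j in range(int(cy) - rad, int(cy) + rad + 1):
                for i in range(int(cx) - rad, int(cx) + rad + 1):
                    if not (0 <= j < hl and 0 <= i < wl):
                        continue
                    ds = (j - cy) ** 2 + (i - cx) ** 2          # spatial term
                    guide = color_hi[j * factor, i * factor]     # guide color
                    dr = (color_hi[y, x] - guide) ** 2           # range term
                    w = np.exp(-ds / (2 * sigma_s**2) - dr / (2 * sigma_r**2))
                    num += w * depth_lo[j, i]
                    den += w
            out[y, x] = num / den
    return out
```

The range term is what makes the result follow the color image: depth samples from across a strong color edge get near-zero weight, so the upsampled depth discontinuity snaps to the edge in the guide rather than being blurred by plain interpolation.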