
Modeling, interpreting and manipulating digital video

Final Report Summary - VIDEOWORLD (Modeling, interpreting and manipulating digital video)

Today, enormous resources are dedicated to the creation, storage, and distribution of digital video content. Effective general-purpose technology for interpreting or manipulating this content in an automated or semi-automated fashion, on the other hand, is sorely lacking. The main objective of the VideoWorld project is to remedy this situation by developing new models of image and video content, together with algorithms that use these models to interpret a video clip (what is happening in a scene, who is in it, where it is shot, when an action occurs) and manipulate its content (deblurring, denoising, zooming, or editing).

In this context, the main achievements of the VideoWorld project can be summarized as follows:

New models of image and video content. We have developed novel spatio-temporal models of video content, including sparse (locally) linear representations of object appearance, geometric models of scene layout in time-lapse videos, and a new and unified framework for multi-view geometry for both central and general non-central cameras associated with algebraic line congruences.
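To make the idea of a sparse linear appearance model concrete, the sketch below reconstructs a signal as a combination of a few dictionary atoms using orthogonal matching pursuit, a standard greedy sparse-coding algorithm. This is an illustrative toy (an orthonormal random dictionary, not a learned one) and not the project's actual models.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: greedily approximate y as a sparse
    combination of at most k columns (atoms) of the dictionary D."""
    residual = y.copy()
    support = []
    coeffs = np.zeros(D.shape[1])
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # re-fit coefficients on the selected atoms by least squares
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        coeffs[:] = 0.0
        coeffs[support] = sol
        residual = y - D @ coeffs
    return coeffs

# toy orthonormal dictionary of 8 atoms in R^8 (illustrative only)
rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.normal(size=(8, 8)))
y = 2.0 * D[:, 1] - 0.5 * D[:, 3]   # a genuinely 2-sparse signal
a = omp(D, y, k=2)                   # recovers atoms 1 and 3 exactly
```

With an orthonormal dictionary the greedy selection is exact; learned, overcomplete dictionaries trade this guarantee for much better fits to natural image patches.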

New models of the video interpretation process. We have developed novel algorithms for matching deformable part-based models and graphical representations of images, and for learning these models from annotated data, with applications ranging from art authentication to category-level object recognition and scene classification. We have also introduced effective weakly-supervised algorithms for visual scene interpretation, image and video shot cosegmentation, and frame-to-text video alignment tasks, and developed fully unsupervised approaches to object discovery in image and video collections.
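The core scoring rule behind deformable part-based matching can be sketched in one dimension: the model's score at each root location combines the root's appearance response with the best placement of a part, trading the part's own response against a quadratic cost for deviating from its expected anchor offset. The toy responses and parameter names below are hypothetical, for illustration only.

```python
import numpy as np

def dpm_score(root_resp, part_resp, anchor, w):
    """Score of a 1-D deformable model at each root location x:
    root response plus the best part placement p, maximizing the
    part's response minus a deformation cost w * (p - (x + anchor))^2."""
    n = len(root_resp)
    scores = np.empty(n)
    for x in range(n):
        placements = part_resp - w * (np.arange(n) - (x + anchor)) ** 2
        scores[x] = root_resp[x] + placements.max()
    return scores

# toy 1-D "image": root evidence peaks at position 4, part evidence
# at position 7, and the model expects the part 3 units to the right
root_resp = np.array([0., 0., 1., 2., 5., 2., 1., 0., 0., 0.])
part_resp = np.array([0., 0., 0., 0., 1., 2., 3., 6., 3., 1.])
scores = dpm_score(root_resp, part_resp, anchor=3, w=0.5)
best = int(np.argmax(scores))
print(best)  # → 4: root and part evidence agree with the geometry
```

In practice the inner maximization over part placements is computed for all root locations at once with a generalized distance transform, which brings the cost down from quadratic to linear in the number of locations.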

New models of the video manipulation process. We have used our learned sparse appearance models in tasks ranging from denoising to inverse halftoning, and developed algorithms that jointly learn to estimate and remove image blur due to camera or object motion. We have also developed novel methods for camera shake removal using analytical models of the image formation process, as well as effective algorithms for image upsampling, texture removal, and edge detection using robust non-convex regularizers.
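As a baseline for what "removing blur" means once the image formation model is known, the sketch below applies classical Wiener deconvolution: inverting a known blur kernel in the frequency domain with a regularizing term that keeps the inverse filter stable where the kernel has little energy. This is a textbook technique shown for illustration, not the project's analytical camera shake method, and it assumes the kernel is known (blind deblurring must estimate it too).

```python
import numpy as np

def wiener_deconv(blurred, kernel, snr=1e-2):
    """Wiener deconvolution with a known blur kernel: divide out the
    kernel in the Fourier domain, damped by an SNR term so that
    frequencies where the kernel is weak are not amplified wildly."""
    K = np.fft.fft2(kernel, s=blurred.shape)
    B = np.fft.fft2(blurred)
    W = np.conj(K) / (np.abs(K) ** 2 + snr)
    return np.real(np.fft.ifft2(W * B))

# toy example: blur a random "image" with a horizontal 3-tap box
# kernel (circular convolution, so boundaries wrap), then recover it
rng = np.random.default_rng(0)
img = rng.random((32, 32))
kernel = np.zeros((32, 32))
kernel[0, :3] = 1.0 / 3.0          # horizontal box blur
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kernel)))
restored = wiener_deconv(blurred, kernel, snr=1e-9)  # noise-free toy
err = np.abs(restored - img).max()
```

The tiny `snr` value is appropriate only because the toy is noise-free; with real sensor noise, a larger value trades residual blur against amplified noise.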

The proposed models and algorithms have been implemented and extensively tested on standard benchmark datasets, and consistently matched or outperformed the state of the art.