Internet Vision: Distributed and secure algorithms for image search and match

Final Report Summary - INTERNETVISION (Internet Vision: Distributed and secure algorithms for image search and match)

Images today are captured by groups of people armed with smartphones. This creates new problems and offers new challenges to Computer Vision researchers. First, there is a need to develop tools to handle images captured by an array of cameras rather than by a single camera, as was traditionally the case. Second, the cameras can either be calibrated and synchronized or, more often today, neither calibrated nor synchronized. Third, the fact that multiple cameras capture the same scene creates a new opportunity to reduce transmission bandwidth and hence save battery life. There is also a need to develop better tools to enhance and manipulate the images once they are captured and stored in the cloud. Finally, it is desirable to do all this in a privacy-preserving manner that protects and respects people's privacy.

The main thrust of my work was to deal with arrays of cameras in a variety of settings, as detailed below.
The first project I did was to develop a structure-from-motion (SFM) method that recovers structure and motion from a calibrated camera array. The problem of SFM has been investigated extensively in the literature and is an important building block in many Computer Vision applications such as 3D reconstruction of urban scenes or capturing 3D human shapes. The method I proposed improved the state-of-the-art results in this important topic.

The camera array used by the first project consisted of a carefully calibrated and synchronized array of 25 cameras. This is not how images are captured in the wild. Today, images are often captured by a group of people armed with smartphones. The smartphones are not calibrated or synchronized beforehand and, as a result, we are left with a group of photos of some dynamic event taken at roughly the same place and time. The first order of business, and the one we tackled, is the problem of recovering the correct temporal order in which the images were taken. Naively, one might think that the clocks of the smartphones are enough to organize the images in the correct temporal order. Unfortunately, our experiments show that the clocks are skewed and not well synchronized. Therefore, it is incumbent upon us to develop vision-based techniques that recover the correct temporal order directly from the visual data. We term this new problem Photo Sequencing and offer a number of novel algorithms for solving it.
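
To make the clock problem concrete, here is a minimal sketch of how skewed clocks can scramble a timestamp-based ordering. It is only an illustration of the failure mode, not one of the Photo Sequencing algorithms; the photo identifiers, capture times and clock offsets are all made-up values.

```python
# Hypothetical data: three phones (A, B, C) photograph the same dynamic event.
# "true_time" is the real capture time in seconds, which is unknown in practice.
photos = [
    {"id": "A1", "phone": "A", "true_time": 0.0},
    {"id": "B1", "phone": "B", "true_time": 0.4},
    {"id": "C1", "phone": "C", "true_time": 0.8},
    {"id": "A2", "phone": "A", "true_time": 1.2},
]

# Assumed per-phone clock error in seconds (illustrative values only).
clock_offset = {"A": 0.0, "B": 1.5, "C": -0.6}


def reported_time(photo):
    """Timestamp the phone would record: true time plus that phone's skew."""
    return photo["true_time"] + clock_offset[photo["phone"]]


# Order in which the photos were really taken.
true_order = [p["id"] for p in sorted(photos, key=lambda p: p["true_time"])]

# Order obtained by naively sorting on the recorded timestamps.
timestamp_order = [p["id"] for p in sorted(photos, key=reported_time)]

print("true order:     ", true_order)
print("timestamp order:", timestamp_order)
```

With these assumed offsets the two orderings disagree, even though photos from the same phone keep their relative order. This is why the ordering has to be recovered from the image content rather than from the recorded timestamps.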

Working with arrays of cameras made me realize that the amount of data that must be transferred from the cameras to the host computer is huge. This puts a heavy burden on the host computer and is also wasteful on the camera side. After all, if multiple cameras capture the same scene, then much of the visual content is redundant and there is no need for each camera to transmit the full image back to the host. To this end, I combined tools from Computational Photography and Distributed Source Coding to develop an efficient method to reduce bandwidth in such scenarios. Reducing bandwidth saves energy, which in turn extends battery life, one of the most important issues facing smartphone users.

Once the images arrive at the server, they must be processed. I have proposed two algorithms in this domain. The first focused on the massive template matching problem, which is an important building block in a variety of computer vision applications. I proposed a novel algorithm that advanced the state of the art in this important field. The second algorithm I proposed dealt with the need to retarget a stereo image pair, where the goal is to perform a non-linear resizing of a pair of images while maintaining a plausible 3D interpretation of the scene.
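
For readers unfamiliar with template matching, the following is a minimal sketch of the naive baseline: slide the template over every position in the image and keep the position with the smallest sum of squared differences (SSD). This is the textbook exhaustive search, not the algorithm proposed in this project; the function names and the tiny image are assumptions made for illustration.

```python
def ssd(image, template, top, left):
    """Sum of squared differences between the template and one image window."""
    total = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            diff = image[top + r][left + c] - value
            total += diff * diff
    return total


def best_match(image, template):
    """Try every window position; return (score, top, left) of the lowest SSD."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best = None
    for top in range(ih - th + 1):
        for left in range(iw - tw + 1):
            score = ssd(image, template, top, left)
            if best is None or score < best[0]:
                best = (score, top, left)
    return best


# Tiny grayscale image (rows of pixel intensities), illustrative values only.
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 7, 9, 0, 0],
    [0, 0, 0, 0, 0],
]

# The template differs from the image patch by one intensity level in one
# pixel, mimicking a small change due to noise.
template = [
    [9, 8],
    [7, 8],
]

score, top, left = best_match(image, template)
print(score, top, left)
```

The exhaustive search costs on the order of (image pixels) x (template pixels) operations per template, which becomes prohibitive when many templates must be matched against many images. That cost is what motivates faster algorithms for the massive template matching setting.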

Finally, I have been working on a secure template matching algorithm. The algorithm assumes that one party, Alice, holds a template, and another party, Bob, holds an image. The goal of the secure template matching algorithm is to detect whether the template appears in the image, without revealing additional information to either party. An important issue we had to tackle is that a template never appears in the image exactly as is, so our secure template matching was designed to handle small deformations due to illumination changes, geometric variations and sensor noise. I have used a combination of tools from Computer Vision and Cryptography to propose a novel solution to the problem.

This grant helped me launch a new research lab at the School of Electrical Engineering at Tel-Aviv University. I purchased the necessary hardware and attracted talented students to conduct the research with me. This research also helped me establish fruitful collaborations with colleagues here in Israel. As a result, I published five conference papers at top vision conferences and two journal papers, with a third journal paper under submission. Our work has been well received and cited by the scientific community and has helped establish my status as a leading researcher in the field.