Final Report Summary - LEAP (LEarning from our collective visual memory to Analyze its trends and Predict future events)
The main achievements of the LEAP project can be summarized as follows.
We have introduced novel visual representations that generalize across the vastly different imaging conditions often present in distributed data. Examples include day/night illumination, major changes in viewpoint such as between hand-held and car-mounted cameras, and variation over time such as seasonal changes or buildings being constructed or demolished. These models have improved the performance of visual localization - an important capability for autonomous driving - and have opened up new applications such as geo-registering historical and non-photographic imagery. We have also developed a computational model of analogical reasoning that recognizes previously unseen relations between objects - a step towards generalization to yet unseen situations.
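To give a flavour of the kind of representation learning involved, the sketch below trains an image embedding with a standard triplet loss so that the same place photographed under different conditions (e.g., day and night) maps to nearby descriptors, while different places map far apart. The network, dimensions and random tensors are illustrative placeholders, not the project's actual models.

```python
# Hypothetical sketch: learning a condition-invariant image descriptor
# with a triplet loss. The same place at day (anchor) and at night
# (positive) should embed close together; a different place (negative)
# should embed far away. All components here are toy placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Toy CNN mapping an RGB image to an L2-normalized descriptor."""
    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, dim)

    def forward(self, x):
        z = self.fc(self.features(x).flatten(1))
        return F.normalize(z, dim=1)  # unit-length descriptor

net = EmbeddingNet()
loss_fn = nn.TripletMarginLoss(margin=0.2)
opt = torch.optim.Adam(net.parameters(), lr=1e-4)

# random tensors stand in for real day/night image pairs
anchor, positive, negative = (torch.randn(8, 3, 64, 64) for _ in range(3))
loss = loss_fn(net(anchor), net(positive), net(negative))
opt.zero_grad(); loss.backward(); opt.step()
```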
Next, we have developed several novel methods to learn a visual vocabulary of patterns from data using readily available but weak (incomplete and noisy) metadata. This has led to results exceeding the state of the art in object, place and human activity recognition, in some cases outperforming fully supervised methods. This is an important step towards machines that learn automatically from readily available but noisy metadata, without the need for explicit manual supervision, which is often expensive and hard to obtain.
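As a schematic illustration of learning under label noise, the sketch below uses a generic "bootstrapped" cross-entropy that partially trusts the given (possibly wrong) label and partially the model's own current prediction. This is a well-known generic technique shown for illustration only; it is not the project's specific formulation.

```python
# Generic illustration of learning from noisy labels: a soft-bootstrapped
# cross-entropy whose target mixes the noisy one-hot label with the
# model's current predicted distribution. Not the project's actual method.
import torch
import torch.nn.functional as F

def bootstrapped_ce(logits, noisy_labels, beta=0.8):
    """target = beta * noisy one-hot label + (1 - beta) * model prediction."""
    num_classes = logits.size(1)
    one_hot = F.one_hot(noisy_labels, num_classes).float()
    with torch.no_grad():
        pred = F.softmax(logits, dim=1)  # current belief, no gradient
    target = beta * one_hot + (1.0 - beta) * pred
    log_probs = F.log_softmax(logits, dim=1)
    return -(target * log_probs).sum(dim=1).mean()

# toy usage: 4 samples, 10 classes, labels harvested from noisy metadata
logits = torch.randn(4, 10, requires_grad=True)
labels = torch.tensor([3, 7, 3, 1])  # possibly incorrect tags
loss = bootstrapped_ce(logits, labels)
loss.backward()
```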
Towards the goal of finding and quantifying trends, we have developed several new computational models to discover and analyze spatial and temporal trends in image collections. Potential applications of these models include assessing disease progression in radiology or discovering new insights about the long-term evolution of visual style in art, history, architecture or design.
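As a minimal schematic of temporal trend analysis over a dated image collection, the sketch below fits a linear trend to the yearly frequency of a detected visual pattern and reports its slope. The synthetic numbers stand in for real detections; the project's trend-discovery models are considerably richer than a linear fit.

```python
# Schematic trend analysis: given yearly frequencies of a detected visual
# pattern in a dated image collection, fit a least-squares linear trend.
# Synthetic data stand in for real detections.
import numpy as np

years = np.arange(1900, 2000)
# fraction of images per year containing some visual element (synthetic)
freq = 0.2 + 0.003 * (years - 1900) + 0.02 * np.random.randn(years.size)

# least-squares fit: freq ~= slope * year + intercept
slope, intercept = np.polyfit(years, freq, deg=1)
residual = freq - (slope * years + intercept)
print(f"estimated trend: {slope:+.4f} per year")
print(f"residual std:    {residual.std():.4f}")
```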
Finally, towards the goal of predictive modelling, we have introduced several novel temporal models of dynamic scenes. The developed models are able to temporally localize the steps of complex human activities, estimate the 3D motion and contact forces of people interacting with objects, model relations between similar human activities, and predict future actions in video seconds before they occur. We have demonstrated the benefits of these models on existing benchmarks as well as on newly collected large-scale datasets. Potential applications include autonomous robotics, self-driving cars and automated assistance.
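For the predictive component, the sketch below shows one standard formulation of action anticipation: a recurrent model reads per-frame features of the observed video and outputs a distribution over the action that will happen next. The architecture, feature dimensions and label set are illustrative assumptions, not the project's actual models.

```python
# Illustrative action-anticipation model: a GRU summarizes per-frame
# features of the observed video and predicts the label of the action
# that follows. All shapes and names are placeholder assumptions.
import torch
import torch.nn as nn

class Anticipator(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, num_actions=50):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, frame_feats):
        # frame_feats: (batch, time, feat_dim) features of observed frames
        _, h = self.gru(frame_feats)     # h: (1, batch, hidden)
        return self.head(h.squeeze(0))   # logits over the future action

model = Anticipator()
observed = torch.randn(2, 30, 512)       # 2 clips, 30 observed frames each
future_action = torch.tensor([12, 4])    # ground-truth next-action labels
loss = nn.functional.cross_entropy(model(observed), future_action)
loss.backward()
```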