Project description
Teaching machines to understand what they see
Computer-generated imagery has come a long way. Today’s graphics algorithms can simulate the world around us convincingly, and computer vision techniques can recognise and predict identities and actions from pictures or videos. However, computer vision still cannot reliably recover 3D shapes, and its semantic understanding is not matched by a pixel-perfect reconstruction of appearance. As a result, designing 3D environments, such as those in games or films, remains laborious. The EU-funded PIPE project aims to solve these problems with new models that combine computer vision and simulation with machine learning for pixel-perfect 3D vision and generative modelling. Using deep convolutional neural networks, it aims to enable the creation of realistic, meaningful synthetic images.
Objective
A fascinating tension exists between computer vision and computer graphics. Decades of research have enabled graphics algorithms to simulate the world to a degree often indistinguishable from reality -- given an accurate enough model of scene geometry and appearance. Similarly, decades of ingenuity have given computer vision techniques an at times superhuman capability to detect, recognize, and predict objects, actions, and identities from pictures or video.
Vision and graphics meet at a common point of pain: the model of scene geometry and appearance. To yield photorealistic results, graphics algorithms require an essentially perfect forward model. Yet the capability of computer vision algorithms to reason robustly and accurately about the 3D shape and appearance of the world unfortunately lags far behind their capability to detect, recognize, segment, and so on. A great discrepancy exists between semantic understanding and the pixel-perfect, accurate recovery of shape and appearance. Bridging this chasm is the goal of this research.
This entails solving fundamental, long-standing, unsolved problems in computer vision through the aid of computer graphics and machine learning. First, we seek to simultaneously capture accurate 3D shape and appearance of complex real-world scenes from photographic inputs; second, we seek to extend these capabilities still further to "zero-shot" generative modelling. These extremely ambitious goals will be reached by marrying simulation (rendering) and machine learning, building on the PI's three existing strengths: (1) ability to capture photorealistic material appearance models using commodity devices; (2) his leading standing in physically-based image synthesis; and (3) his results on generative modeling of photorealistic images through deep convolutional neural networks.
Fields of science
- natural sciences > computer and information sciences > artificial intelligence > computer vision
- natural sciences > mathematics > pure mathematics > geometry
- natural sciences > computer and information sciences > artificial intelligence > machine learning
- natural sciences > computer and information sciences > artificial intelligence > computational intelligence
Programme(s)
Funding Scheme
ERC-COG - Consolidator Grant
Host institution
02150 Espoo
Finland