CORDIS - EU research results

Learning Pixel-Perfect 3D Vision and Generative Modeling

Project description

Teaching machines to understand what they see

Generating images with the aid of computers has come a long way. Today’s technology and algorithms can simulate the world around us, and computer vision techniques can recognise and predict identities and actions from pictures or videos. However, computer vision cannot yet recover 3D shapes accurately, and its semantic understanding is not matched by pixel-perfect appearance. As a result, designing 3D environments, such as those in games or films, remains laborious. The EU-funded PIPE project will work to solve these problems with new models that combine computer vision and simulation with machine learning for pixel-perfect 3D vision and generative modelling. Using deep convolutional neural networks, it will enable the creation of realistic samples of meaningful synthetic images.


A fascinating tension exists between computer vision and computer graphics. Decades of research efforts have enabled graphics algorithms to simulate the world to a degree often indistinguishable from reality, given an accurate enough model of scene geometry and appearance. Similarly, decades of ingenuity have given computer vision techniques the at times superhuman capability of detecting, recognizing, and predicting objects, actions, and identities from pictures or video.

Vision and graphics meet at a common point of pain: the model of scene geometry and appearance. To yield photorealistic results, graphics algorithms require an essentially perfect forward model. Yet, the capability of computer vision algorithms to robustly and accurately reason about the 3D shape and appearance of the world, unfortunately, greatly lags behind the capabilities to detect, recognize, segment, and so on. A great discrepancy exists between the semantic and the pixel-perfect, accurate shape and appearance. Bridging this chasm is the goal of this research.

This entails solving fundamental, long-standing, unsolved problems in computer vision with the aid of computer graphics and machine learning. First, we seek to simultaneously capture accurate 3D shape and appearance of complex real-world scenes from photographic inputs; second, we seek to extend these capabilities still further to "zero-shot" generative modelling. These extremely ambitious goals will be reached by marrying simulation (rendering) and machine learning, building on the PI's three existing strengths: (1) his ability to capture photorealistic material appearance models using commodity devices; (2) his leading standing in physically-based image synthesis; and (3) his results on generative modeling of photorealistic images through deep convolutional neural networks.

Host institution

Net EU contribution
€ 1 858 013,00
02150 Espoo

Manner-Suomi Helsinki-Uusimaa Helsinki-Uusimaa
Activity type
Higher or Secondary Education Establishments
Total cost
€ 1 858 013,00

Beneficiaries (1)