
Rendering 3D images with attributes learned from 2D images via Deep Learning

Periodic Reporting for period 1 - 3DIS-NN (Rendering 3D images with attributes learned from 2D images via Deep Learning)

Reporting period: 2021-03-01 to 2023-02-28

3D (three-dimensional) Image Synthesis (3DIS) is a technology for rendering objects from different views, enabling numerous applications in computer graphics and computer vision. As the digital world becomes increasingly important, especially in times of pandemic, 3DIS can provide tools for online classes, virtual social tours, improved gaming experiences, and simulators for robotics by providing realistic virtual 3D environments.

Here, I propose 3DIS-NN, a set of methods that improve the quality of 3DIS with deep neural networks (DNNs) and bring it close to production quality, contributing to the European Union's Future and Emerging Technology ambitions under Horizon Europe.
Our first work, Refining 3D Human Texture Estimation from a Single Image, is currently under submission. In this work, we address 3D human texture estimation with deep neural networks. Estimating 3D human texture from a single image is essential in graphics and vision. It requires learning a mapping function from input images of humans with diverse poses into the parametric UV space and plausibly hallucinating invisible parts. To achieve high-quality 3D human texture estimation, we propose a framework that adaptively samples the input with a deformable convolution whose offsets are learned by a deep neural network. Both the offsets and the deformable convolution are deeply supervised. Additionally, we describe a novel cycle consistency loss that improves view generalization. We further propose to train our framework with an uncertainty-based pixel-level image reconstruction loss, which enhances color fidelity. We compare our method against state-of-the-art approaches and show significant qualitative and quantitative improvements.
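
To illustrate the adaptive sampling idea, the sketch below shows a deformable convolution whose offsets are predicted by a small network, so they can also be deeply supervised. This is a minimal illustration in PyTorch, not the project's released code; the module and variable names (AdaptiveSampler, offset_net) are assumptions for the example.

```python
# Minimal sketch of adaptive input sampling with a deformable convolution
# whose per-location offsets are predicted by a learned network.
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class AdaptiveSampler(nn.Module):
    def __init__(self, in_ch=3, out_ch=64, k=3):
        super().__init__()
        self.k = k
        # Small network predicting (dy, dx) offsets for every tap
        # of the k x k deformable kernel at every spatial location.
        self.offset_net = nn.Conv2d(in_ch, 2 * k * k, kernel_size=3, padding=1)
        # Weights and bias of the deformable convolution itself.
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x):
        offsets = self.offset_net(x)                     # (N, 2*k*k, H, W)
        feats = deform_conv2d(x, offsets, self.weight, self.bias,
                              padding=self.k // 2)
        # The offsets are returned as well so that both the sampled
        # features and the offsets can receive deep supervision.
        return feats, offsets

# Example usage on a dummy image batch.
img = torch.randn(2, 3, 128, 128)
sampler = AdaptiveSampler()
features, offsets = sampler(img)
print(features.shape, offsets.shape)
```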

Our second work, StyleRes: Transforming the Residuals for Real Image Editing with StyleGAN, was accepted to Computer Vision and Pattern Recognition (CVPR 2023). In this work, we explore the 3D editing capabilities of 2D GANs on real images. Style-based GAN models have been shown to learn implicit 3D knowledge of objects without supervision: the viewpoint of a synthesized object can be controlled through its latent codes, and such models are used to generate multi-view images for training 3D reconstruction models. In this work, we improve viewpoint editing of real images.
We present a novel image inversion framework and a training pipeline that achieve high-fidelity image inversion together with high-quality attribute editing. Inverting real images into StyleGAN's latent space is an extensively studied problem, yet the trade-off between reconstruction fidelity and editing quality remains an open challenge. Low-rate latent spaces are limited in their expressive power for high-fidelity reconstruction, while high-rate latent spaces degrade editing quality. To achieve high-fidelity inversion, we learn residual features in higher-rate latent codes that the lower-rate codes could not encode, which preserves image details in the reconstruction. To achieve high-quality editing, we learn how to transform these residual features so that they adapt to manipulations of the latent codes. We train the framework to extract and transform residual features via a novel architecture pipeline and cycle consistency losses.
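
The sketch below illustrates the general idea of inverting an image into a low-rate latent code plus residual features, and transforming the residuals so they remain consistent with a latent-space edit. It is a conceptual sketch only, not the released StyleRes code; the sub-networks (base_encoder, residual_encoder, transform) are stand-ins for a pretrained StyleGAN encoder and the learned transformation module.

```python
# Conceptual sketch: residual-feature inversion and edit-aware transformation.
import torch
import torch.nn as nn

class ResidualInversion(nn.Module):
    def __init__(self, latent_dim=512, feat_ch=512):
        super().__init__()
        # Hypothetical sub-networks; a real system would use a StyleGAN
        # generator and a pretrained encoder (e.g. an e4e/pSp-style network).
        self.base_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(latent_dim))
        self.residual_encoder = nn.Conv2d(3, feat_ch, 3, padding=1)
        # Learns to adapt the residual features to the manipulated latent code.
        self.transform = nn.Conv2d(feat_ch + latent_dim, feat_ch, 1)

    def forward(self, image, edit_direction, strength=1.0):
        w = self.base_encoder(image)             # low-rate latent code
        residual = self.residual_encoder(image)  # high-rate detail features
        w_edit = w + strength * edit_direction   # manipulation in latent space
        # Broadcast the edited code over the spatial grid and let the
        # transformation module adapt the residuals to the edit.
        h, wid = residual.shape[-2:]
        cond = w_edit[:, :, None, None].expand(-1, -1, h, wid)
        residual_edit = self.transform(torch.cat([residual, cond], dim=1))
        # Both outputs would be fed to the generator to synthesize the
        # edited image with preserved details.
        return w_edit, residual_edit
```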
3DIS-NN opens up various applications in virtual environments. The importance of virtual tools has become widely recognized, especially during the pandemic: as classes and meetings moved online and shopping malls, museums, and historical sites were closed, many learning and socializing activities shifted into the digital environment. With the world becoming more digital, 3DIS-NN is uniquely positioned to enable applications such as online shopping, online meetings, and online gaming. For example, an online-shopping application built on 3DIS-NN technology could let users try on clothes in a virtual environment by extracting the 3D geometry of the human body and clothes. Similarly, designing homes with 3D virtual furniture can help both designers and ordinary buyers visualize a design before purchasing the furniture. These applications address industrial and societal needs.
From an input image, textured 3D models are predicted, which can be rendered from novel views.