Our first work, Refining 3D Human Texture Estimation from a Single Image, is currently under submission. It addresses 3D human texture estimation with deep neural networks. Estimating 3D human texture from a single image is essential in graphics and vision. It requires learning a mapping function from input images of humans in diverse poses into the parametric UV space and plausibly hallucinating the invisible parts. To achieve high-quality 3D human texture estimation, we propose a framework that adaptively samples the input with a deformable convolution whose offsets are learned by a deep neural network. Both the offsets and the deformable convolution are deeply supervised. Additionally, we describe a novel cycle consistency loss that improves view generalization. We further propose to train our framework with an uncertainty-based pixel-level image reconstruction loss, which enhances color fidelity. We compare our method against state-of-the-art approaches and show significant qualitative and quantitative improvements.
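To make the idea of adaptive sampling concrete, the following is a minimal sketch of how a single output value of a 3x3 deformable convolution can be computed: each regular kernel tap is shifted by an offset and the input is read with bilinear interpolation. It is an illustration only, not the architecture of the paper. The function names (`bilinear_sample`, `deformable_conv_at`), the single-channel list-of-rows image, and the hand-written offsets are assumptions made for the example; in the actual framework the offsets are predicted by a network.

```python
import math


def bilinear_sample(image, y, x):
    """Bilinearly interpolate a 2D grid (list of rows) at a real-valued (y, x).

    Coordinates are clamped to the image bounds (an assumption of this sketch).
    """
    h, w = len(image), len(image[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
    bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
    return (1 - wy) * top + wy * bottom


def deformable_conv_at(image, kernel, cy, cx, offsets):
    """Compute one output value of a 3x3 deformable convolution centred at (cy, cx).

    offsets[k] = (dy, dx) shifts the k-th tap of the regular 3x3 grid,
    enumerated row by row. Here the offsets are given by hand; they stand in
    for values that a learned offset predictor would produce.
    """
    total = 0.0
    k = 0
    for ky in (-1, 0, 1):
        for kx in (-1, 0, 1):
            dy, dx = offsets[k]
            sample = bilinear_sample(image, cy + ky + dy, cx + kx + dx)
            total += kernel[ky + 1][kx + 1] * sample
            k += 1
    return total
```

With all offsets set to zero this reduces to an ordinary 3x3 convolution; non-zero offsets let each tap read from a different, possibly fractional, location in the input.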
Our second work, StyleRes: Transforming the Residuals for Real Image Editing with StyleGAN, has been accepted to the Conference on Computer Vision and Pattern Recognition (CVPR 2023). It explores the 3D editing capabilities of 2D GANs on real images. Style-based GAN models have been shown to learn implicit 3D knowledge of objects without supervision. One can control the viewpoint of a synthesized object through its latent codes, and such models are used to generate multi-view images for training 3D reconstruction models. In this work, we improve the viewpoint editing of real images.
We present a novel image inversion framework and a training pipeline to achieve high-fidelity image inversion with high-quality attribute editing. Inverting real images into StyleGAN’s latent space is an extensively studied problem, yet the trade-off between image reconstruction fidelity and image editing quality remains an open challenge. Low-rate latent spaces are limited in their expressive power for high-fidelity reconstruction. On the other hand, high-rate latent spaces degrade editing quality. In this work, to achieve high-fidelity inversion, we learn residual features in higher latent codes that lower latent codes were not able to encode. This preserves image details in the reconstruction. To achieve high-quality editing, we learn how to transform the residual features so that they adapt to manipulations in the latent codes. We train the framework to extract residual features and transform them via a novel architecture pipeline and cycle consistency losses.
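The role of the residual features can be illustrated with a toy one-dimensional analogy. It is a sketch under strong assumptions, not the StyleRes architecture. Coarse quantization stands in for the low-rate latent code, the quantization error stands in for the residual features, and reversing the list stands in for an edit such as a viewpoint change. All function names are hypothetical, and in the actual framework the residual transformation is learned rather than hand-written.

```python
def encode_low_rate(signal, step=1.0):
    """Toy low-rate code: keep only a coarsely quantised copy of the signal."""
    return [round(v / step) * step for v in signal]


def extract_residual(signal, coarse):
    """Details that the low-rate code failed to capture."""
    return [s - c for s, c in zip(signal, coarse)]


def reconstruct(coarse, residual):
    """Add the residual details back onto the coarse reconstruction."""
    return [c + r for c, r in zip(coarse, residual)]


def edit(values):
    """Toy edit: reverse the signal (a stand-in for a latent manipulation)."""
    return list(reversed(values))


signal = [0.2, 1.4, 2.9, 4.1]
coarse = encode_low_rate(signal)
residual = extract_residual(signal, coarse)

# Fidelity: coarse code plus residual recovers the input details.
inverted = reconstruct(coarse, residual)

# Editing with the untransformed residual misplaces the details.
stale_edit = reconstruct(edit(coarse), residual)

# Editing with a residual transformed consistently with the edit keeps them aligned.
aligned_edit = reconstruct(edit(coarse), edit(residual))
```

In this toy setting `inverted` matches `signal`, `aligned_edit` matches the reversed signal, and `stale_edit` does not, which mirrors the motivation for transforming the residuals when the latent codes are manipulated.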