Periodic Reporting for period 4 - 4DRepLy (Closing the 4D Real World Reconstruction Loop)
Reporting period: 2023-03-01 to 2023-08-31
The foundational problems investigated in 4DRepLy open up new possibilities for visual computing technology that brings together computer graphics, computer vision and machine learning in the real world. Society will benefit from these possibilities in many ways. They enable new means of cultural and creative expression, since computer graphics content can be created more efficiently and at higher quality. They also change how we communicate with each other and how we naturally interact with the intelligent computing systems and assistants of the future. They will empower greatly improved virtual and augmented reality scenarios, build foundations for new photo-real immersive telepresence systems, and enable new types of human-machine interaction approaches. Further, the insights gained in 4DRepLy lay the algorithmic foundations for advanced approaches to visual scene reconstruction and visual scene understanding, an essential precondition for future intelligent and autonomous systems that need to perceive and understand the human world in order to assist humans and to act and interact with it safely. We also believe that the advanced capture approaches developed in 4DRepLy, notably the human reconstruction methods, will benefit other domains of research such as biomechanics, medicine, and cognitive science.
The project took unconventional methodological paths by investigating fundamentally new ways to combine machine learning-based representations and algorithms with explicit model-based or expert-designed ones. Here, we made important advances on several fronts that are key building blocks of the overall research program, for instance: 1) adapting classical explicit representations so that they can be automatically combined and end-to-end trained with deep learning-based approaches; 2) advancing neural network-based approaches so that they can be combined with explicit models and learn more semantically plausible representations of scenes, as well as algorithms that use these representations for reconstruction and synthesis; 3) developing foundational concepts for different degrees of integration of learning-based and explicit approaches to reconstruction and synthesis, ranging from loose coupling of the two up to full end-to-end integration and joint training of explicit and learned components; 4) new strategies to train and refine such integrated methods on a continuous inflow of unlabeled or weakly labeled real-world observations.
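To illustrate the end-to-end integration pattern described above, the following is a minimal, hypothetical PyTorch sketch, not the project's actual code: a toy explicit linear shape model (standing in for a statistical parametric model) is composed with a learned coefficient regressor, and gradients flow through the explicit model so both components can be trained jointly from (weak) supervision. All class names, shapes and the blendshape-style model are illustrative assumptions.

```python
# Minimal sketch (assumed names and shapes): explicit parametric model + neural
# regressor, trained end to end. Real project models are far richer than this.
import torch
import torch.nn as nn

class ExplicitBlendshapeModel(nn.Module):
    """Explicit linear shape model: vertices = mean + basis @ coefficients."""
    def __init__(self, num_vertices=1000, num_coeffs=32):
        super().__init__()
        # In practice these would come from a statistical model (e.g. PCA of scans);
        # random buffers stand in for them in this sketch.
        self.register_buffer("mean_shape", torch.randn(num_vertices, 3))
        self.register_buffer("basis", torch.randn(num_vertices, 3, num_coeffs))

    def forward(self, coeffs):                          # coeffs: (B, num_coeffs)
        offsets = torch.einsum("vdk,bk->bvd", self.basis, coeffs)
        return self.mean_shape.unsqueeze(0) + offsets   # (B, V, 3) vertex positions

class CoeffRegressor(nn.Module):
    """Learned component: predicts explicit-model coefficients from image features."""
    def __init__(self, feat_dim=512, num_coeffs=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, num_coeffs))

    def forward(self, feats):
        return self.net(feats)

# End-to-end training loop against weak 3D supervision; with a differentiable
# renderer in the loop, the same pattern supports purely image-space supervision.
explicit_model, regressor = ExplicitBlendshapeModel(), CoeffRegressor()
optimizer = torch.optim.Adam(regressor.parameters(), lr=1e-4)
feats = torch.randn(8, 512)                  # placeholder image features
target_vertices = torch.randn(8, 1000, 3)    # placeholder weak labels
for _ in range(10):
    pred_vertices = explicit_model(regressor(feats))
    loss = torch.nn.functional.mse_loss(pred_vertices, target_vertices)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```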
Our fundamental rethinking of concepts in graphics, vision and machine learning benefited from our unique strategy of deeply combining advanced forward models from graphics with concepts from vision and machine learning in entirely new, end-to-end ways on real-world data. Individual methodological aspects of our overarching goal were investigated in individual sub-projects published within 4DRepLy. The following are examples.
We presented groundbreaking new methods for dynamic face reconstruction, alongside learning of full parametric scene models, from weakly labeled or unlabeled in-the-wild data (CVPR’19, two papers at CVPR’21). We also presented entirely new means to capture and reconstruct human motion, dense deforming human surface geometry and appearance from a single camera at state-of-the-art fidelity (ACM TOG’19, CVPR’20 Best Student Paper Honorable Mention). Further innovations were first-of-their-kind methods for monocular real-time multi-person motion capture (SIGGRAPH’20, EG’23), for capturing humans in scene context (SIGGRAPH’20, ECCV’22), and for capturing the shape and motion of two hands in close interaction from a single color camera (SIGGRAPH Asia’20). The project further introduced pioneering new methods to reconstruct and photo-realistically render general static (NeurIPS’20, ECCV’22) and dynamic scenes (CVPR’21), as well as humans and human faces (SIGGRAPH’19, SIGGRAPH Asia’19, SIGGRAPH’20), even under new motion, directly from single- or multi-view video (SIGGRAPH’21, SIGGRAPH Asia’21, SCA’23, NeurIPS’23); these build on new ways to integrate explicit and neural implicit models. Further, the project presented pioneering generative models for 2D and 3D data with greatly enhanced disentanglement (CVPR’22) or real-time geometric controllability (SIGGRAPH’23).
As the examples above show, the insights gained in 4DRepLy enabled us to make pioneering contributions to a new field in visual computing termed neural rendering. Neural rendering enables highly realistic synthesis of images and videos in a data-driven way, without resorting to the established, complex and time-consuming pipeline of explicit scene modeling and light-transport-based rendering. Our new ways of combining explicit model-based and learning-based approaches for image synthesis were instrumental here.
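To give a flavor of the underlying principle, the following is a minimal, illustrative sketch of the core idea behind neural rendering in the NeRF family of methods, not any of the project's specific systems: a small MLP maps a 3D point to density and color, and an image is formed by differentiable volume rendering along camera rays, so the scene representation can be learned directly from photographs. All names, network sizes and sampling choices are assumptions made for this sketch.

```python
# Minimal sketch of NeRF-style neural rendering: learned radiance field + differentiable
# volume rendering along rays (illustrative only; not the project's implementations).
import torch
import torch.nn as nn

class RadianceFieldMLP(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 4))   # outputs (density, r, g, b)

    def forward(self, points):                           # points: (..., 3)
        out = self.net(points)
        sigma = torch.relu(out[..., :1])                 # non-negative density
        rgb = torch.sigmoid(out[..., 1:])                # colors in [0, 1]
        return sigma, rgb

def render_rays(field, origins, directions, near=0.0, far=1.0, num_samples=64):
    """Differentiable volume rendering: alpha-composite samples along each ray."""
    t = torch.linspace(near, far, num_samples)                                  # (S,)
    points = origins[:, None, :] + t[None, :, None] * directions[:, None, :]    # (R, S, 3)
    sigma, rgb = field(points)                                                  # (R, S, 1), (R, S, 3)
    delta = (far - near) / num_samples
    alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * delta)                         # (R, S)
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                     1.0 - alpha[:, :-1]], dim=1), dim=1)       # transmittance
    weights = alpha * trans                                                     # (R, S)
    return (weights[..., None] * rgb).sum(dim=1)                                # (R, 3) pixel colors

# Fitting the field to observed pixel colors of posed images then reduces to a
# standard gradient-descent loop over a photometric loss.
field = RadianceFieldMLP()
rays_o, rays_d = torch.zeros(1024, 3), torch.randn(1024, 3)   # placeholder camera rays
colors = render_rays(field, rays_o, rays_d)
```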
The many methodological insights gained in the project so far were disseminated in more than 100 publications (in the leading conferences and journals in computer graphics, computer vision and machine learning) and technical reports. Further, many results from the project, e.g. on neural rendering or on interactive control of image-generative models, were widely reported in general media outlets worldwide. The project also contributed widely used research code bases and datasets.