Periodic Reporting for period 2 - 4DRepLy (Closing the 4D Real World Reconstruction Loop)
Reporting period: 2020-03-01 to 2021-08-31
The foundational problems investigated in 4DRepLy open up new possibilities for visual computing technology that brings together computer graphics and computer vision techniques in the real world. Society will benefit from these possibilities in many ways. They enable new means of cultural and creative expression by making the creation of computer graphics content more efficient and of higher quality. They also lead to new ways for people to communicate with each other and to interact naturally with the intelligent computing systems and assistants of the future. They will empower greatly improved virtual and augmented reality scenarios, build foundations for new photo-real immersive telepresence systems, and enable new types of human-machine interaction. Further, the insights gained in 4DRepLy lay the algorithmic foundations for advanced visual scene reconstruction and understanding, an essential precondition for future intelligent and autonomous systems that need to perceive and understand the human world in order to assist humans and to act and interact with it safely. We also believe that the advanced capture approaches developed in 4DRepLy, notably the human reconstruction methods, will benefit other domains of research, such as biomechanics, medicine, and cognitive science.
The project takes unconventional methodological paths by investigating fundamental principles for combining machine learning-based and explicit model-based or expert-designed representations and algorithms. Here, we made important advances on several fronts that form building blocks of the overall research program, for instance: 1) adapting classical explicit representations so that they can be automatically combined and trained end-to-end with deep learning-based approaches; 2) advancing neural network-based approaches so that they can be combined with explicit models and are geared to learn more semantically plausible scene representations, as well as algorithms that use these representations for reconstruction and synthesis; 3) developing foundational concepts for different degrees of integration of learning-based and explicit approaches for reconstruction and synthesis, ranging from weak integration of the two up to full end-to-end integration and training of both explicit and learning-based components; 4) new strategies to train and refine such integrated methods on a corpus of unlabeled or weakly labeled real-world observations, which forms the basis for closing a 4D real-world reconstruction loop.
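The first of these building blocks, fitting an explicit representation end-to-end through a differentiable forward model, can be illustrated with a minimal, hypothetical sketch. This is not one of the project's actual models: a toy explicit "scene" (a 1D Gaussian blob with parameters for center and width) is fitted to an observed image by gradient descent through the renderer, with finite differences standing in for the automatic differentiation a deep learning framework would provide.

```python
import numpy as np

def render(params, xs):
    """Differentiable forward model: render a 1D 'image' of a
    Gaussian blob from explicit parameters (center, log-width)."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # log-parameterization keeps the width positive
    return np.exp(-0.5 * ((xs - mu) / sigma) ** 2)

def mse(params, xs, observed):
    """Reconstruction loss between rendered and observed image."""
    return np.mean((render(params, xs) - observed) ** 2)

def grad(params, xs, observed, eps=1e-5):
    """Finite-difference gradient (a stand-in for autodiff)."""
    g = np.zeros_like(params)
    for i in range(len(params)):
        p_plus, p_minus = params.copy(), params.copy()
        p_plus[i] += eps
        p_minus[i] -= eps
        g[i] = (mse(p_plus, xs, observed) - mse(p_minus, xs, observed)) / (2 * eps)
    return g

# "Observation": an image produced by unknown true parameters.
xs = np.linspace(-3.0, 3.0, 200)
true_params = np.array([0.7, np.log(0.5)])
observed = render(true_params, xs)

# Analysis-by-synthesis: fit the explicit model to the observation
# by gradient descent through the differentiable forward model.
params = np.array([-1.0, np.log(1.5)])  # poor initial guess
for _ in range(2000):
    params -= 2.0 * grad(params, xs, observed)

print(params[0], np.exp(params[1]))  # fitted (mu, sigma), close to the true (0.7, 0.5)
```

In the full learning-based setting, the renderer stays the same in spirit, but a neural network predicts or regularizes the explicit parameters, and both are trained jointly on unlabeled observations.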
All the aforementioned advances rethink established concepts in graphics, vision, and machine learning, and benefit from our unique strategy of deeply combining advanced forward models from graphics with concepts from vision and machine learning in entirely new ways. Individual aspects of these unconventional methodological advances were developed and evaluated in the individual sub-projects published in the reporting period. The following are examples.
These advances enabled entirely new approaches for highly efficient dynamic reconstruction of faces, alongside the learning of full parametric scene models, from weakly labeled or unlabeled in-the-wild data (CVPR 2018, CVPR 2019, two papers at CVPR 2021). They also enabled entirely new means to capture and reconstruct human motion, dense deforming human surface geometry, and appearance from a single camera with unprecedented efficiency, and under highly challenging scene conditions. Important results were the first approaches in the literature to perform dense, space-time-coherent performance capture of humans in loose clothing from a single color camera (ACM TOG 2019, CVPR 2020), new methods for real-time multi-person motion capture from a single camera under difficult occlusions (SIGGRAPH 2020), and the first approaches in the literature to capture the shape and motion of two hands in close interaction from a single depth (SIGGRAPH 2019) or single color (SIGGRAPH 2020) camera.
The insights gained in 4DRepLy also enabled us to make pioneering contributions to a new field in visual computing termed neural rendering. Neural rendering enables highly realistic synthesis of images and videos in a data-driven way, without resorting to the established, complex, and time-consuming scene modeling and light-transport-based rendering approaches from computer graphics. Our new ways of combining explicit model-based and learning-based approaches for image synthesis are instrumental here and enabled, for example, the video-realistic animation of human faces (SIGGRAPH 2019, SIGGRAPH Asia 2019, SIGGRAPH Asia 2020) and full bodies (CVPR 2020, ACM TOG 2019, IEEE TVCG 2020) at new levels of quality and efficiency. Our newly developed way of learning a scene model implicitly from images within a neural network (NeurIPS 2020) is also exemplary of this impactful line of work and offers a new methodological perspective on high-quality computer graphics modeling and rendering.
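Implicit neural scene models of this kind typically rest on differentiable volume rendering: a network predicts density and color at sample points along each camera ray, and these samples are composited into a pixel color so that the whole pipeline can be trained from images alone. Below is a minimal sketch of the standard emission-absorption compositing step used by NeRF-style methods; the density and color arrays here are hypothetical stand-ins for network outputs, not code from the project.

```python
import numpy as np

def composite(sigmas, colors, deltas):
    """Emission-absorption volume rendering quadrature along one ray:
    C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i, where
    T_i = prod_{j<i} exp(-sigma_j * delta_j) is the transmittance.
    In a neural scene representation, an MLP would predict
    (sigma_i, c_i) per sample; here they are given as arrays."""
    alphas = 1.0 - np.exp(-sigmas * deltas)  # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))  # T_i
    weights = trans * alphas                 # contribution of each sample
    return weights @ colors, weights         # pixel color, sample weights

# Toy ray: 64 samples, with an opaque red "surface" in the middle.
n = 64
deltas = np.full(n, 0.05)                    # spacing between samples
sigmas = np.zeros(n); sigmas[30:34] = 50.0   # high density = solid matter
colors = np.zeros((n, 3)); colors[:, 0] = 1.0

rgb, w = composite(sigmas, colors, deltas)
print(rgb)      # approximately [1, 0, 0]: the ray returns the surface color
print(w.sum())  # approximately 1: nearly all weight lands on the surface
```

Because every operation here is differentiable, a photometric loss on `rgb` can be backpropagated to the predicted densities and colors, which is what allows such a scene model to be learned purely from posed images.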
The many methodological insights gained in the project so far were disseminated in more than 60 publications and technical reports, at the leading conferences and in the leading journals in computer graphics, computer vision, and machine learning. These publications include 16 papers in ACM Transactions on Graphics (including 5 papers at SIGGRAPH and 7 at SIGGRAPH Asia), as well as 26 publications at the top computer vision conferences (CVPR, ICCV, ECCV).