
Innovative Volumetric Capture and Editing Tools for Ubiquitous Storytelling

Periodic Reporting for period 1 - INVICTUS (Innovative Volumetric Capture and Editing Tools for Ubiquitous Storytelling)

Reporting period: 2020-10-01 to 2021-09-30

INVICTUS aims to deliver innovative authoring tools for the creation of a new generation of high-fidelity avatars and the integration of these avatars into interactive and non-interactive narratives (movies, games, XR immersive productions). The consortium proposes to develop and exploit the full potential of volumetric motion capture technologies, which consist in simultaneously capturing the appearance and motion of actors with RGB cameras to create volumetric avatars, and to rely on these technologies to design narratives using novel collaborative VR authoring tools.

The project focuses on three research axes:
- improving pipelines for capturing the motion and appearance of characters with a significant increase in fidelity
- proposing innovative editing tools for volumetric appearances and motions, such as transferring shapes, performing stylization, or adapting and transferring motions
- proposing innovative authoring tools that build on interactive VR technologies to immerse storytellers in virtual representations of their narratives, where they can edit sets, layouts and animated characters.
By demonstrating and communicating how these technologies can be immediately exploited in both traditional and novel media narratives, the INVICTUS project will open opportunities in the EU market for more compelling, immersive and personalized visual experiences at the crossroads of film and game entertainment, reducing the cost of content creation, improving the fidelity of characters and boosting creativity.
During the first reporting period, several scenes of volumetric data were acquired and are currently being processed: a teacher-student interaction and, in a second recording, the explanation of an artwork by a subject expert. Many parts of the volumetric capture pipeline have seen engineering improvements, including depth initialization, BRDF estimation and semantic material segmentation, geometry refinement, visual hull constraints for mesh tracking, and a reduction in the number of keyframes. Different machine-learning approaches were investigated and successfully applied to some tasks, such as foreground-background segmentation. The differences and commonalities of the partners' workflows and data formats were compiled for the consortium.
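The report does not detail which segmentation networks were used; as a hedged illustration only, the sketch below shows foreground-background segmentation of a studio frame with a generic pretrained model (torchvision's DeepLabV3), keeping the pixels classified as "person" as foreground. The file name is a placeholder.

```python
# Illustrative sketch only: foreground-background segmentation with a generic
# pretrained network, not the pipeline actually used in INVICTUS.
import torch
import torchvision
from torchvision import transforms
from PIL import Image

PERSON_CLASS = 15  # "person" in the Pascal-VOC label set used by this model


def person_mask(image_path: str) -> torch.Tensor:
    """Return a boolean HxW mask marking pixels classified as 'person'."""
    model = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT")
    model.eval()

    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    batch = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)

    with torch.no_grad():
        logits = model(batch)["out"]          # [1, 21, H, W] class scores
    labels = logits.argmax(dim=1).squeeze(0)  # [H, W] per-pixel class ids
    return labels == PERSON_CLASS


if __name__ == "__main__":
    mask = person_mask("studio_frame.png")    # hypothetical captured frame
    print(f"foreground pixels: {int(mask.sum())}")
```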
Two approaches for hybrid model generation were introduced: a deep generative model that inflates the skeletal representation to a full mesh, and a deep generative model that creates textures and mesh offsets from given skeletal poses. Inverse kinematics and tangent-space optimizations were explored for skeletal representations. Furthermore, a model-based gaze correction was developed for interaction in VR.
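The architectures of these hybrid models are not specified in this summary. Purely as an illustration of the second idea (predicting textures and mesh offsets from a skeletal pose), the toy decoder below uses invented layer sizes, joint and vertex counts; it is a sketch of the concept, not the project's network.

```python
# Toy decoder mapping a skeletal pose to per-vertex offsets and a small
# texture. All dimensions are invented for illustration.
import torch
import torch.nn as nn


class PoseToMeshAndTexture(nn.Module):
    def __init__(self, num_joints: int = 24, num_vertices: int = 6890,
                 tex_size: int = 64):
        super().__init__()
        pose_dim = num_joints * 3                     # axis-angle per joint
        self.backbone = nn.Sequential(
            nn.Linear(pose_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
        )
        self.offset_head = nn.Linear(512, num_vertices * 3)
        self.texture_head = nn.Linear(512, 3 * tex_size * tex_size)
        self.num_vertices = num_vertices
        self.tex_size = tex_size

    def forward(self, pose: torch.Tensor):
        feat = self.backbone(pose.flatten(1))
        offsets = self.offset_head(feat).view(-1, self.num_vertices, 3)
        texture = torch.sigmoid(self.texture_head(feat)).view(
            -1, 3, self.tex_size, self.tex_size)
        return offsets, texture


pose = torch.zeros(1, 24, 3)                  # rest pose, batch of one
offsets, texture = PoseToMeshAndTexture()(pose)
print(offsets.shape, texture.shape)           # (1, 6890, 3) (1, 3, 64, 64)
```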
The generation of model, motion and skeletal databases is starting as the recorded volumetric sessions yield processed volumetric videos. A method for joining volumetric sequences was developed (a minimal sketch of the idea follows), and methods for synthesizing speech gesture animation were investigated.
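The joining method itself is not described here. As one plausible, heavily simplified sketch: given two tracked mesh sequences that share the same topology (an assumption that need not hold in the actual pipeline), pick the pair of frames whose vertex positions are closest and cut the sequences there.

```python
# Minimal sketch of joining two tracked mesh sequences at their most similar
# frames. Assumes shared topology; not the project's actual method.
import numpy as np


def best_transition(seq_a: np.ndarray, seq_b: np.ndarray):
    """seq_a: [Ta, V, 3], seq_b: [Tb, V, 3] vertex tracks with shared topology."""
    # Mean per-vertex distance between every frame of A and every frame of B
    # (O(Ta * Tb * V) memory, fine for a sketch).
    diff = seq_a[:, None, :, :] - seq_b[None, :, :, :]       # [Ta, Tb, V, 3]
    cost = np.linalg.norm(diff, axis=-1).mean(axis=-1)       # [Ta, Tb]
    i, j = np.unravel_index(cost.argmin(), cost.shape)
    return i, j, cost[i, j]


def join(seq_a: np.ndarray, seq_b: np.ndarray) -> np.ndarray:
    """Concatenate A up to the transition frame with B from its match onward."""
    i, j, _ = best_transition(seq_a, seq_b)
    return np.concatenate([seq_a[: i + 1], seq_b[j:]], axis=0)
```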
Regarding the creative tools for high-fidelity contextualized avatars, the focus was on developing a powerful and efficient editing tool pipeline. An avatar of a real person, including face, hair and eyes, can currently be created in less than 30 minutes. Recently, full-body creation has been added to the pipeline.
A machine-learning solution is being studied for HMD removal. A convolutional neural network, coupled with a generative adversarial network architecture, is trained to obtain the list of blendshapes corresponding to the spoken text. An animation is created from these blendshapes and applied to an avatar (created with the editing tool).
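For readers unfamiliar with the last step, driving a face mesh from blendshape weights is standard linear blending: each frame is the neutral mesh plus a weighted sum of shape deltas. The sketch below shows that arithmetic with illustrative array shapes; the project's actual asset format is not specified here.

```python
# Minimal sketch of blendshape-driven animation: neutral mesh plus a weighted
# sum of shape deltas per frame. Shapes and names are illustrative.
import numpy as np


def apply_blendshapes(neutral: np.ndarray, shapes: np.ndarray,
                      weights: np.ndarray) -> np.ndarray:
    """
    neutral: [V, 3]    neutral face vertices
    shapes:  [B, V, 3] blendshape target meshes
    weights: [T, B]    per-frame weights (e.g. predicted from speech/text)
    returns: [T, V, 3] animated vertex positions
    """
    deltas = shapes - neutral[None]                              # [B, V, 3]
    return neutral[None] + np.einsum("tb,bvc->tvc", weights, deltas)
```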
Another deep-learning architecture is used for facial style transfer. The idea is to stylize a person while preserving their personality, so that they can be convincingly integrated into a game, a movie or any other scene. A derivative of this work consists in using the technology to produce caricatures; this latter work is being studied.
Moreover, partners Ubisoft and Université de Rennes 1 have worked on the delivery of a virtual reality storytelling tool which encompasses editing and animation features. The result is an open-source, MIT-licensed technology made available to the consortium, called VRtist: github.com/ubisoft/vrtist. The tool is also accessible for content creation and research purposes. Ubisoft and Université de Rennes 1 also worked on the specific issues of animation in VR, proposing a helix-based representation for manipulations and the integration of volumetric videos.
Additionally, the consortium identified three use cases which will be implemented and on which evaluations will be performed with regard to the identified impacts. The use cases describe scenarios applicable to the animation industry as well as to cultural and educational entities. In addition, partners collaborated to design the evaluation methodology and the metrics used to assess the Key Performance Indicators.
Regarding the capture and creation of animatable volumetric video, algorithmic improvements have been added to an existing volumetric video production workflow. Approaches for geometry refinement, mesh decimation and foreground-background segmentation, as well as several deep learning-based methods, have been investigated and developed as part of INVICTUS.
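The specific decimation algorithm is not named in this summary. As one common approach for reducing the triangle count of per-frame volumetric meshes, the hedged sketch below uses Open3D's quadric edge-collapse decimation; file names and the 10% target are placeholders, not project settings.

```python
# Hedged sketch: quadric decimation of one volumetric video frame with Open3D.
# This is a standard technique, not necessarily the project's implementation.
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("frame_0001_full.obj")   # placeholder path
mesh.compute_vertex_normals()

target = len(mesh.triangles) // 10          # keep roughly 10% of the triangles
decimated = mesh.simplify_quadric_decimation(target_number_of_triangles=target)

o3d.io.write_triangle_mesh("frame_0001_lod1.obj", decimated)
print(len(mesh.triangles), "->", len(decimated.triangles))
```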
A new approach for a learned hybrid model of the human body was developed within INVICTUS. Network optimizations were included to increase processing speed while increasing the output resolution of the texture. In addition, the template mesh was replaced by SMPL, which required adaptations and new algorithms for rigging and mesh registration.
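Once a captured mesh is registered and rigged to an SMPL-style template, posing it typically comes down to linear blend skinning: each vertex is a weighted combination of its bones' rigid transforms. The sketch below shows that computation on synthetic data with SMPL-like dimensions; it illustrates the standard operation rather than the project's exact rigging code.

```python
# Illustrative linear blend skinning over an SMPL-style template.
# Data here is synthetic; dimensions follow SMPL conventions (6890 vertices,
# 24 joints), but this is not the project's actual rigging code.
import numpy as np


def linear_blend_skinning(vertices: np.ndarray,      # [V, 3] template vertices
                          weights: np.ndarray,       # [V, J] skinning weights
                          rotations: np.ndarray,     # [J, 3, 3] per-joint rotations
                          translations: np.ndarray   # [J, 3] per-joint translations
                          ) -> np.ndarray:
    # Transform every vertex by every joint, then blend with the weights.
    per_joint = np.einsum("jab,vb->vja", rotations, vertices) + translations[None]
    return np.einsum("vj,vja->va", weights, per_joint)


V, J = 6890, 24
vertices = np.random.rand(V, 3)
weights = np.random.rand(V, J)
weights /= weights.sum(axis=1, keepdims=True)        # weights sum to one
rotations = np.tile(np.eye(3), (J, 1, 1))             # identity pose
translations = np.zeros((J, 3))
posed = linear_blend_skinning(vertices, weights, rotations, translations)
assert np.allclose(posed, vertices)                   # identity pose leaves the mesh unchanged
```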
Concerning model animation, tools for joining temporal sequences were developed and added to an existing basic animation scheme for volumetric assets. Furthermore, deep neural network approaches for synthesizing speech gesture animation and for learning skeleton/joint motion sequences were developed through INVICTUS.
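The report does not describe the motion-learning networks in detail. As a hedged, generic illustration of learning skeleton/joint motion sequences, the sketch below trains a small recurrent model to predict the next pose frame from previous ones; the architecture and sizes are invented.

```python
# Generic sketch of an autoregressive skeleton-motion model (invented sizes),
# shown only to illustrate the kind of sequence learning mentioned above.
import torch
import torch.nn as nn


class MotionGRU(nn.Module):
    """Predicts the next joint-rotation frame from the preceding frames."""
    def __init__(self, num_joints: int = 24, hidden: int = 256):
        super().__init__()
        self.pose_dim = num_joints * 3
        self.rnn = nn.GRU(self.pose_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, self.pose_dim)

    def forward(self, poses: torch.Tensor):            # [B, T, J*3]
        out, _ = self.rnn(poses)
        return self.head(out)                          # next-frame prediction per step


model = MotionGRU()
clip = torch.randn(4, 60, 24 * 3)                      # 4 dummy clips of 60 frames
pred = model(clip[:, :-1])                             # predict frames 1..59
loss = nn.functional.mse_loss(pred, clip[:, 1:])
loss.backward()
```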
The existing text-driven facial animation was significantly enhanced. Two new methods for mapping text to the latent synthesis space, based on an autoregressive network and a transformer, were developed. Additionally, synchronization of lip motion to speech signals and the integration of a text-to-speech engine were developed. New networks have been trained on the newly captured sequences and the Sarah and Johnny use cases.
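Only the overall idea, mapping a text sequence into a learned latent synthesis space, is stated above; the concrete architecture is not. The sketch below is a deliberately small transformer that maps phoneme-like tokens to per-token latent codes, with invented vocabulary, layer sizes and latent dimensionality.

```python
# Hedged sketch of a sequence-to-sequence mapping from text/phoneme tokens to
# latent facial-animation codes. All sizes and the tokenization are invented.
import torch
import torch.nn as nn


class TextToLatentFace(nn.Module):
    def __init__(self, vocab_size: int = 64, d_model: int = 128,
                 latent_dim: int = 32, num_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.to_latent = nn.Linear(d_model, latent_dim)

    def forward(self, tokens: torch.Tensor):            # [B, T] phoneme ids
        x = self.encoder(self.embed(tokens))
        return self.to_latent(x)                        # [B, T, latent_dim]


tokens = torch.randint(0, 64, (2, 40))                  # two dummy phoneme sequences
latents = TextToLatentFace()(tokens)
print(latents.shape)                                    # torch.Size([2, 40, 32])
```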
Throughout the project, we have significantly enhanced the existing VR authoring tool VRtist by improving existing features such as the animation engine and editing, the controllers, movement, and interactions with virtual objects. New features were also implemented, such as the lobby, motion trail representations and the skeletal encoding of characters. VRtist was released as a standalone, open-source application.

INVICTUS aims to positively impact creativity and productivity in media production, as well as the immersion and quality of XR media experiences, by:
- reducing by 50% the cost of producing and using realistic 3D representations and motions of avatars
- increasing productivity and creativity through more automated tools that enable quicker iterations from idea to realization
- creating more compelling user experiences through avatars with high-fidelity appearance and motion, and better integration of these avatars in narratives.