
Correspondence through Millions Bodies: a large-scale, functional, and implicit data-driven method for 3D Humans matching

Periodic Reporting for period 1 - CoMBo (Correspondence through Millions Bodies: a large-scale, functional, and implicit data-driven method for 3D Humans matching)

Reporting period: 2023-07-01 to 2024-07-31

Geometry surrounds our lives, and we recognize ourselves as part of the 3D world. We inherently interact with it and reason about it, while its formalization is challenging and has occupied scholars since the dawn of civilization. The advent of computers and the study of Geometry Processing opened up dramatic advancements and possibilities, such as simulated surgical operations or entire digital universes. Among all physical entities, the human body plays a central role. Modern technologies can acquire, digitize, and imitate human appearance to a degree that is indistinguishable from reality. Interest in Virtual Humans is growing fast in public opinion and in several scientific and economic fields, from entertainment to medicine, from social sciences to ergonomics. The market of Virtual Avatars has an estimated value of around USD 10 billion and is expected to exceed USD 520 billion by 2030.

However, our understanding can only be complete with tools to establish analogies: what is similar, what is different, or, in other words, what is in correspondence. Finding such connections between human geometries is the key enabler for several downstream applications such as virtual try-on, texture transfer, or anthropometric statistics. Computer Vision and Graphics have studied the problem intensely because of its fundamental and applicative relevance. Still, no method has established itself as a robust and flexible standard for obtaining precise 3D correspondences across different humans. Noise, garments, objects, or partiality often pose challenges that require ad-hoc strategies.

The CoMBo (Correspondence through Millions Bodies) project aims to fill this gap by combining multiple ingredients: relying on a vast human dataset, exploiting flexible geometric representations, and developing a novel data-driven framework capable of handling this comprehensive set of challenges. CoMBo will produce substantial scientific, economic, and social impacts in this strategic field, carrying out intense outreach to a broad audience and informing the public about the body digitization process and the importance of a fair representation of the human experience in this fast-changing technology.
This project's final goal is to obtain a robust and flexible registration pipeline for 3D virtual humans. To achieve this, we focus on three main tasks: collecting a wide data prior, exploiting the robustness of implicit representations, and deploying and releasing a 3D Human registration pipeline capable of achieving state-of-the-art results.

1) Data collection: To build a robust data prior, we start from a publicly available dataset of motion capture sequences (AMASS). This dataset contains millions of frames of human motion, capturing thousands of actions and sequences. Moreover, these 3D shapes are encoded as SMPL parameters, providing a natural correspondence across all the shapes. Hence, we first subsample the dataset to around one hundred thousand frames, since nearby temporal instants contain similar semantic information. Then, we convert the SMPL parameters into their 3D models and further into an implicit representation, in particular unsigned distance fields. Finally, we store each sample with its original point cloud, which provides a discrete 3D ground truth alongside the implicit distance field. A minimal preprocessing sketch is given below.
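
The following is a minimal sketch of this preprocessing step, not the project's released code. It assumes the third-party packages smplx, trimesh, and PyTorch; the model path, subsampling stride, noise level, and sample counts are illustrative placeholders.

```python
# Sketch: subsample AMASS frames, convert SMPL parameters to meshes,
# and sample an unsigned distance field (UDF) around each body.
import numpy as np
import torch
import trimesh
import smplx

STRIDE = 10  # keep roughly one frame out of ten; nearby frames are semantically similar

body_model = smplx.create(model_path="models/", model_type="smpl")  # path is a placeholder

def process_sequence(npz_path, n_space_samples=20000):
    data = np.load(npz_path)
    poses = data["poses"][::STRIDE]                                  # temporal subsampling
    betas = torch.tensor(data["betas"][:10], dtype=torch.float32)[None]

    samples = []
    for pose in poses:
        pose = torch.tensor(pose, dtype=torch.float32)[None]
        # AMASS stores SMPL-H poses; keep the 21 body joints and zero the two hand joints
        body_pose = torch.cat([pose[:, 3:66], torch.zeros(1, 6)], dim=1)
        out = body_model(betas=betas, global_orient=pose[:, :3], body_pose=body_pose)
        verts = out.vertices[0].detach().numpy()
        mesh = trimesh.Trimesh(verts, body_model.faces, process=False)

        # sample query points near the surface and compute unsigned distances
        pts = mesh.sample(n_space_samples) + np.random.normal(0.0, 0.05,
                                                              (n_space_samples, 3))
        _, udf, _ = trimesh.proximity.closest_point(mesh, pts)

        samples.append({"points": pts, "udf": udf,
                        "gt_cloud": verts})                          # discrete ground truth
    return samples
```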

2) Exploiting geometric representations: At the start of our analysis, we observed that the advancement of Neural Representations has provided a new set of tools and new flexibility. These representations can be queried everywhere in 3D space, providing precise pointwise information while retaining their continuous nature. In particular, we found inspiration in the recent Learned Vertex Descent (LVD) paradigm. LVD trains a neural network backbone to predict, for every point in space, the offset toward the ground-truth positions of the registration points. In our exploration, we first improved this method by proposing LoVD (Localized Vertex Descent). This new variant includes a geometrical inductive bias that attends to local regions of the body. Despite this, we noted that the network could not generalize to geometries significantly out of distribution (e.g. noise, clothing), which may substantially alter the input features, resulting in misalignment between the prediction and the target surface. We consider misalignments between the template registration and the target shape to be avoidable errors: the registered template should lie precisely on the target surface, which constitutes the solution space for the template vertices. Hence, we realized that fine-tuning the backbone neural field to respect this property at inference time would be more efficient than augmenting our dataset. Inspired by popular iterative registration approaches, we proposed a novel self-supervised task called Neural Iterative Closest Point (NICP), specifically designed to refine Neural Fields at inference time. NICP iteratively queries the field directly on the target surface and penalizes predicted offsets that are not close to zero, i.e. it reduces deviations toward misaligned solutions. NICP takes a few seconds and lets the network adapt without requiring ad-hoc techniques or expensive data augmentation. A sketch of this refinement loop is shown below.
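
The snippet below is one possible reading of the NICP self-supervised task described above, written as a PyTorch sketch rather than the released implementation. `neural_field` is a placeholder for the trained LoVD backbone and is assumed to return, for each query point, one offset per template vertex; batch size, step count, and learning rate are illustrative.

```python
# Sketch: inference-time NICP-style refinement of a neural offset field.
import torch

def nicp_refine(neural_field, target_surface_points, steps=30, lr=1e-4):
    """Fine-tune the field so that queries on the target surface predict
    near-zero offsets, i.e. the surface itself belongs to the solution space."""
    opt = torch.optim.Adam(neural_field.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # sample a batch of points lying exactly on the target surface
        idx = torch.randint(0, target_surface_points.shape[0], (2048,))
        queries = target_surface_points[idx]          # (2048, 3)
        offsets = neural_field(queries)               # assumed shape (2048, V, 3)
        # self-supervised loss: for each surface point, at least one predicted
        # offset should be close to zero, so penalize the smallest offset norm
        loss = offsets.norm(dim=-1).min(dim=-1).values.mean()
        loss.backward()
        opt.step()
    return neural_field
```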

3) 3D Registration Pipeline: Finally, we incorporate LoVD, trained on the large dataset, and NICP into a complete registration pipeline, which also includes a Chamfer Distance optimization of the final prediction together with a local displacement optimization to capture the finest details of the target (a sketch of this refinement step is given below). We call our method Neural Scalable Registration (NSR), and we validate it on thousands of shapes coming from more than ten different sources, public benchmarks, and several challenges.
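
As an illustration of the final refinement stage, the sketch below optimizes a free per-vertex displacement on top of the network prediction under a standard Chamfer Distance objective. It is a minimal example, not the released pipeline; the function names, step count, and learning rate are placeholders.

```python
# Sketch: Chamfer-based refinement of the predicted registration.
import torch

def chamfer_distance(a, b):
    # symmetric Chamfer Distance between two point sets of shape (N, 3) and (M, 3)
    d = torch.cdist(a, b)                                   # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def refine_registration(registration, target, steps=100, lr=1e-3):
    # optimize a local displacement field on top of the network prediction
    disp = torch.zeros_like(registration, requires_grad=True)
    opt = torch.optim.Adam([disp], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = chamfer_distance(registration + disp, target)
        loss.backward()
        opt.step()
    return registration + disp.detach()
```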

Finally, we released our code publicly on GitHub, freely usable for research purposes.
We obtain a robust registration method with unprecedented flexibility, even for cases far from the training distribution, including clothing, clutter, challenging poses, and diverse body shapes (e.g. children). By generalizing far beyond the training data prior, our method opens the way to a fairer representation of the diversity of the human experience. This flexibility enables the analysis of data coming from affordable, consumer-level devices, democratizing access to Virtual Humans technology.

We also obtained promising results on partial scans from a single depth view and on noisy point clouds from the fusion of multiple Kinects. These are real-world data sources commonly used in industrial pipelines. Our method can be applied directly to these data without costly ad-hoc calibrations, unifying acquisitions coming from disparate sources (e.g. multi-view reconstruction, body scans, artist-made 3D models). We also demonstrate the impact on typical downstream applications, such as texture transfer and automatic animation of avatars; a minimal example of correspondence-based texture transfer is sketched below.
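The snippet below illustrates, under simplified assumptions, how such a downstream application can use the correspondence: once two scans are registered to the same template, per-vertex colors can be moved from one scan to the other with nearest-neighbour lookups. It is not the project's released code, and all variable names are illustrative.

```python
# Sketch: transfer per-vertex colors between two scans via a shared template registration.
from scipy.spatial import cKDTree

def transfer_vertex_colors(reg_a_verts, scan_a_verts, scan_a_colors,
                           reg_b_verts, scan_b_verts):
    # 1) pull colors from scan A onto the shared template: for each template
    #    vertex of registration A, take the color of the closest scan-A vertex
    _, idx_a = cKDTree(scan_a_verts).query(reg_a_verts)
    template_colors = scan_a_colors[idx_a]            # ordered like the template vertices

    # 2) both registrations share the template topology, so the same vertex
    #    ordering holds for registration B; push colors onto scan B by
    #    nearest template vertex
    _, idx_b = cKDTree(reg_b_verts).query(scan_b_verts)
    return template_colors[idx_b]                     # one color per scan-B vertex
```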
Figure: Results of NSR. Colors encode the semantic correspondence obtained by the registration.