
Holographic Vision for Immersive Tele-Robotic OperatioN

Periodic Reporting for period 2 - HoviTron (Holographic Vision for Immersive Tele-Robotic OperatioN)

Reporting period: 2021-06-01 to 2022-11-30

HoviTron aims at providing Holographic Vision (=Hovi) in Tele-Robotic OperatioN (=Tron) conditions, as shown in Figure 1. A typical application is a tele-operator remotely controlling a robot arm that manipulates objects in hazardous working environments. To this end, a couple of static cameras are set up around the scene of interest, and virtual viewpoints of the scene are synthesized to feed the Head Mounted Display (HMD) that the tele-operator wears. For each of his/her head positions, the corresponding images are synthesized (electronic pan-tilt), so that the HMD provides holographic vision of the scene. This means that the tele-operator’s eyes focus and accommodate correctly on the object of interest he/she is staring at, ensuring visual comfort and reduced fatigue in already harsh working conditions.
Technically, the holographic HMD provided by CREAL is built around a light field display projecting 32 images with micro-parallax to each eye, i.e. each image shows a slightly different perspective compared to its neighbours, which together ensure holographic vision (cf. the prefix “Hovi” in HoviTron).
The system therefore exhibits two levels of parallax: one that creates stereoscopic views by advanced depth-based interpolation (aka virtual view synthesis) between camera views (at large inter-camera distance/baseline), and another that creates the 32 micro-parallax views within each eye. These two stages correspond respectively to the RaViS/RVS-Vulkan and STALF modules in Figure 2, studied in the WP3 activities. In practice, however, the two modules are intertwined, delivering one large bulk of light field images to the HMD in real time.
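To give a flavour of the first level of parallax, the minimal sketch below (Python/NumPy, purely illustrative and not the RaViS/RVS-Vulkan implementation) forward-warps one RGBD camera view into a virtual viewpoint, assuming pinhole cameras with known intrinsics and world-to-camera poses; all function and parameter names are hypothetical.

```python
import numpy as np

def warp_view(rgb, depth, K_src, R_src, t_src, K_dst, R_dst, t_dst):
    """Forward-warp an RGBD source view into a virtual camera (z-buffered splatting)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T.astype(float)  # 3xN
    # Back-project pixels to 3D in the source camera frame, then to world space.
    cam_pts = (np.linalg.inv(K_src) @ pix) * depth.reshape(1, -1)
    world_pts = R_src.T @ (cam_pts - t_src.reshape(3, 1))
    # Re-project the 3D points into the virtual (destination) camera.
    dst_cam = R_dst @ world_pts + t_dst.reshape(3, 1)
    z = dst_cam[2]
    z_safe = np.where(z > 0, z, 1.0)                 # avoid division by zero for invalid points
    proj = K_dst @ dst_cam
    u_d = np.round(proj[0] / z_safe).astype(int)
    v_d = np.round(proj[1] / z_safe).astype(int)
    out = np.zeros_like(rgb)
    zbuf = np.full((h, w), np.inf)
    valid = (z > 0) & (u_d >= 0) & (u_d < w) & (v_d >= 0) & (v_d < h)
    src_rgb = rgb.reshape(-1, rgb.shape[-1])
    for i in np.flatnonzero(valid):                  # keep the closest point per target pixel
        if z[i] < zbuf[v_d[i], u_d[i]]:
            zbuf[v_d[i], u_d[i]] = z[i]
            out[v_d[i], u_d[i]] = src_rgb[i]
    return out                                       # holes remain where no source point lands
```

In a real pipeline such warps from several cameras are blended and the remaining holes inpainted, all on the GPU; this sketch only shows the geometric core of depth-based view synthesis.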
The RaViS/STALF view synthesis process requires a couple of RGBD images, where the depth D must comply with several constraints to ensure high-quality results. In particular (but not only), the depth image must be perfectly aligned with the RGB colour image, i.e. the depth discontinuities should perfectly follow the RGB object borders. We refer to this as “RGB-inline Depth” in Figure 2; it is an absolute condition for RaViS/STALF to work properly. The calibration work in WP1 partly addresses this challenge.
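As a rough illustration of what “RGB-inline Depth” means operationally, the sketch below (not a HoviTron deliverable; thresholds and names are illustrative) scores how many depth-edge pixels fall close to a colour edge in an already registered RGBD pair.

```python
import numpy as np

def edge_map(img, percentile=95):
    """Binary edge map from the gradient magnitude, thresholded at a percentile."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    return mag > np.percentile(mag, percentile)

def rgb_inline_score(rgb, depth, halo=2):
    """Fraction of depth-edge pixels lying within `halo` pixels of a colour edge;
    values close to 1 suggest RGB-inline depth, low values suggest misalignment."""
    colour_edges = edge_map(rgb.mean(axis=-1))
    depth_edges = edge_map(depth)
    dil = colour_edges.copy()
    for _ in range(halo):                      # simple 4-neighbour dilation of colour edges
        d = dil.copy()
        d[1:, :] |= dil[:-1, :]; d[:-1, :] |= dil[1:, :]
        d[:, 1:] |= dil[:, :-1]; d[:, :-1] |= dil[:, 1:]
        dil = d
    hits = np.logical_and(depth_edges, dil).sum()
    return hits / max(depth_edges.sum(), 1)
```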
Unfortunately, most (not to say all) depth sensing devices (Lidar, Kinect, etc.) do not comply with this “RGB-inline Depth” constraint, so strong countermeasures are needed to avoid virtual view synthesis artefacts. Depth estimation techniques, on the contrary, which compute depth by matching images without active laser light projection, can perfectly provide RGB-inline Depth images, but they typically require more input images and/or face serious challenges in reaching real-time operation. These two cases are the extreme endpoints of a large spectrum of depth sensing/estimation approaches, none of which is straightforward. The many candidate depth sensing/estimation devices considered in HoviTron are presented on the left side of Figure 2 and briefly discussed in WP2.
Notably, the far-top and far-bottom devices in the left column of Figure 2, i.e. DERS and RayTrix, were chosen as the starting point in HoviTron, since both were considered in the MPEG Immersive Video (MIV) standardization activities from which HoviTron takes its inspiration. DERS is applied to conventional cameras, while RayTrix is representative of so-called plenoptic cameras, which were anticipated to enter the MIV activities before project submission and did so halfway through the first year of the HoviTron project.
RayTrix being real-time by design, it is a logical candidate for HoviTron, whereas DERS, developed within MIV, has to be accelerated to reach real-time performance, cf. the more detailed discussion in WP2. During this tedious study with unexpected hurdles, many depth sensing/estimation alternatives popped up; some were abandoned, while others probably remain viable solutions (though the component shortage in the semiconductor industry resulting from the Covid-19 pandemic has jeopardized this). To mitigate the risks, two of them have been integrated in the Proof-of-Concept (PoC) of WP4, demonstrated in a robotic environment (cf. the suffix “Tron” in HoviTron).
Though we are confident that the RayTrix solution is more likely to reach top-quality results (only achieved after finalization of the project, due to RayTrix calibration problems), we are nevertheless targeting a consumer/prosumer solution with RGB-inline Depth cameras of 500-1000€ each, more than an order of magnitude cheaper than RayTrix. In the first half of the project, we reached satisfactory results with a Lidar combined with an accelerated DERS (aka GoRG), showing promising quality-runtime-cost trade-offs. Midway through the project, however, we had to find an alternative to the L515 Lidar approach, because the device was discontinued. We finally opted for the Azure Kinect and developed a Kinect depth Refinement Tool (KiRT) to overcome the Kinect’s pitfalls for the envisaged virtual view synthesis functionalities. Together with the integration activities of WP4, depth refinement got most of our attention in the second half of the project. Eventually, the end-to-end tele-robotic system with electronic pan-tilt HMD could be set up at a TRL of 6-7, and its superiority over the state of the art was additionally validated with a final User Study (US-2 in WP5) applied to the final Proof-of-Concept (PoC-3 in WP4). Such technology has a true impact on SMEs using tele-robotic operation for production/service/maintenance. Moreover, the same technology got the attention of the Digital Video Broadcasting (DVB) consortium, and might be a good candidate for video-based VR in the metaverse through the Focus Group on MetaVerse that was set up (outside the HoviTron consortium, rather in collaboration with MPEG) in mid-December 2022 (with a kick-off in March 2023).
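A widely used countermeasure when sensor depth discontinuities drift away from the RGB object borders is guided (joint bilateral) filtering of the depth map with the colour image, so that smoothing stops at colour edges. The sketch below illustrates that generic idea only; it is not the KiRT implementation, and all parameters are illustrative placeholders.

```python
import numpy as np

def joint_bilateral_depth(rgb, depth, radius=4, sigma_s=3.0, sigma_c=10.0):
    """Smooth a depth map while snapping its edges to colour edges in the guide image."""
    h, w = depth.shape
    guide = rgb.astype(float).mean(axis=-1)
    out = np.zeros_like(depth, dtype=float)
    # Precompute spatial Gaussian weights for the (2*radius+1)^2 window.
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    w_spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            patch_d = depth[y0:y1, x0:x1]
            patch_g = guide[y0:y1, x0:x1]
            ws = w_spatial[y0 - y + radius:y1 - y + radius,
                           x0 - x + radius:x1 - x + radius]
            # Range weights come from the colour guide, not from depth,
            # so the filter does not blur across RGB object borders.
            wc = np.exp(-((patch_g - guide[y, x])**2) / (2 * sigma_c**2))
            weights = ws * wc
            valid = patch_d > 0                    # ignore holes (zero depth)
            if valid.any():
                out[y, x] = (weights[valid] * patch_d[valid]).sum() / weights[valid].sum()
    return out
```

Practical refinement tools add further steps (hole filling, temporal filtering, re-registration to the colour camera), but the guided filter above captures the basic principle of forcing depth edges to follow colour edges.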
Finally, a cornerstone of the project is to verify that all the developed technology indeed reaches the holographic vision target (cf. the prefix “Hovi” in HoviTron). The user studies US-1 in WP5 have already shown that this goal is achieved when the depth images are perfect, i.e. obtained by raytracing from synthetic content. This validates the merits of the RaViS/STALF approach for holographic vision. The remaining challenge was to confirm these findings on the HoviTron PoC, which uses a real scene rather than a computer-synthesized one. Depth artefacts will most probably occur there, but to which degree and which countermeasures would be required was still an open question halfway through the project. User Study US-2 was conclusive in this respect, validating the added value of the HoviTron technology showcased in the final Proof-of-Concept (PoC-3).
HoviTron's final PoC
HoviTron's core technologies
HoviTron's light field refocusing capability in robotic application