Periodic Reporting for period 2 - HoviTron (Holographic Vision for Immersive Tele-Robotic OperatioN)
Reporting period: 2021-06-01 to 2022-11-30
The system therefore exhibits two levels of parallax: one that creates stereoscopic views by advanced depth-based interpolation (aka virtual view synthesis) between camera views (at large inter-camera distance/baseline), and another that creates the 32 micro-parallax views within each eye. These two phases correspond respectively to the RaViS/RVS-Vulkan and STALF modules in Figure 2, studied during the WP3 activities. In practice, however, these two modules are intertwined with one another, providing one large set of light field images in real time to the HMD.
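To make the first of these two phases concrete, the sketch below is a minimal depth-image-based rendering (DIBR) example in Python/numpy: it forward-warps one RGBD reference view to a nearby virtual camera, which is the core operation behind depth-based view interpolation. It is not the RaViS/RVS-Vulkan implementation; the pinhole model, the parameter names (K, R, t) and the per-pixel z-buffer splatting are illustrative assumptions.

```python
# Minimal DIBR sketch: forward-warp one RGBD reference view to a virtual camera.
# Not the RaViS/RVS-Vulkan code; intrinsics/pose names and the splatting
# strategy are illustrative assumptions only.
import numpy as np

def warp_rgbd(rgb, depth, K, R, t):
    """Project every reference pixel into the virtual view using its depth.

    rgb:   (H, W, 3) colour image of the reference camera
    depth: (H, W)    metric depth per pixel (RGB-inline, i.e. edge-aligned)
    K:     (3, 3)    pinhole intrinsics (assumed identical for both views)
    R, t:  rotation (3, 3) and translation (3,) from reference to virtual camera
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x N

    # Back-project to 3D in the reference camera frame, then move to the virtual one.
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    pts = R @ pts + t.reshape(3, 1)

    # Re-project and splat with a z-buffer so nearer points win.
    proj = K @ pts
    z = proj[2]
    valid = z > 1e-6
    z_safe = np.where(valid, z, 1.0)
    up = np.round(proj[0] / z_safe).astype(int)
    vp = np.round(proj[1] / z_safe).astype(int)
    inside = valid & (up >= 0) & (up < w) & (vp >= 0) & (vp < h)

    out = np.zeros_like(rgb)
    zbuf = np.full((h, w), np.inf)
    src = rgb.reshape(-1, 3)
    for i in np.flatnonzero(inside):
        if z[i] < zbuf[vp[i], up[i]]:
            zbuf[vp[i], up[i]] = z[i]
            out[vp[i], up[i]] = src[i]
    return out  # disocclusion holes remain; a second reference view usually fills them
```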
The RaViS/STALF view synthesis process requires a set of RGBD images, where the depth D must comply with several constraints to ensure high-quality results. In particular (but not only), the depth image must be perfectly aligned with the RGB colour image, i.e. the depth discontinuities should exactly follow the RGB object borders. We refer to this as “RGB-inline Depth” in Figure 2; it is an absolute condition for RaViS/STALF to work properly. The calibration work in WP1 partly addresses this challenge.
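As a rough illustration of what the “RGB-inline Depth” condition demands, the sketch below scores how well depth discontinuities coincide with colour edges. It is not the WP1 calibration procedure; the edge thresholds, the dilation tolerance and the overlap metric are illustrative assumptions.

```python
# Rough "RGB-inline" sanity check: depth discontinuities should coincide with
# colour edges. Thresholds and metric are illustrative assumptions.
import numpy as np

def rgb_inline_score(rgb, depth, depth_jump=0.05, dilate=1):
    """Fraction of depth discontinuities that lie on (or next to) a colour edge."""
    gray = rgb.mean(axis=2)
    gy, gx = np.gradient(gray)
    rgb_edges = np.hypot(gx, gy) > 10.0            # crude colour-edge mask

    dy, dx = np.gradient(depth)
    depth_edges = np.hypot(dx, dy) > depth_jump    # crude depth-discontinuity mask

    # Tolerate a small misalignment by dilating the colour-edge mask.
    for _ in range(dilate):
        padded = np.pad(rgb_edges, 1)
        rgb_edges = (padded[:-2, 1:-1] | padded[2:, 1:-1] |
                     padded[1:-1, :-2] | padded[1:-1, 2:] | rgb_edges)

    n_depth = depth_edges.sum()
    return 1.0 if n_depth == 0 else (depth_edges & rgb_edges).sum() / n_depth
```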
Unfortunately, most (not to say all) depth sensing devices (Lidar, Kinect, etc.) do not comply with this “RGB-inline Depth” constraint; strong countermeasures are then needed to avoid virtual view synthesis artefacts. Depth estimation techniques, by contrast, where depth is computed by matching images without active laser light projection, can provide perfectly RGB-inline Depth images, but they typically require more input images and/or face serious challenges in reaching real-time operation. These two cases are the extreme endpoints of a large spectrum of depth sensing/estimation approaches, none of which is straightforward. The many candidate depth sensing/estimation devices considered in HoviTron are presented on the left side of Figure 2 and briefly discussed in WP2.
Notably, the far-top and far-bottom devices in the left column of Figure 2, i.e. DERS and RayTrix, were chosen as a starting point in HoviTron, since both were considered in the MPEG Immersive Video (MIV) standardization activities from which HoviTron draws its inspiration. DERS is applied to conventional cameras, while RayTrix is a representative of the so-called plenoptic cameras that were anticipated to enter the MIV activities at the time of project submission and did so halfway through the first year of the HoviTron project.
Since RayTrix is real-time by design, it is a logical candidate for HoviTron, while DERS, developed within MIV, has to be accelerated to reach real-time performance (cf. the more detailed discussion in WP2). In this demanding study with unexpected hurdles, many depth sensing/estimation alternatives came up; some were abandoned, others probably remain viable solutions (though the component shortage in the semiconductor industry resulting from the Covid-19 pandemic has jeopardized this). To mitigate the risks, two of them have been integrated into the Proof-of-Concept (PoC) of WP4 and demonstrated in a robotic environment (cf. the suffix “Tron” in HoviTron).
Though we are confident that the RayTrix solution is more likely to reach top-quality results (only achieved after finalization of the project, due to RayTrix calibration problems), we are nevertheless targeting a consumer/prosumer solution with RGB-inline Depth cameras of 500-1000 € each, which is more than an order of magnitude cheaper than RayTrix. In the first half of the project, we reached satisfactory results with a Lidar and an accelerated DERS (aka GoRG), showing promising quality-runtime-cost trade-offs. Midway through the project, however, we had to find an alternative to the L515 Lidar approach, because the device was discontinued. We finally opted for the Azure Kinect and had to develop a Kinect depth Refinement Tool (KiRT) to overcome the Kinect's pitfalls for the envisaged virtual view synthesis functionalities. Together with the integration activities of WP4, depth refinement received most of our attention in the second half of the project. Eventually, the end-to-end tele-robotic system with electronic pan-tilt HMD could be set up at a TRL of 6-7, and its superiority over the state of the art was additionally validated with a final User Study (US-2 in WP5) applied to the final Proof-of-Concept (PoC-3 in WP4). Such technology has a true impact on SMEs involved in tele-robotic operation for production, service and maintenance. Moreover, the same technology has attracted the attention of the Digital Video Broadcasting (DVB) consortium, and might be a good candidate for video-based VR in the metaverse through the Focus Group on the MetaVerse set up in mid-December 2022 (outside the HoviTron consortium, rather in collaboration with MPEG), with a kick-off in March 2023.
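As an illustration of the kind of RGB-guided depth clean-up a tool such as KiRT must perform (snapping depth discontinuities towards colour edges and suppressing invalid samples), the sketch below applies a naive joint bilateral filter to a Kinect-style depth map. It is not the actual KiRT algorithm; the filter choice, its parameters and the zero-as-invalid convention are illustrative assumptions.

```python
# Illustrative RGB-guided depth clean-up, in the spirit of what a refinement
# tool like KiRT has to do. Naive joint bilateral filter sketch, not the
# actual KiRT algorithm.
import numpy as np

def guided_depth_refine(depth, rgb, radius=3, sigma_space=2.0, sigma_color=12.0):
    """Smooth depth while keeping discontinuities where the RGB guide has edges."""
    h, w = depth.shape
    gray = rgb.mean(axis=2).astype(np.float64)
    pad = radius
    d_pad = np.pad(depth.astype(np.float64), pad, mode='edge')
    g_pad = np.pad(gray, pad, mode='edge')

    acc = np.zeros((h, w))
    weight = np.zeros((h, w))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            d_shift = d_pad[pad + dy:pad + dy + h, pad + dx:pad + dx + w]
            g_shift = g_pad[pad + dy:pad + dy + h, pad + dx:pad + dx + w]
            # Spatial closeness times colour similarity in the RGB guide image.
            w_s = np.exp(-(dx * dx + dy * dy) / (2 * sigma_space ** 2))
            w_c = np.exp(-((g_shift - gray) ** 2) / (2 * sigma_color ** 2))
            wgt = w_s * w_c * (d_shift > 0)   # ignore invalid (zero) depth samples
            acc += wgt * d_shift
            weight += wgt
    return np.where(weight > 0, acc / np.maximum(weight, 1e-12), depth)
```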