
Perceptual encoding of high fidelity light fields

Periodic Reporting for period 4 - EyeCode (Perceptual encoding of high fidelity light fields)

Reporting period: 2022-01-01 to 2023-03-31

The goal of the project is to convey the sense of perceiving real scenes when viewing content on a custom-built electronic display. More broadly, we want to be able to capture, encode and display highly realistic images that go beyond typical 2D images, video or 3D stereo content.

Being able to capture, represent and display visual content is important for several emerging technologies and applications, such as AR/VR, remote operation, remote exploration, telepresence, and entertainment. For example, we want to be able to send robotic drones to places where it is too expensive or too risky to send people (space exploration, deep-sea exploration, disaster areas) and still be able to perceive, experience, and interact in those environments as if we were present there.

The problem area is very broad and in this project, we focus on how we can exploit the limitations of our visual system to reduce the amount of data and hardware requirements for a perceptually realistic imaging pipeline. We want to capture, encode, and display only the visual information that is visible to the human eye and ignore anything that is imperceivable.
For clarity, we split the work done in this project into three areas:

Capture

We built camera systems (rigs) for capturing high-dynamic-range light fields of both small and large scenes. To overcome the limitations of the capture hardware, we explored existing methods for 3D scene acquisition, from traditional multi-view stereo to recent learning-based methods that rely on multi-plane images or neural representations. Our initial investigation found that existing multi-view / light-field methods, which do not attempt to recover 3D information, do not offer sufficient quality and data efficiency for our application. We therefore moved to methods that either attempt to recover depth or rely on depth information from other sources. We also found that the colour accuracy of existing imaging pipelines is insufficient for our ultra-realistic display, so we developed methods for more accurate high-dynamic-range merging.
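For illustration, the sketch below shows a textbook weighted merge of an exposure stack into a radiance map. The hat-shaped weighting and the linear sensor model are simplifying assumptions for the example only; they are not the project's colour-accurate merging method.

```python
import numpy as np

def merge_hdr(exposures, exposure_times):
    """Merge a stack of linear (not gamma-encoded) images taken at different
    exposure times into a single HDR radiance map.

    exposures      : list of float arrays in [0, 1], all the same shape
    exposure_times : list of exposure times in seconds
    """
    num = np.zeros_like(exposures[0], dtype=np.float64)
    den = np.zeros_like(exposures[0], dtype=np.float64)

    for img, t in zip(exposures, exposure_times):
        # Hat weighting: trust mid-range pixels, down-weight pixels close to
        # the noise floor or to sensor saturation.
        w = 1.0 - np.abs(2.0 * img - 1.0)
        # Scale each exposure to a common radiance scale (linear sensor model).
        num += w * (img / t)
        den += w

    # Avoid division by zero where no exposure is reliable.
    return num / np.maximum(den, 1e-6)
```

A weighted average of exposure-scaled images is the standard baseline; the project's contribution concerns making this step colour-accurate, which the sketch does not attempt to capture.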

Encoding

We have made substantial progress on efficient encoding of visual content in three domains: temporal, luminance contrast and colour. We developed a technique called temporal resolution multiplexing (TRM), which displays smooth motion at high frame rates while rendering and encoding every second frame at half the resolution (https://www.cl.cam.ac.uk/research/rainbow/projects/trm/). This work received the Best IEEE VR Journal Paper Award in 2019. We also built a comprehensive model of spatio-chromatic contrast sensitivity (https://www.cl.cam.ac.uk/research/rainbow/projects/hdr-csf/), which we plan to use to derive an efficient colour representation for HDR data. In addition, we developed machine-learning-based models for predicting visible differences in images, which offer much better prediction accuracy than existing techniques. Those models will be used to align the quality of visual encoding with the limits of human perception.
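The core idea behind TRM can be illustrated with a minimal sketch: every second frame is stored at reduced resolution, and the full-resolution frame of each pair is compensated so that the temporal average of the displayed pair matches the original. This is a simplified illustration of the principle, not the published algorithm, and it assumes frame dimensions divisible by the downsampling factor.

```python
import numpy as np

def trm_encode_pair(frame_a, frame_b, scale=2):
    """Encode a pair of consecutive frames so that frame_b is stored at
    reduced resolution while the temporal average seen at high frame rates
    stays close to that of the original pair.

    frame_a, frame_b : float arrays (H, W), linear luminance in [0, 1]
    scale            : downsampling factor for the low-resolution frame
    """
    h, w = frame_b.shape  # assumed divisible by `scale` for this sketch

    # Store every second frame at reduced resolution (box downsampling).
    low = frame_b.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))

    # What the display will actually show for frame_b (nearest-neighbour upsample).
    b_shown = np.repeat(np.repeat(low, scale, axis=0), scale, axis=1)

    # Compensate the full-resolution frame so that (a_comp + b_shown) equals
    # (frame_a + frame_b), then clamp to the displayable range.
    a_comp = np.clip(frame_a + frame_b - b_shown, 0.0, 1.0)
    return a_comp, low
```

At high frame rates the visual system integrates the pair over time, so the loss of detail in the low-resolution frame is largely masked.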

Display

We have completed the construction of a high-dynamic-range multi-focal stereo (HDRMF) display, which delivers high brightness (4000 nit), deep blacks, high resolution, stereo disparity, and two focal planes for accommodation and defocus depth cues. Furthermore, the display has a see-through capability, so it is possible to see the displayed images on top of a real-world scene (like in AR displays) or to see the displayed image alone. The display is equipped with an eye-tracking camera, which can provide feedback on the position of the eyes. All this is combined with a real-time 3D rendering algorithm that can deliver images that match the appearance of real scenes.
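How a rendered image can be distributed across the two focal planes is illustrated below with standard linear depth blending, a common approach for multi-plane displays. This is a sketch of that general technique under assumed dioptric plane distances, not necessarily the exact algorithm used by the display's real-time renderer.

```python
import numpy as np

def split_to_focal_planes(image, depth, d_near, d_far):
    """Distribute pixel intensities between two focal planes using linear
    depth blending, so the fused percept approximates the correct
    accommodation cue for depths between the planes.

    image          : float array (H, W, 3), linear RGB
    depth          : float array (H, W), pixel distances in dioptres
    d_near, d_far  : dioptric distances of the near and far focal planes
                     (d_near > d_far in dioptres)
    """
    # Blending weight: 1 at the near plane, 0 at the far plane,
    # clamped for depths outside the range spanned by the two planes.
    w_near = np.clip((depth - d_far) / (d_near - d_far), 0.0, 1.0)
    w_near = w_near[..., None]  # broadcast over the RGB channels

    near_plane = image * w_near
    far_plane = image * (1.0 - w_near)
    return near_plane, far_plane
```

Optically superimposing the two planes reproduces the original image while providing approximate accommodation and defocus cues.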

Efficient perceptual measurements

Our work, to a large degree, relies on perceptual measurements. Since collecting perceptual data typically requires tedious psychophysical experiments, we devoted some effort to new machine-learning techniques that make such measurements as efficient and accurate as possible. To that end, we developed a new active sampling method that collects data efficiently by sampling the points in the problem space that deliver the most information (https://github.com/gfxdisp/asap). The work received the Best Student Paper Award at ICPR 2020.
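The flavour of such active sampling can be shown with a simplified heuristic for pairwise-comparison experiments: given current score estimates and their uncertainties, pick the pair whose outcome is least predictable. This sketch is a generic illustration under a Thurstone Case V observer model; it is not the ASAP library's API or its full expected-information-gain computation.

```python
from math import erf, sqrt
import numpy as np

def next_pair(mu, sigma):
    """Select the next pair of conditions to compare in a pairwise-comparison
    experiment, favouring pairs whose outcome is currently most uncertain.

    mu    : (N,) current quality-score estimates
    sigma : (N,) current posterior standard deviations of those estimates
    """
    n = len(mu)
    best_pair, best_score = None, -np.inf
    for i in range(n):
        for j in range(i + 1, n):
            # Probability that condition i is preferred over condition j
            # under a Gaussian (Thurstone Case V) observer model.
            s = sqrt(sigma[i] ** 2 + sigma[j] ** 2 + 1.0)
            p = 0.5 * (1.0 + erf((mu[i] - mu[j]) / (s * sqrt(2.0))))
            # Binary entropy of the predicted outcome: largest when p is
            # near 0.5, i.e. when we know least about which condition wins.
            h = -(p * np.log2(p + 1e-12) + (1 - p) * np.log2(1 - p + 1e-12))
            # Scale by the pair's uncertainty so well-measured ties are not re-asked.
            score = h * s
            if score > best_score:
                best_pair, best_score = (i, j), score
    return best_pair
```
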
By the end of the project, we had closed the entire imaging pipeline, from capture to display, and we were able to reproduce images of an unprecedented level of realism on an electronic display. Our HDRMF display delivers impressive images with a fidelity that goes far beyond what can be achieved with existing display technologies. It has already served as a research platform for multiple experiments studying the limits of visual perception.

The project has also resulted in several algorithms and techniques that improve the efficiency of computer graphics rendering (https://www.cl.cam.ac.uk/research/rainbow/projects/alsarr/, https://www.cl.cam.ac.uk/research/rainbow/projects/trm/); visual models (https://www.cl.cam.ac.uk/research/rainbow/projects/stelaCSF/) and metrics (https://www.cl.cam.ac.uk/research/rainbow/projects/fovvideovdp/) that are used in academia and industry; and experimental findings that help to define requirements for future display technologies (https://www.cl.cam.ac.uk/research/rainbow/projects/focus_cues/).
[Images: six-primary HDR display; high-dynamic-range multi-focal stereo display; a camera capturing a real-scene box from multiple view-points.]