
Perceptual encoding of high fidelity light fields

Periodic Reporting for period 3 - EyeCode (Perceptual encoding of high fidelity light fields)

Reporting period: 2020-07-01 to 2021-12-31

The goal of the project is to convey the sense of perceiving real scenes when viewing content on a custom-built electronic display. More broadly, we want to be able to capture, encode and display highly realistic images that go beyond typical 2D images, video or 3D stereo content.

Being able to capture, represent and display visual content is important for several emerging technologies and applications, such as AR/VR, remote operation, remote exploration, telepresence and entertainment. For example, we want to be able to send robotic drones to places where it is too expensive or too risky to send people (space exploration, deep-sea exploration, disaster areas) and still be able to perceive, experience and interact in those environments as if we were present there.

The problem area is very broad, and in this project we focus on how we can exploit the limitations of our visual system to reduce the amount of data and the hardware requirements of a perceptually realistic imaging pipeline. We want to capture, encode and display only the visual information that is visible to the human eye and ignore anything that is imperceptible.
For clarity, we split the work done on this project into three areas of investigation:

Capture

We built camera systems (rigs) for capturing high-dynamic-range light fields of both small and large scenes. To overcome the limitations of the capture, we explored the existing methods for 3D scene acquisition, from traditional multi-view 3D stereo to multi-plane images. Our initial investigation found that existing multi-view / light-field methods, which do not attempt to recover 3D information, do not offer sufficient quality and data efficiency for our application. Therefore, our current focus is on combining 3D depth cameras with high-quality colour cameras to capture multi-view images with per-pixel depth (RGBD). Such images can be used to reproject the captured 3D data into an arbitrary view with sufficient quality, as sketched below.
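To make the reprojection step concrete, here is a minimal sketch of warping an RGBD image into a novel view, assuming pinhole camera models. The function name, parameters and NumPy-based implementation are illustrative and not taken from our pipeline.

```python
import numpy as np

def reproject_rgbd(depth, K_src, K_dst, T_src_to_dst):
    """Map every source pixel with known depth into a target view.

    depth        : (H, W) depth map in metres (hypothetical input)
    K_src, K_dst : (3, 3) pinhole intrinsic matrices
    T_src_to_dst : (4, 4) rigid transform from source to target camera
    Returns (H, W, 2) target-image coordinates for each source pixel.
    """
    H, W = depth.shape
    # Build the homogeneous pixel grid of the source image.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(np.float64)

    # Unproject to 3D points in the source camera frame (rays scaled by depth).
    pts = np.linalg.inv(K_src) @ pix * depth.reshape(1, -1)

    # Move the points into the target camera frame.
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])
    pts_dst = (T_src_to_dst @ pts_h)[:3]

    # Project onto the target image plane (perspective divide).
    proj = K_dst @ pts_dst
    return (proj[:2] / proj[2:]).T.reshape(H, W, 2)
```

Forward-splatting the colours to these coordinates (or inverting the mapping for backward warping) then produces the novel view; occlusions and disocclusions need extra handling that is omitted here.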

Encoding

We have made substantial progress on efficient encoding of visual content in three domains: temporal, luminance contrast and colour. We developed a technique, called temporal resolution multiplexing (TRM), that allows displaying smooth motion at high frame rates while rendering and encoding every second frame at half the resolution (https://www.cl.cam.ac.uk/research/rainbow/projects/trm/); a simplified sketch of the idea follows below. This work was awarded the Best IEEE VR Journal Paper Award in 2019. We also built a comprehensive model of spatio-chromatic contrast sensitivity (https://www.cl.cam.ac.uk/research/rainbow/projects/hdr-csf/). We plan to use that model to derive an efficient colour representation for HDR data. We also developed machine-learning-based models for predicting visible differences in images, which offer much better prediction accuracy than existing techniques. Those models will be used to align the quality of visual encoding with perceptual limitations.
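As an illustration, the toy sketch below compensates the full-resolution frame of a pair so that the temporal average perceived by the eye matches the original pair while the second frame is stored at half resolution. This is a simplified rendition of the multiplexing idea, not the published TRM algorithm; the names, the grayscale/even-size assumption and the use of SciPy resampling are all illustrative.

```python
import numpy as np
from scipy import ndimage

def trm_pair(frame_a, frame_b):
    """Multiplex a frame pair: full-res compensated + half-res.

    Assumes grayscale frames with values in [0, 1] and even dimensions.
    """
    # The second frame is rendered/encoded at half resolution,
    # then upsampled back for display.
    low_b = ndimage.zoom(ndimage.zoom(frame_b, 0.5), 2.0)

    # Compensate the full-resolution frame for the detail missing in
    # low_b: the eye integrates the pair over time, so
    # (comp_a + low_b) / 2 == (frame_a + frame_b) / 2 (up to clipping).
    comp_a = np.clip(frame_a + (frame_b - low_b), 0.0, 1.0)
    return comp_a, low_b
```

The saving comes from only having to render and transmit low_b at a quarter of the pixel count in this sketch; the published method additionally accounts for display response and clipping artefacts.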

Display

We have completed the construction of a high-dynamic-range multi-focal stereo (HDRMF) display, which delivers high brightness (4000 nit), deep blacks, high resolution, stereo disparity and two focal planes providing accommodation and defocus depth cues. Furthermore, the display has a see-through capability, so it is possible to see the displayed images on top of a real-world scene (as in AR displays) or to see the displayed image alone. The display is equipped with an eye-tracking camera, which provides feedback on the position of the viewer's eyes. We have started work on a rendering algorithm that can deliver images matching the appearance of the real scene; the two-plane decomposition it builds on is sketched below.
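Since our rendering algorithm is still in development, the sketch below shows, for background, the standard linear depth-weighted blending commonly used to split an image across two focal planes in multi-focal displays. All names and parameters are illustrative; this is not the project's algorithm.

```python
import numpy as np

def split_two_planes(image, depth, d_near, d_far):
    """Depth-weighted blending of an RGB image onto two focal planes.

    image         : (H, W, 3) linear-light image
    depth         : (H, W) scene depth in metres
    d_near, d_far : focal-plane distances in metres (near < far)
    Returns the near-plane and far-plane images; their optical sum
    approximates the input, with intermediate defocus cues in between.
    """
    # Blending is approximately linear in dioptres (1/m), not metres.
    D = 1.0 / np.clip(depth, 1e-3, None)
    D_near, D_far = 1.0 / d_near, 1.0 / d_far

    # Weight of the near plane: 1 at the near plane, 0 at the far plane.
    w_near = np.clip((D - D_far) / (D_near - D_far), 0.0, 1.0)

    near_plane = image * w_near[..., None]
    far_plane = image * (1.0 - w_near)[..., None]
    return near_plane, far_plane
```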

Efficient perceptual measurements

Our work relies to a large degree on perceptual measurements. Since collecting perceptual data typically requires tedious psychophysical experiments, we devoted some effort to using machine-learning techniques to make such measurements as efficient and accurate as possible. To this end, we developed a new active-sampling method that lets us collect data in an optimal manner by sampling the points in our problem space that deliver the most information; the sketch below illustrates the general principle.
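As a minimal illustration of information-driven sampling, the psi-method-style sketch below estimates a single psychometric threshold, picking at each trial the stimulus whose response is expected to reduce the posterior entropy the most. The logistic psychometric model, the slope value and all names are assumptions for illustration; our actual method is more general.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def pick_next_stimulus(stimuli, thresholds, posterior, slope=10.0):
    """Choose the stimulus with the largest expected information gain.

    stimuli    : candidate stimulus intensities
    thresholds : grid of hypothesised threshold values
    posterior  : current probability over thresholds (sums to 1)
    """
    gains = []
    for x in stimuli:
        # Assumed logistic psychometric function:
        # P("seen" | threshold t, stimulus x).
        p_seen = 1.0 / (1.0 + np.exp(-slope * (x - thresholds)))
        expected_h = 0.0
        for p_resp in (p_seen, 1.0 - p_seen):  # both possible responses
            p_marg = np.sum(posterior * p_resp)           # P(response | x)
            post_new = posterior * p_resp / max(p_marg, 1e-12)
            expected_h += p_marg * entropy(post_new)
        gains.append(entropy(posterior) - expected_h)
    return stimuli[int(np.argmax(gains))]
```

After each trial the posterior is updated with Bayes' rule using the observed response, and the procedure repeats until the threshold estimate is sufficiently precise.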
We expect that by the end of the project we will be able to close the entire imaging pipeline, from capture to display, and reproduce images on an electronic display with an unprecedented level of realism. Our current HDRMF display already delivers impressive images, far beyond what can be achieved with existing display technologies.

But more importantly, we want to provide several recommendations for efficient capture, rendering, coding and display of highly realistic images. We have already delivered a novel method (TRM) that reduces data size by up to 40% in the temporal domain. We want to use our contrast-sensitivity models and machine-learning-based visual metrics to optimize the efficiency of visual content encoding.
[Figure: Six-primary HDR display]
[Figure: High-dynamic-range multi-focal stereo display]
[Figure: A camera captures a real-scene box from multiple view-points]