
FAst and energy efficient Learned image and video CompressiON

Periodic Reporting for period 1 - FALCON (FAst and energy efficient Learned image and video CompressiON)

Reporting period: 2022-03-01 to 2024-02-29

The ever-increasing demand for image- and video-based applications motivates innovations that improve multimedia compression performance and support modern multimedia systems. Emerging solutions exploit the power of Machine Learning to achieve higher image/video compression rates and higher quality for such systems. However, these solutions come with high computational complexity, which translates into higher energy consumption, difficulty of deployment in services and consumer devices, and a larger carbon footprint. Multimedia services are among the most demanding applications: video traffic constitutes around 80% of Internet traffic and is responsible for around 1% of global greenhouse gas emissions. Hence, it is essential to develop solutions that enhance the efficiency of compression systems while reducing their computational complexity, in line with energy consumption policies.
To this end, this project, FALCON, studies novel solutions for fast and energy-efficient learning-based compression that support modern multimedia systems. The overall objective is to improve compression efficiency and quality while reducing computational complexity for faster solutions. The objectives of the project are pursued along the following main directions:
(1) Designing fast and efficient methods for multimedia and compression systems. This lowers the complexity while preserving compression efficiency and quality.
(2) Designing advanced learning-based methods for multimedia and compression systems. This enables higher compression efficiency at similar complexity.
(3) Optimizing compression systems based on human psychovisual characteristics. This removes unnecessary information or operations whose effect cannot be distinguished by human observers.
The work performed in this project has led to eleven outputs. A summary of these works and their main results is provided here.
A comprehensive investigation of the complexity of recent learning-based compression methods is reported in “Comprehensive Complexity Assessment of Emerging Learned Image Compression on CPU and GPU”. This work quantifies important aspects of complexity and guides the complexity-aware development of these methods.
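To make the notion of complexity measurement concrete, the sketch below shows a minimal way to report the parameter count and average forward-pass time of a PyTorch model on CPU and GPU; the stand-in network, input size, and number of runs are placeholders and do not reflect the models or protocol used in the published assessment.

```python
# Minimal sketch (not the paper's benchmark): timing one forward pass of an
# arbitrary PyTorch model on CPU and, if available, GPU.
import time
import torch
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

@torch.no_grad()
def time_inference(model: nn.Module, x: torch.Tensor, device: str, runs: int = 20) -> float:
    """Average forward-pass time in milliseconds on the given device."""
    model = model.to(device).eval()
    x = x.to(device)
    for _ in range(3):                      # warm-up iterations
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs * 1e3

if __name__ == "__main__":
    # Stand-in model; a real study would load a learned image codec here.
    net = nn.Sequential(nn.Conv2d(3, 64, 5, stride=2, padding=2), nn.ReLU(),
                        nn.Conv2d(64, 64, 5, stride=2, padding=2))
    img = torch.randn(1, 3, 256, 256)
    print("params:", count_parameters(net))
    print("CPU ms:", time_inference(net, img, "cpu"))
    if torch.cuda.is_available():
        print("GPU ms:", time_inference(net, img, "cuda"))
```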
“MAMIQA: No-Reference Image Quality Assessment Based on Multiscale Attention Mechanism with Natural Scene Statistics” proposes a low-complexity learning-based approach for image quality assessment. The proposed method divides complex neural network operations into simpler ones, leading to significantly reduced complexity compared to the state of the art.
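As an illustration of splitting a heavy operation into simpler ones, the sketch below replaces a standard convolution with a depthwise-plus-pointwise pair, a common decomposition that cuts parameters and multiply-accumulate operations; the paper's actual decomposition and attention design may differ.

```python
# Illustrative sketch only: one common way to split a heavy convolution into
# simpler operations is a depthwise + pointwise pair.
import torch
import torch.nn as nn

class SeparableConv2d(nn.Module):
    """Depthwise convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

dense = nn.Conv2d(64, 128, 3, padding=1)
light = SeparableConv2d(64, 128, 3)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense), count(light))   # 73856 vs 8960 parameters
```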
A fast approximate method for bitrate ladder construction is proposed in “Efficient Bitrate Ladder Construction using Transfer Learning and Spatio-Temporal Features”, avoiding a full parameter search. The proposed method estimates the optimal encoding parameters with a reduced number of encodings. It further reduces complexity by leveraging transfer learning and by dividing the complex prediction network into simpler temporal and spatial modules.
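The sketch below illustrates the general idea of combining lightweight spatial and temporal branches to predict an encoding parameter for one rung of the bitrate ladder; the layer sizes, inputs (a frame plus a frame-difference map as a cheap motion proxy), and the single-output head are placeholder assumptions, not the published architecture.

```python
# Hedged sketch of the general idea, not the paper's architecture.
import torch
import torch.nn as nn

class LadderPredictor(nn.Module):
    def __init__(self, n_outputs: int = 1):
        super().__init__()
        # Spatial branch: operates on a single frame.
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Temporal branch: operates on a frame-difference map as a simple
        # stand-in for motion information.
        self.temporal = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Sequential(nn.Linear(32, 32), nn.ReLU(),
                                  nn.Linear(32, n_outputs))

    def forward(self, frame: torch.Tensor, frame_diff: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([self.spatial(frame), self.temporal(frame_diff)], dim=1)
        return self.head(feats)

model = LadderPredictor()
f0, f1 = torch.randn(1, 3, 270, 480), torch.randn(1, 3, 270, 480)
print(model(f0, f1 - f0).shape)   # torch.Size([1, 1])
```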
“Channel-wise Feature Decorrelation for Enhanced Learned Image Compression” proposes a method to improve the performance of learning-based compression without increasing its complexity, offering an alternative to employing deeper and more complex models. This is achieved by guiding the compression network to learn a more diverse set of features, which significantly improves compression performance at no additional complexity.
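One simple way to express such a decorrelation objective is shown below: the off-diagonal entries of the channel correlation matrix of the latent tensor are penalized, encouraging channels to carry non-redundant information. This is a minimal sketch of the idea; the exact formulation in the paper may differ.

```python
# Minimal sketch of a channel-decorrelation penalty on a latent tensor of
# shape (B, C, H, W); added to the usual rate-distortion loss.
import torch

def channel_decorrelation_loss(y: torch.Tensor) -> torch.Tensor:
    b, c, h, w = y.shape
    feats = y.reshape(b, c, h * w)                   # one vector per channel
    feats = feats - feats.mean(dim=2, keepdim=True)
    feats = feats / (feats.norm(dim=2, keepdim=True) + 1e-8)
    corr = torch.bmm(feats, feats.transpose(1, 2))   # (B, C, C) correlations
    off_diag = corr - torch.eye(c, device=y.device)  # keep only cross-channel terms
    return (off_diag ** 2).mean()

latent = torch.randn(2, 192, 16, 16)
print(channel_decorrelation_loss(latent))
```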
Optimization of video compression for machine vision systems is proposed in “NCOD: Near-Optimum Video Compression for Object Detection”. Machine vision algorithms do not perceive video quality the way human observers do. The proposed method exploits this fact to effectively reduce the bitrate, designing a fast mechanism to find the best codec operating point for object detection.
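The sketch below captures the search idea at a conceptual level: codec operating points (QP values) are scanned from lowest to highest bitrate, and the first point whose detection accuracy stays within a tolerance of the uncompressed baseline is kept. The evaluation callback is simulated with a synthetic rate/accuracy curve so the sketch is self-contained; it is not the published algorithm.

```python
# Conceptual sketch only: pick the lowest-bitrate operating point whose
# detection accuracy remains close to the uncompressed baseline.

def find_operating_point(evaluate_qp, baseline_accuracy, qps, tolerance=0.02):
    for qp in sorted(qps, reverse=True):      # highest QP = lowest bitrate first
        bitrate, accuracy = evaluate_qp(qp)
        if baseline_accuracy - accuracy <= tolerance:
            return {"qp": qp, "bitrate_kbps": bitrate, "accuracy": accuracy}
    return None

# Synthetic stand-in for "encode at QP, run the detector, measure accuracy":
# accuracy degrades gently as QP grows while bitrate shrinks.
simulated = lambda qp: (20000 / qp, 0.80 - 0.004 * max(0, qp - 30))
print(find_operating_point(simulated, baseline_accuracy=0.80,
                           qps=range(22, 48, 2)))
```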
In “Joint End-to-End Image Compression and Denoising: Leveraging Contrastive Learning and Multi-Scale Self-ONNs”, an approach is proposed that optimizes learned compression jointly for compression efficiency and denoising capability. Using fast Operational Neural Networks and contrastive learning, the proposed method learns to remove noise while preserving image content, leading to better compression efficiency.
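A hedged sketch of such a contrastive objective is given below: the pooled latent of a noisy image is pulled toward the latent of its clean counterpart and pushed away from latents of other images, in an InfoNCE-style formulation. The pooling and temperature are assumptions; the paper's Self-ONN encoder and exact loss may differ.

```python
# Hedged sketch of a contrastive term for joint compression/denoising,
# combined in practice with rate and distortion terms.
import torch
import torch.nn.functional as F

def contrastive_latent_loss(z_noisy: torch.Tensor, z_clean: torch.Tensor,
                            temperature: float = 0.1) -> torch.Tensor:
    """z_noisy, z_clean: (B, D) pooled latent vectors for matching images."""
    z_noisy = F.normalize(z_noisy, dim=1)
    z_clean = F.normalize(z_clean, dim=1)
    logits = z_noisy @ z_clean.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(z_noisy.size(0))        # positives on the diagonal
    return F.cross_entropy(logits, targets)

z_n, z_c = torch.randn(8, 128), torch.randn(8, 128)
print(contrastive_latent_loss(z_n, z_c))
```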
In “Pixel-Wise Color Constancy via Smoothness Techniques in Multi-Illuminant Scenes”, a solution is proposed to the color constancy problem in scenes where multiple illumination sources affect imaging quality. The proposed method enforces smoothness constraints to remove the illuminant colors and restore the natural image colors.
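As a minimal example of a smoothness technique, the sketch below applies a total-variation penalty to a per-pixel illumination map and corrects the image by dividing the estimated illuminant out; the illumination estimate here is a placeholder rather than the paper's learned prediction.

```python
# Illustrative sketch, not the paper's method: smooth a per-pixel
# illumination estimate and use it to remove the illuminant color.
import torch

def total_variation(illum: torch.Tensor) -> torch.Tensor:
    """Anisotropic TV of an illumination map of shape (B, 3, H, W)."""
    dh = (illum[:, :, 1:, :] - illum[:, :, :-1, :]).abs().mean()
    dw = (illum[:, :, :, 1:] - illum[:, :, :, :-1]).abs().mean()
    return dh + dw

image = torch.rand(1, 3, 64, 64)
illum = torch.rand(1, 3, 64, 64) * 0.5 + 0.5   # placeholder illumination estimate
corrected = image / illum.clamp(min=1e-4)      # remove illuminant color
print(total_variation(illum))                  # smoothness regularizer
```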
The method proposed in “Panoramic Image Inpainting with Gated Convolution and Contextual Reconstruction Loss” restores impaired image areas through inpainting. A GAN-based approach is proposed that uses gated convolutions to restrict the flow of information to valid regions, and finds the most suitable reference regions via a contextual reconstruction loss.
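The sketch below shows a basic gated-convolution block of the kind used in such inpainting networks: a feature branch is modulated by a learned soft gate that suppresses contributions from invalid (masked) regions. Activation choices and layer sizes are placeholders.

```python
# Minimal gated-convolution sketch: features modulated by a learned soft gate.
import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.feature = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.gate = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.feature(x)) * torch.sigmoid(self.gate(x))

# Input: masked panorama concatenated with its binary validity mask.
x = torch.cat([torch.randn(1, 3, 128, 256), torch.ones(1, 1, 128, 256)], dim=1)
print(GatedConv2d(4, 32)(x).shape)   # torch.Size([1, 32, 128, 256])
```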
The work in “MTJND: Multi-Task Deep Learning Framework for Improved JND Prediction” enables accurate modeling of the Just Noticeable Difference (JND), to be used for optimizing video compression for the human visual system. As JND modeling is a complex task, a multi-task framework is proposed that jointly learns two related tasks, leading to improved learning performance.
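The sketch below illustrates the multi-task structure at a high level: a shared backbone feeds two lightweight heads, and training would minimize a weighted sum of the two task losses. The backbone, heads, and tasks shown are placeholders, not the published MTJND design.

```python
# Hedged sketch of a shared backbone with two task heads.
import torch
import torch.nn as nn

class MultiTaskJND(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.jnd_head = nn.Linear(64, 1)   # e.g. JND threshold regression
        self.aux_head = nn.Linear(64, 1)   # a second, related task

    def forward(self, x: torch.Tensor):
        shared = self.backbone(x)
        return self.jnd_head(shared), self.aux_head(shared)

jnd, aux = MultiTaskJND()(torch.randn(2, 3, 128, 128))
print(jnd.shape, aux.shape)
# Training would minimize a weighted sum of the two task losses.
```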
“Lightweight Multitask Learning for Robust JND Prediction using Latent Space and Reconstructed Frames” builds upon the MTJND work above to enable JND estimation in a learned compression setting. Lightweight networks are designed to estimate JND either from the decoded frames or directly from the compressed latent representation, leading to a faster solution. This approach significantly reduces video bitrate without loss of visual quality.
Finally, “Perceptual Learned Image Compression via End-to-End JND-Based Optimization” presents an optimization framework that integrates JND into learned image compression. Three loss functions are designed and evaluated that guide the network to learn visually salient features and discard imperceptible details. This approach saves significant bitrate and improves visual quality.
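One possible form of a JND-guided distortion term is sketched below: per-pixel reconstruction errors are counted only where they exceed a JND threshold map, so the codec is steered away from spending bits on imperceptible differences. This is an illustrative assumption and not necessarily one of the three loss functions proposed in the paper.

```python
# Hedged sketch of a JND-weighted distortion term; the JND map here is a
# placeholder (a learned or model-based map would be used in practice).
import torch

def jnd_weighted_mse(x: torch.Tensor, x_hat: torch.Tensor,
                     jnd_map: torch.Tensor) -> torch.Tensor:
    err = (x - x_hat).abs()
    visible = torch.clamp(err - jnd_map, min=0.0)   # error exceeding the threshold
    return (visible ** 2).mean()

x, x_hat = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
jnd_map = torch.full_like(x, 0.05)                  # placeholder uniform threshold
print(jnd_weighted_mse(x, x_hat, jnd_map))          # combined with a rate term
```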
FALCON pushes the field of learning-based compression and multimedia systems beyond the state of the art in multiple ways:
(1) It enables improving compression efficiency without adding to the complexity.
(2) It develops a robust methodology to optimize learning-based compression based on the human visual system.
(3) It develops low-complexity learning-based compression and multimedia systems by breaking down complex operations, augmenting task-relevant features, and designing lean networks.
The results of the project will impact multimedia systems by promoting low-complexity solutions and perceptual optimization. The complexity analysis of learned compression performed in the project raises awareness and provides guidelines for the further efficient development of such systems. The proposed solutions constitute an important step in lowering the cost and energy consumption of multimedia systems, leading to more scalable multimedia services and a reduced carbon footprint.
Figure: The proposed JND-optimized learned image compression framework