CORDIS - EU research results

Saliency-aware High-resolution Video Processing

Final Report Summary - SHIVPRO (Saliency-aware High-resolution Video Processing)

The objective of this project was to propose an efficient spatiotemporal saliency model that predicts salient regions in high-resolution videos, and to exploit this model to ease the design and improve the performance of high-resolution video compression and of retargeting applications, the latter to be investigated in the return phase.
According to Annex I of the Grant Agreement, we have completed the three planned tasks: a spatiotemporal saliency model based on multiscale region representation (9 months), efficient spatiotemporal salient object detection (6 months), and saliency-aware high-resolution video compression (9 months). Specifically, we investigated region-based features for effective saliency measurement and exploited saliency-directed region merging to realize a multiscale region representation. We proposed the saliency tree model to achieve higher saliency detection performance, a superpixel-based spatiotemporal saliency model for both salient object detection and human fixation prediction in videos, and hierarchical segmentation based co-saliency models for detecting common salient objects in a set of images or video frames. On the basis of spatiotemporal saliency maps, we proposed an effective spatiotemporal salient object detection method. Within the framework of High Efficiency Video Coding (HEVC), we proposed a new intra prediction mode based on inpainting and template matching to improve coding efficiency, and an adaptive inter-mode decision method that jointly utilizes inter-level and spatiotemporal correlations to reduce computational complexity.
The significant results we have achieved are described below:
1) As the basis of a region-based saliency model, we investigated a region-based feature, the regional histogram, and proposed an efficient regional-histogram-based saliency model. First, a global histogram is constructed by performing adaptive color quantization on the original image. Then multiple regional histograms are built on the basis of the region segmentation result, and the color–spatial similarity between each pixel and each regional histogram is calculated accordingly. Two efficient measures, the distinctiveness and compactness of each regional histogram, are evaluated based on the color difference with the global histogram and the color distribution over the whole image, respectively. Finally, the pixel-level saliency map is generated by integrating the color–spatial similarity measures with the distinctiveness and compactness measures. Experimental results on a public dataset containing 1000 test images with ground truths demonstrate that the proposed saliency model consistently outperforms state-of-the-art saliency models.
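As a rough illustration of this pipeline, the following minimal Python sketch computes a pixel-level saliency map from regional histograms. The uniform color quantization, the chi-square distinctiveness, the inverse-variance compactness and all parameter values are simplifying assumptions for illustration, not the exact measures used in the project.

```python
import numpy as np

def quantize_colors(image, num_bins=12):
    """Quantize each RGB channel into num_bins levels (uniform quantization
    stands in for the project's adaptive scheme) and fold the three channel
    indices into one bin index per pixel."""
    q = (image.astype(np.float32) / 256.0 * num_bins).astype(np.int32)
    return q[..., 0] * num_bins**2 + q[..., 1] * num_bins + q[..., 2]

def regional_saliency(image, labels, num_bins=12, sigma=0.4):
    """image: H x W x 3 uint8; labels: H x W region segmentation.
    Combines distinctiveness (difference from the global histogram) and
    compactness (spatial spread) of each regional histogram."""
    bins = quantize_colors(image, num_bins)
    n_bins = num_bins**3
    global_hist = np.bincount(bins.ravel(), minlength=n_bins).astype(np.float32)
    global_hist /= global_hist.sum()

    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel() / h, xs.ravel() / w], axis=1)

    saliency = np.zeros((h, w), np.float32)
    for r in np.unique(labels):
        mask = (labels == r).ravel()
        hist = np.bincount(bins.ravel()[mask], minlength=n_bins).astype(np.float32)
        hist /= hist.sum()
        # Distinctiveness: chi-square-like difference from the global histogram.
        distinct = 0.5 * np.sum((hist - global_hist)**2 / (hist + global_hist + 1e-8))
        # Compactness: inverse of the spatial variance of the region's pixels.
        compact = 1.0 / (1e-3 + coords[mask].var(axis=0).sum())
        # Color-spatial similarity, reduced here to a Gaussian of each pixel's
        # distance to the region centroid.
        centroid = coords[mask].mean(axis=0)
        sim = np.exp(-np.sum((coords - centroid)**2, axis=1) / (2 * sigma**2))
        saliency += (distinct * compact * sim).reshape(h, w)
    return saliency / (saliency.max() + 1e-8)
```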
2) By exploiting a multiscale region representation of images and video frames, we proposed the saliency tree as a novel saliency detection framework. For efficient saliency measurement, the original image or video frame is first simplified by adaptive color quantization and region segmentation to generate primitive regions. The initial regional saliency of each primitive region is then evaluated based on its global contrast, spatial sparsity and object prior, with the integration of color–spatial similarity measures between regions. Next, a saliency-directed region merging approach with a dynamic scale control scheme generates the saliency tree, which enables an efficient multiscale region representation of saliency. In the saliency tree, each leaf node represents a primitive region and each non-leaf node represents a non-primitive region generated during the merging process. Finally, using a node selection criterion based on a regional center-surround scheme, a systematic saliency tree analysis, including salient node selection and regional saliency adjustment and selection, is performed to obtain the final regional saliency measures and to derive a high-quality pixel-wise saliency map. Extensive experimental results on five datasets with pixel-wise ground truths demonstrate that the proposed saliency tree model consistently outperforms state-of-the-art saliency models.
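The saliency-directed merging that produces the tree can be sketched as a greedy agglomeration over regions. In the toy version below, the merge cost, the max-rule for propagating saliency to non-leaf nodes, and the fully connected region graph (a real implementation would merge only adjacent regions under the dynamic scale control) are all illustrative assumptions.

```python
import heapq
import numpy as np

def build_saliency_tree(mean_color, saliency):
    """Greedy saliency-directed merging. mean_color: {region_id: np 3-vector},
    saliency: {region_id: float}; both dicts are extended in place with the
    non-leaf nodes. Returns {child_id: parent_id} links defining the tree."""
    alive = set(mean_color)
    parent = {}
    next_id = max(alive) + 1

    def cost(a, b):
        # Merge cost: color distance plus saliency difference, so that regions
        # with similar appearance and similar saliency merge first.
        return float(np.linalg.norm(mean_color[a] - mean_color[b])
                     + abs(saliency[a] - saliency[b]))

    heap = [(cost(a, b), a, b) for a in alive for b in alive if a < b]
    heapq.heapify(heap)
    while len(alive) > 1:
        _, a, b = heapq.heappop(heap)
        if a not in alive or b not in alive:
            continue  # stale entry: an endpoint was already merged away
        node, next_id = next_id, next_id + 1
        parent[a] = parent[b] = node
        alive -= {a, b}
        mean_color[node] = (mean_color[a] + mean_color[b]) / 2.0
        saliency[node] = max(saliency[a], saliency[b])
        for other in alive:
            heapq.heappush(heap, (cost(node, other), node, other))
        alive.add(node)
    return parent

# Usage with three toy regions (two similar red regions and one blue one):
# mc = {0: np.array([255., 0., 0.]), 1: np.array([250., 5., 5.]),
#       2: np.array([0., 0., 255.])}
# sal = {0: 0.9, 1: 0.8, 2: 0.1}
# build_saliency_tree(mc, sal)  # -> {0: 3, 1: 3, 3: 4, 2: 4}
```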
3) Based on the above results, we proposed an efficient superpixel-based spatiotemporal saliency model. On the basis of a superpixel representation of video frames, motion histograms and color histograms are extracted at the superpixel level as local features and at the frame level as global features. Superpixel-level temporal saliency is then measured by integrating the motion distinctiveness of superpixels with a temporal saliency prediction and adjustment scheme, and superpixel-level spatial saliency is measured by evaluating the global contrast and spatial sparsity of superpixels. Finally, a pixel-level saliency derivation method generates the pixel-level temporal and spatial saliency maps, and an adaptive fusion method integrates them into the spatiotemporal saliency map. Experimental results on two public video datasets and eight high-resolution videos demonstrate that our model outperforms state-of-the-art spatiotemporal saliency models in terms of both saliency detection and human fixation prediction.
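The final fusion step can be illustrated in a few lines of Python; weighting each map by its own contrast (standard deviation) is a simplified stand-in for the adaptive fusion scheme actually proposed.

```python
import numpy as np

def adaptive_fusion(spatial, temporal, eps=1e-8):
    """Fuse pixel-level spatial and temporal saliency maps (H x W floats).
    The weight is derived from each map's own contrast, so the more
    discriminative map dominates the spatiotemporal result."""
    w = spatial.std() / (spatial.std() + temporal.std() + eps)
    fused = w * spatial + (1.0 - w) * temporal
    # Rescale to [0, 1] so frames remain comparable along the sequence.
    return (fused - fused.min()) / (fused.max() - fused.min() + eps)

# Usage: st_map = adaptive_fusion(spatial_map, temporal_map)
```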
4) We investigated an emerging and interesting issue in saliency detection, co-saliency detection, which aims to discover the common salient objects in a set of images. Co-saliency models can be effectively used for salient object detection in videos by exploiting the strong spatiotemporal correlations among video frames. The proposed co-saliency model is built on hierarchical segmentation. On the basis of fine segmentation, regional histograms are used to measure similarities between region pairs in the image set, and regional contrasts within each image are exploited to evaluate the intra-saliency of each region. On the basis of coarse segmentation, an object prior for each region is measured based on its connectivity with the image border. Finally, the global similarity of each region is derived from the regional similarity measures and then effectively integrated with the intra-saliency map and the object prior map to generate the co-saliency map for each image. Experimental results on two benchmark datasets demonstrate that the proposed model achieves better co-saliency detection performance than state-of-the-art co-saliency models.
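As a sketch of how the three cues might be combined per region, the snippet below computes a border-connectivity object prior and multiplies the cues together; the exponential falloff and the multiplicative integration are assumptions for illustration, not the project's exact formulation.

```python
import numpy as np

def border_object_prior(region_mask):
    """Object prior from border connectivity: regions whose pixels lie heavily
    on the image border are treated as probable background. region_mask is a
    boolean H x W array; the 5.0 falloff is an illustrative choice."""
    border = np.zeros_like(region_mask, dtype=bool)
    border[0, :] = border[-1, :] = border[:, 0] = border[:, -1] = True
    ratio = (region_mask & border).sum() / max(border.sum(), 1)
    return float(np.exp(-5.0 * ratio))

def co_saliency_value(intra, global_sim, obj_prior):
    """Combine intra-saliency, cross-image global similarity and the object
    prior for one region into its co-saliency value."""
    return intra * global_sim * obj_prior
```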
5) On the basis of the spatiotemporal saliency maps in a video shot, we proposed an effective spatiotemporal salient object detection method, which uses rectangular windows to locate spatiotemporal salient objects. To guide the detection, the objective function to be maximized incorporates multiple factors, including the saliency density and gross saliency inside the candidate window and the saliency difference between the candidate window and its outside complement region. An efficient sub-window search algorithm with an early termination scheme speeds up the search over candidate windows. To enhance the temporal coherence of the detected windows across consecutive frames, a dynamic programming algorithm adjusts the centers and sizes of the detected windows in multiple frames. Experimental results on various video sequences demonstrate the improved detection performance of the proposed method.
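The window objective can be evaluated efficiently with an integral image. The exhaustive grid search below is a simplified stand-in for the project's efficient sub-window search with early termination, and the weights alpha, beta and gamma are illustrative.

```python
import numpy as np

def detect_salient_window(sal, step=8, alpha=1.0, beta=1.0, gamma=1.0):
    """Score candidate windows on an integral image of the saliency map sal
    (H x W floats) and return the best window as (x0, y0, x1, y1)."""
    h, w = sal.shape
    ii = np.pad(sal, ((1, 0), (1, 0))).cumsum(0).cumsum(1)  # integral image
    total = sal.sum() + 1e-8
    best_score, best_win = -np.inf, None
    for y0 in range(0, h - step, step):
        for x0 in range(0, w - step, step):
            for y1 in range(y0 + step, h + 1, step):
                for x1 in range(x0 + step, w + 1, step):
                    # Gross saliency inside the window via the integral image.
                    gross = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
                    area = (y1 - y0) * (x1 - x0)
                    density = gross / area
                    out_mean = (total - gross) / max(h * w - area, 1)
                    # Density + gross + inside/outside difference, as in the
                    # factors named above.
                    score = (alpha * density + beta * gross / total
                             + gamma * (density - out_mean))
                    if score > best_score:
                        best_score, best_win = score, (x0, y0, x1, y1)
    return best_win
```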
6) HEVC adopts a quadtree-structured coding unit (CU) that allows recursive splitting into four equally sized blocks. At each depth level (or CU size), it enables up to 35 intra prediction modes, including a planar mode, a DC mode and 33 directional modes. We proposed a new intra prediction mode based on inpainting and template matching, in addition to the existing 35 modes, to further improve the coding efficiency. Inspired by our previous work on saliency models and image/video inpainting, we found that template matching, which predicts the entire prediction unit (PU) in one step from the neighboring reconstructed pixels, is more effective for regions with homogeneous texture, while exemplar-based inpainting is more effective for PUs with non-homogeneous texture. Using such a combination of the two methods for intra prediction, the degradation of saliency on the predicted PUs is negligible, and the coding efficiency is effectively improved. Experimental results on six classes of high-resolution video sequences show that with the introduction of the proposed intra prediction mode, the BD-rate decreases by 0.8% on average, with a maximum of 3.0% on screen content videos (class F).
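A much-simplified sketch of the template matching component is given below: it predicts a PU by copying the reconstructed block whose L-shaped template best matches that of the current PU. The search range, template thickness, SAD cost and raster-scan availability assumption are illustrative; the actual HM integration (and the inpainting branch) is considerably more involved.

```python
import numpy as np

def template_match_predict(recon, x, y, size, t=4, search=32):
    """Predict a size x size PU at column x, row y from the reconstructed
    frame recon (H x W, e.g. luma) by matching its L-shaped template
    (t rows above and t columns to the left, assuming x >= t and y >= t).
    Candidates are restricted to fully reconstructed blocks above the
    current row (raster-scan order assumed)."""
    h, w = recon.shape

    def template(px, py):
        top = recon[py - t:py, px - t:px + size]   # rows above, incl. corner
        left = recon[py:py + size, px - t:px]      # columns to the left
        return np.concatenate([top.ravel(), left.ravel()]).astype(np.int32)

    target = template(x, y)
    best_cost, best_pos = np.inf, None
    for cy in range(t, y - size + 1):
        for cx in range(t, w - size + 1):
            if abs(cx - x) > search or abs(cy - y) > search:
                continue  # keep the candidate search within a local window
            cost = np.abs(template(cx, cy) - target).sum()  # template SAD
            if cost < best_cost:
                best_cost, best_pos = cost, (cx, cy)
    if best_pos is None:  # no valid candidate: fall back to a flat prediction
        return np.full((size, size), recon[y - t:y, x - t:x].mean())
    cx, cy = best_pos
    return recon[cy:cy + size, cx:cx + size].astype(np.float64).copy()
```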
7) In HEVC, the mode decision process tests all possible depth levels (or CU sizes) and prediction modes to find the one with the least rate-distortion (RD) cost, computed using a Lagrange multiplier. At each depth level, HEVC enables, in inter frames, the SKIP mode, Merge mode, Inter 2N×2N, Inter 2N×N, Inter N×2N, Inter 2N×nU, Inter 2N×nD, Inter nL×2N, Inter nR×2N, Inter N×N (only available for the smallest CU), Intra 2N×2N and Intra N×N (only available for the smallest CU). This exhaustive search achieves the highest coding efficiency but leads to very high computational complexity. Since the optimal prediction mode is highly content-dependent, it is inefficient to test all modes. We therefore proposed a fast inter-mode decision algorithm for HEVC that jointly utilizes the inter-level correlation of the quadtree structure and the spatiotemporal correlation among CUs. We found that strong correlations of prediction mode, motion vector and RD cost exist between different depth levels and between spatiotemporally adjacent CUs. We statistically analyzed the prediction mode distribution at each depth level and the correlation of coding information among adjacent CUs. Based on the analysis, we proposed three adaptive inter-mode decision strategies: early SKIP mode decision, prediction-size-correlation-based mode decision and RD-cost-correlation-based mode decision. Experimental results show that the overall algorithm saves 49%-52% of the computational complexity on average with negligible loss of coding efficiency, and is applicable to various types of video sequences.
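The three strategies can be summarized as a mode pre-selection function like the hypothetical sketch below; the CUInfo fields, the mode-neighborhood table and the 1.2 RD-cost threshold are illustrative assumptions, not the thresholds derived from the project's statistical analysis.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CUInfo:
    """Coding information of an already-encoded CU (fields are illustrative)."""
    mode: str = 'INTER_2Nx2N'
    rd_cost: float = 0.0

# Partition modes considered close to a given mode, for the
# prediction-size-correlation strategy (illustrative mapping).
NEARBY_MODES = {
    'SKIP':        ['SKIP', 'MERGE', 'INTER_2Nx2N'],
    'MERGE':       ['SKIP', 'MERGE', 'INTER_2Nx2N'],
    'INTER_2Nx2N': ['INTER_2Nx2N', 'INTER_2NxN', 'INTER_Nx2N'],
    'INTER_2NxN':  ['INTER_2NxN', 'INTER_2NxnU', 'INTER_2NxnD'],
    'INTER_Nx2N':  ['INTER_Nx2N', 'INTER_nLx2N', 'INTER_nRx2N'],
}
ALL_INTER = ['SKIP', 'MERGE', 'INTER_2Nx2N', 'INTER_2NxN', 'INTER_Nx2N']

def candidate_modes(co_located: CUInfo, neighbors: List[CUInfo],
                    parent: CUInfo, rd_cost_so_far: float) -> List[str]:
    """Return the reduced set of modes to test for the current CU, combining
    the three strategies described above."""
    # 1) Early SKIP decision: if the co-located CU and all spatial neighbors
    #    chose SKIP, test only SKIP and Merge and return immediately.
    if co_located.mode == 'SKIP' and all(n.mode == 'SKIP' for n in neighbors):
        return ['SKIP', 'MERGE']
    # 2) Prediction-size correlation: restrict candidates to partition sizes
    #    close to the mode chosen at the upper depth level.
    candidates = NEARBY_MODES.get(parent.mode, ALL_INTER) + ['SPLIT']
    # 3) RD-cost correlation: give up further quadtree splitting when the cost
    #    so far already exceeds a prediction from adjacent CUs' RD costs.
    predicted = sum(n.rd_cost for n in neighbors) / max(len(neighbors), 1)
    if rd_cost_so_far > 1.2 * predicted:
        candidates = [m for m in candidates if m != 'SPLIT']
    return candidates
```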
From Aug. 2012 to Aug. 2014, we conducted the following researcher training, knowledge transfer and integration activities:
(1) Special issue: Dr. Zhi Liu and Dr. Olivier Le Meur proposed a special issue to the EURASIP journal Signal Processing: Image Communication and currently serve as guest editors (with Dr. Ali Borji and Dr. Hongliang Li) of the special issue on “Recent Advances in Saliency Models, Applications and Evaluations”, which will appear in Signal Processing: Image Communication. The call for papers has been posted at http://www.journals.elsevier.com/signal-processing-image-communication/call-for-papers/special-issue-on-recent-advances-in-saliency-models-applicat/ since July 2014.
(2) Special sessions: Dr. Olivier Le Meur and Dr. Zhi Liu organized a special session, “Visual attention, a multidisciplinary topic: from behavioral studies to computer vision applications” (5 papers), at the 14th International Workshop on Image and Audio Analysis for Multimedia Interactive Services (WIAMIS 2013), held in Paris, 3-5 July 2013. More details can be found at http://wiamis2013.wp.mines-telecom.fr. Dr. Zhi Liu, Dr. Olivier Le Meur and Dr. Xiang Zhang organized a special session, “Visual saliency: emerging models and applications in multimedia processing” (5 papers), at the International Conference on Multimedia and Expo (ICME 2014), held in Chengdu, China, 14-18 July 2014. More details can be found at http://www.icme2014.org/.
(3) Talks: Dr. Zhi Liu presented the talk “Salient object detection and segmentation” at IRISA/INRIA-Rennes (Nov. 2012), INSA-Rennes (May 2013) and the LUTIN Lab, University of Paris 8 (Oct. 2013). From April to July 2014, Dr. Zhi Liu was invited to present the talk “High-performance region-based saliency detection for images and videos”, based on the research results of this project, at various universities and research institutes, including Technicolor-Rennes, the IRCCyN Lab (University of Nantes), IRISA/INRIA-Rennes, ParisTech, Shanghai Jiaotong University, Fudan University, Zhejiang University of Technology, the University of Electronic Science and Technology of China, and Sichuan University. In July 2014, Dr. Olivier Le Meur was invited to present the talk “Exemplar-based inpainting method” at the University of Electronic Science and Technology of China and at Sichuan University. During these visits, Dr. Zhi Liu and Dr. Olivier Le Meur held in-depth discussions with the professors, post-docs and PhD/Master's students of the labs they visited, for academic exchange and research collaboration, and also introduced the Marie Curie Actions and Horizon 2020 to the audiences.
(4) Student training: An intern from the University of Rennes 1, Judikaël Guezingar, was supervised by Dr. Zhi Liu and Dr. Olivier Le Meur from June to September 2013. The goal was to design a user-friendly interface for the new methods developed in the SHIVPRO project. Specifically, the saliency tree model was implemented in C++ and used to extract objects of interest. This software will serve as a demonstrator of the main achievements of our project; beyond that, we intend to add new functionalities such as color harmonization, inpainting and background subtraction. A Master's student from the University of Rennes 1, David Gommelet, was supervised by Dr. Zhi Liu and Dr. Olivier Le Meur from November 2013 to March 2014. The goal was to design a new intra prediction mode for HEVC to improve its coding efficiency and to collect eye fixation data on high-resolution videos. The proposed intra prediction mode was efficiently implemented in the HEVC reference software HM12.0 and demonstrated to improve the coding efficiency of HEVC on all classes of test sequences.
(5) Website: The project website (http://people.irisa.fr/Olivier.Le_Meur/shivpro/) was set up, and the research results of this project and other related materials were made publicly available on the website. Dr. Liu and Dr. Le Meur will keep the website updated in the future.
The research results of this project will effectively promote the research on saliency models, advance the saliency based image and video processing technology, and facilitate the development of intelligent high-resolution video services. The activities conducted under the support of this project will strengthen the research collaborations between Europe and China, and significantly increase the visibility of our research as well as the Marie Curie Actions.