Saliency-aware High-resolution Video Processing

Final Report Summary - SHIVPRO (Saliency-aware High-resolution Video Processing)

During the return phase of the SHIVPRO project, we successfully established a long-term, close collaboration between the School of Communication and Information Engineering, Shanghai University, China, and the SIROCCO Team, IRISA/INRIA-Rennes, France, with the support of the EU FP7 Marie Curie Actions. We further investigated more effective spatiotemporal saliency models for challenging videos with complicated motion, together with related applications such as salient object detection and video retargeting. We also investigated effective methods of visual scanpath prediction and saliency aggregation to facilitate a number of saliency-aware applications such as camera autofocus, scene understanding and saliency manipulation in images and videos. The significant results we have achieved are described as follows:

To improve saliency detection performance on challenging videos with complicated motion, we proposed a novel spatiotemporal saliency model based on superpixel-level trajectories. The input video is first decomposed into a set of temporally consistent superpixels, on which superpixel-level trajectories are directly generated, and motion histograms are extracted at both superpixel level and frame level. Based on the motion vector fields of multiple successive frames, inside-outside maps are estimated to roughly indicate whether pixels are inside or outside objects whose motion differs from that of the background. Two descriptors, namely the accumulated motion histogram and the trajectory velocity entropy, are then exploited to characterize the short-term and long-term temporal features of superpixel-level trajectories. Based on the trajectory descriptors and inside-outside maps, superpixel-level trajectory distinctiveness is evaluated and trajectory classification is performed to obtain trajectory-level temporal saliency. Superpixel-level and pixel-level temporal saliency maps are generated in turn by exploiting motion similarity with neighboring superpixels around each trajectory, and color-spatial similarity with neighboring superpixels around each pixel, respectively. Finally, a quality-guided fusion method is proposed to integrate the pixel-level temporal saliency map with the pixel-level spatial saliency map, which is generated based on the global contrast and spatial sparsity of superpixels, to generate a pixel-level spatiotemporal saliency map of reasonable quality. Experimental results on two public video datasets demonstrate that the proposed model outperforms state-of-the-art spatiotemporal saliency models in saliency detection performance.
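The report does not give the formula of the trajectory velocity entropy descriptor. As a rough illustration only, the sketch below computes the Shannon entropy of quantized motion directions along a single trajectory. The function name `velocity_direction_entropy`, the choice of 8 direction bins, and the decision to ignore speed and static steps are all assumptions made for this example, not the project's definition.

```python
import math


def velocity_direction_entropy(points, num_bins=8):
    """Shannon entropy (in bits) of quantized motion directions along a trajectory.

    points: list of (x, y) positions, e.g. a superpixel centroid in successive frames.
    num_bins: number of direction bins (hypothetical choice for this sketch).
    """
    counts = [0] * num_bins
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # a static step has no direction, so it is skipped
        angle = math.atan2(dy, dx) % (2 * math.pi)  # map to [0, 2*pi)
        counts[int(angle / (2 * math.pi) * num_bins) % num_bins] += 1
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in counts if c)


straight = [(0, 0), (1, 0), (2, 0), (3, 0)]       # constant direction
loop = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]   # four different directions
print(velocity_direction_entropy(straight), velocity_direction_entropy(loop))
```

Under these assumptions, a trajectory that keeps a single direction has zero entropy, while one that changes direction at every step has a higher value.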

On the basis of the spatiotemporal saliency maps, an effective spatiotemporal salient object detection method is exploited to detect spatiotemporal salient objects using an effective sub-window search algorithm. The method maximizes both the saliency density and the saliency gross inside the detection window, as well as the saliency difference between the detection window and its outside region. Cropping and scaling operations are then jointly performed based on the detected spatiotemporal salient object regions to generate the retargeted video. Experimental results on a variety of videos demonstrate that the proposed approach is simple yet effective, and achieves better retargeting performance.
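To make the window criterion concrete, the following sketch scores every axis-aligned window of a small saliency map and returns the best one. It is an illustration under stated assumptions: the score (fraction of total saliency covered, multiplied by the difference between inside and outside mean saliency) is a hypothetical stand-in for the report's objective, and the exhaustive search over a summed-area table replaces the sub-window search algorithm used in the project, which is not specified here.

```python
def integral_image(sal):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(sal), len(sal[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += sal[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def window_sum(ii, top, left, bottom, right):
    """Sum of saliency over rows top..bottom-1 and columns left..right-1."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def best_window(sal):
    """Exhaustively search all windows; returns ((top, left, bottom, right), score)."""
    h, w = len(sal), len(sal[0])
    ii = integral_image(sal)
    total = window_sum(ii, 0, 0, h, w)
    best, best_score = None, float("-inf")
    for top in range(h):
        for bottom in range(top + 1, h + 1):
            for left in range(w):
                for right in range(left + 1, w + 1):
                    area = (bottom - top) * (right - left)
                    if area == h * w:
                        continue  # the full frame has no outside region
                    inside = window_sum(ii, top, left, bottom, right)
                    density = inside / area
                    outside_density = (total - inside) / (h * w - area)
                    coverage = inside / total if total > 0 else 0.0
                    # Hypothetical score combining coverage and inside/outside contrast.
                    score = coverage * (density - outside_density)
                    if score > best_score:
                        best, best_score = (top, left, bottom, right), score
    return best, best_score


# Toy 5x6 map with one salient block at rows 1-2, columns 2-4.
toy = [[0.0] * 6 for _ in range(5)]
for y in (1, 2):
    for x in (2, 3, 4):
        toy[y][x] = 1.0
print(best_window(toy))
```

The exhaustive search is only practical for small maps; it is used here because it is easy to verify, not because it reflects the efficiency of the method described above.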

We proposed a new framework to predict the visual scanpaths of observers when they freely watch a visual scene. The visual fixations are inferred from bottom-up saliency and several oculomotor biases. Bottom-up saliency is represented by a saliency map, whereas the oculomotor biases (saccade amplitudes and saccade orientations) are modeled using several public eye tracking datasets. Experimental results show that the simulated scanpaths exhibit trends similar to those of human eye movements in a free-viewing condition. The generated scanpaths are more similar to human scanpaths than those generated by two existing methods. In addition, experimental results show that saliency maps computed from the simulated visual scanpaths outperform existing saliency models.
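The idea of combining bottom-up saliency with an oculomotor bias can be sketched as a simple sampling loop. This is a minimal illustration, not the proposed framework: the parametric amplitude prior `amplitude * exp(-amplitude / amplitude_scale)` is a hypothetical placeholder for biases that the project models from eye tracking data, saccade orientation is ignored, and the function and parameter names are invented for this example.

```python
import math
import random


def simulate_scanpath(sal, start, num_fixations, amplitude_scale=2.0, seed=0):
    """Sample a sequence of (row, col) fixations on a saliency map.

    Each next fixation is drawn with probability proportional to
    saliency * amplitude_prior(distance from the current fixation).
    """
    rng = random.Random(seed)
    h, w = len(sal), len(sal[0])
    cells = [(y, x) for y in range(h) for x in range(w)]
    path = [start]
    for _ in range(num_fixations - 1):
        cy, cx = path[-1]
        weights = []
        for y, x in cells:
            amplitude = math.hypot(y - cy, x - cx)
            # Hypothetical prior: zero for no movement, peaks at amplitude_scale pixels.
            bias = amplitude * math.exp(-amplitude / amplitude_scale)
            weights.append(sal[y][x] * bias)
        if sum(weights) <= 0:
            break  # nowhere salient to go
        path.append(rng.choices(cells, weights=weights, k=1)[0])
    return path


# Toy 4x4 map with uniform low saliency and one strongly salient cell.
toy = [[0.1] * 4 for _ in range(4)]
toy[2][3] = 1.0
print(simulate_scanpath(toy, start=(0, 0), num_fixations=5))
```

Because the prior assigns zero weight to a saccade of zero amplitude, consecutive fixations in this sketch always differ; a fixed seed makes the simulated scanpath reproducible.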

We investigated whether aggregating saliency maps can outperform the best individual saliency models. We used various aggregation methods, including six unsupervised and four supervised learning methods, and tested them on two existing eye fixation datasets. Results show that a simple average of the saliency maps generated by the top two saliency models significantly outperforms the best saliency models. Concerning the supervised learning methods, we provide evidence that it is possible to further increase the performance, provided that an image similar to the input image can be found in the training dataset. Our results may have an impact on critical applications which require robust and relevant saliency maps.
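The simple-average aggregation mentioned above can be sketched in a few lines. The min-max normalization applied to each map before averaging is an assumption of this example (the report does not say how maps were scaled), and the helper names are hypothetical.

```python
def normalize(sal):
    """Min-max normalize a 2D map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in sal for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in sal]
    return [[(v - lo) / (hi - lo) for v in row] for row in sal]


def average_saliency(maps):
    """Pixel-wise mean of several saliency maps of identical size."""
    normed = [normalize(m) for m in maps]
    h, w = len(normed[0]), len(normed[0][0])
    return [
        [sum(m[y][x] for m in normed) / len(normed) for x in range(w)]
        for y in range(h)
    ]


# Two toy maps standing in for the outputs of two saliency models.
map_a = [[0, 2], [4, 8]]
map_b = [[1, 1], [0, 1]]
print(average_saliency([map_a, map_b]))
```

Normalizing first keeps a model with a larger output range from dominating the average; other scalings would serve the same purpose.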

In addition, the researcher of this project, Prof. Zhi Liu (Shanghai University), served as the managing guest editor, together with three other guest editors, i.e. Prof. Olivier Le Meur (IRISA/University of Rennes 1), Prof. Ali Borji (University of Wisconsin) and Prof. Hongliang Li (University of Electronic Science and Technology of China), for the special issue “Recent Advances in Saliency Models, Applications and Evaluations”, which was published in the EURASIP journal Signal Processing: Image Communication (volume 38, Oct. 2015). This special issue assembles 12 papers: the first six propose saliency models for predicting human fixations in images, videos and stereoscopic images; the following three propose saliency models for salient object detection/segmentation in images, videos and RGBD/stereoscopic images; and the last three demonstrate the use of saliency in related applications, namely video compression, image GPS location estimation and viewpoint selection, respectively. More details about this special issue can be found at http://www.sciencedirect.com/science/journal/09235965/38.

We believe that the research results of this project and the above special issue will effectively promote research on saliency models and saliency-based applications, strengthen international research collaborations, especially between Europe and China, and significantly increase the visibility of our research as well as that of the Marie Curie Actions. The website for the SHIVPRO project is http://people.irisa.fr/Olivier.Le_Meur/shivpro/.
