
TAPESTRIES: The Application of Psychological Evaluation to Systems and Technologies in Remote Imaging and Entertainment Systems

Goals

The introduction of digital distribution networks makes it possible to deliver a diverse range of multimedia services to the home. These services will extend the boundaries of conventional television broadcasting to include interactive services, Internet access, 3D television, and immersive television. The use of digital compression and the range of applications require the development of specially adapted quality assessment methods for the evaluation of these services.

The main objectives of the TAPESTRIES project were to develop both subjective and objective quality assessment procedures for the performance evaluation of digital multimedia services. A major goal of the project was to develop an automated quality assessment tool that is able to track the scene-dependent quality variations found in MPEG-2 encoded television pictures. The aim was to provide a system that allows operators to monitor the quality of their delivered services on a continuous basis and that also provides a means of comparing the performance of different commercially available MPEG-2 encoders.

A further objective of the project was to employ the Single Stimulus Continuous Quality Evaluation (SSCQE) procedure, developed in the earlier RACE MOSAIC project, to determine the minimum encoded bit-rate requirements for different types of thematic programme material such as Entertainment, Cartoons, and Sports.

TAPESTRIES also set itself the objective of developing a test methodology for 3D television services that is able to measure both the enhanced experience of viewing in 3D and unwanted effects such as viewer headaches and eyestrain.

Finally, an important goal of the project was to support the MPEG-4 testing process by organising and conducting competition and verification tests. To this end, new test methodologies suited to evaluating MPEG-4 functionalities were also developed by the project. Test support was also provided to other ACTS projects to assist with the performance assessment of their developed systems.

Achievements of the Project

New standards for the subjective assessment of digital multimedia services

In the earlier RACE MOSAIC project, a Single Stimulus Continuous Quality Evaluation (SSCQE) procedure was developed to allow observers to continuously rate the time-varying quality of digitally compressed images. The procedure uses hand-held sliders connected to a computer analysis system. The development of this analysis contributed much new knowledge about the way that humans judge compressed digital picture quality and, in particular, the powerful role that human memory plays in the process.

The SSCQE methodology has been further developed and refined during the TAPESTRIES project. Experiments made by TAPESTRIES found that the stability of SSCQE assessment results was affected by the inconsistency of certain observers. A process to reject results from inconsistent observers was developed for the SSCQE test protocol and is now included in a revision of the ITU-R BT.500-8 recommendation [1]. Experiments were also made to investigate the relationship between the SSCQE and the conventional Double Stimulus Continuous Quality Scale (DSCQS) methodology. The results showed that it is possible to relate SSCQE results to the DSCQS scale, and this procedure (known as Stage 2 of the SSCQE methodology) has been submitted for inclusion in ITU-R recommendations. TAPESTRIES also proposed a method (known as Stage 3 of the SSCQE methodology) to derive a single quality grade from SSCQE results.
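The idea of screening out inconsistent observers can be illustrated with a minimal sketch. It is not the procedure specified in ITU-R BT.500. It assumes, purely for illustration, that each observer's slider trace is compared with the mean trace of the remaining observers and is rejected when the Pearson correlation falls below a hypothetical threshold (`min_corr`). The observer names and slider values are invented.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat trace carries no information about agreement
    return cov / (var_x * var_y) ** 0.5

def screen_observers(traces, min_corr=0.5):
    """Return the names of observers whose trace agrees with the panel.

    traces: dict mapping observer name -> list of slider samples (0-100).
    min_corr: hypothetical acceptance threshold, not taken from BT.500.
    """
    accepted = []
    for name, trace in traces.items():
        others = [t for n, t in traces.items() if n != name]
        # Mean trace of all other observers, sample by sample.
        panel_mean = [mean(samples) for samples in zip(*others)]
        if pearson(trace, panel_mean) >= min_corr:
            accepted.append(name)
    return accepted

# Invented slider traces for four observers watching the same sequence.
traces = {
    "obs1": [80, 70, 40, 30, 60, 75],
    "obs2": [85, 72, 45, 35, 58, 80],
    "obs3": [78, 65, 38, 28, 62, 70],
    "obs4": [30, 40, 80, 85, 35, 30],  # moves against the rest of the panel
}
print(screen_observers(traces))
```

In this sketch the fourth observer, whose ratings rise when everyone else's fall, is the only one excluded from the accepted list.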

An adaptation of the SSCQE methodology was jointly proposed by the ACTS projects TAPESTRIES and MoMuSys to the MPEG Test Subgroup for the evaluation of video codec error robustness. This is an important parameter for mobile communication networks, where transmission error rates may be high because of varying propagation conditions. This method has been applied in MPEG-4 verification tests and is expected to be included in ITU recommendations on subjective video quality assessment [2, 3]. TAPESTRIES and MoMuSys have also proposed to MPEG modifications of standard assessment methods to adapt them to the evaluation of content-based coding schemes. These modifications are also expected to be included in ITU recommendations [2]. Finally, TAPESTRIES has actively participated in MPEG testing activities and has helped MPEG-4 developers to evaluate both new coding schemes and the performance of the MPEG-4 standards.

TAPESTRIES also provided subjective quality assessment support to other ACTS projects for the evaluation of their developed systems and services and through this collaborative work has had a wide impact on the results of the ACTS programme.

Subjective quality assessment of MPEG-2 broadcast television services

Using the SSCQE methodology, TAPESTRIES developed an understanding of the relationship between bit-rate and service quality for different thematic types of programme material such as Entertainment, Cartoons, and Sports. These quality assessment experiments were made using 240 observers and concluded that:

The SSCQE methodology was also used to evaluate the performance improvements that can be achieved using statistical multiplexing (variable bit rate) rather than conventional constant bit-rate encoding for MPEG-2 encoded television services. The statistical multiplexer approach aims to increase the number of programmes in a multiplex by relying on the fact that it is statistically unlikely that the peak bit-rate demands of all the programmes will occur together. The statistical multiplexing technique is still in its infancy, however, and its real performance gains remain unclear even for those broadcasters using this technique to deliver digital services today.

The results of the subjective tests showed that the statistical multiplexing method does not reduce the average bit-rate required for the programmes in the multiplex and hence will not allow an increased number of programmes to be transmitted in a multiplexed channel without some loss in programme picture quality. For the same number of multiplexed programmes, however, the statistical multiplexer technique does have the advantage over constant bit-rate encoding that it is able to minimise the instantaneous reductions in programme picture quality for busy programme scenes.
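The contrast described above can be illustrated with a small sketch. It assumes a deliberately simple allocation rule, sharing the channel in proportion to each programme's instantaneous demand, which is an illustrative stand-in and not the behaviour of any particular multiplexer. The demand figures and channel capacity are invented.

```python
def constant_allocation(demands, capacity):
    """Give every programme the same fixed share of the channel."""
    share = capacity / len(demands)
    return [share] * len(demands)

def statistical_allocation(demands, capacity):
    """Share the channel in proportion to each programme's current demand."""
    total = sum(demands)
    return [capacity * d / total for d in demands]

def worst_shortfall(demand_series, allocate, capacity):
    """Largest gap between demanded and allocated rate over all time slots."""
    worst = 0.0
    for demands in demand_series:
        allocation = allocate(demands, capacity)
        gap = max(d - a for d, a in zip(demands, allocation))
        worst = max(worst, gap)
    return worst

# Hypothetical demands in Mbit/s for three programmes over four time slots;
# each programme has a busy scene in a different slot.
demand_series = [
    [8.0, 3.0, 3.0],
    [3.0, 8.0, 3.0],
    [3.0, 3.0, 8.0],
    [4.0, 4.0, 4.0],
]
capacity = 12.0  # total multiplex rate in Mbit/s (illustrative)

constant_gap = worst_shortfall(demand_series, constant_allocation, capacity)
statistical_gap = worst_shortfall(demand_series, statistical_allocation, capacity)
print(constant_gap, statistical_gap)
```

Both rules hand out the same total capacity in every slot, so the average bit-rate per programme is unchanged. Only the worst instantaneous shortfall during a busy scene is smaller under the demand-proportional rule.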

The automated picture assessment of MPEG-2 services

The need for human observers makes the use of subjective assessments impractical for routine monitoring and system testing applications. One of the major objectives of the TAPESTRIES project was to develop an automated video quality system for MPEG-2 services which models both visual and cognitive aspects of the human observer in order to simulate the human response to picture quality assessment. A novel feature of this system is that it is able to track in real time the quality variations of digitally encoded programmes. The results produced by the developed system correlate closely (correlation above 0.9) with those from subjective SSCQE assessments.
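As a reminder of what a correlation figure of this kind expresses, the following sketch computes the Pearson correlation between a set of model outputs and the corresponding mean subjective scores. The numbers are invented for illustration and are not TAPESTRIES results. The source does not state which correlation measure was used, so the choice of Pearson correlation here is an assumption.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical quality scores on a 0-100 scale for six programme segments.
model_scores = [72, 65, 40, 55, 80, 30]        # automated model output
subjective_scores = [70, 68, 45, 50, 78, 35]   # mean SSCQE slider value

r = pearson(model_scores, subjective_scores)
print(round(r, 3))
```

A value close to 1 means that the model ranks and spaces the segments much as the observers did. It says nothing about whether the two sets of scores share the same absolute scale.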

The TAPESTRIES model has been submitted to a competitive evaluation organised by the ITU Video Quality Experts Group (VQEG) and is a candidate to become the world-wide standard for the automated measurement of video quality. If the TAPESTRIES model wins this international competition, in accordance with the rules of VQEG, it will be made available on fair and reasonable terms to third parties wishing to commercially exploit the system.

Comparison of automated model and subjective quality results

A simpler automated system that is able to operate without the reference uncoded video sequence has also been developed in the TAPESTRIES project. This system is well suited to system monitoring applications and has been patented by TAPESTRIES members.

Evaluation of three-dimensional television services

New digital television distribution networks will provide the extra data capacity to support the additional channel required for three-dimensional TV broadcasts. This development, coupled with recent advances in three-dimensional display technology, makes the distribution of three-dimensional television for home reception a real possibility for the future. Three-dimensional television is able to evoke in viewers compelling feelings and emotions of being present at a filmed event. To evaluate the performance of these services, TAPESTRIES developed an assessment approach based on measuring viewer presence. The approach rests on the premise that the better a mediated experience approximates the environment, the more realistic viewers' behavioural responses will be. Viewer body sway and heart rate are recorded during the presentation of action stimuli such as the view from a rally car as it travels at high speed around a track.

Extraneous audio-video distractions provide a strong negative cue to presence and act to bring the viewer back to reality. A specially designed isolating experimental environment known as the Platform for Immersive Television (PIT) was developed for presence evaluations. Using this approach it has been demonstrated that viewers experience a much higher level of presence when viewing three-dimensional rather than two-dimensional material. Using subjective ratings of perceived depth and viewer eye-strain, the optimum camera filming parameters for stereoscopic services have also been defined.

View from inside the Platform for Immersive Television (PIT)

Impact on MPEG 4 Technology

The evaluation of coding scheme performance is fundamental to the development of MPEG standards. Expert evaluations and subjective tests are typically used to determine performance. Since subjective tests are relatively expensive and time-consuming to implement, they are typically performed only at the beginning and end of the standards development process. Tests at the beginning of the process are used to rank order the proponent systems, and at the end to provide a reliable evaluation of the chosen standard.

The TAPESTRIES project provided considerable support to the Test Subgroup activities of MPEG: it proposed new test methodologies for MPEG-4 video functionality assessments, ran its own tests on MPEG-4 video, and, until October '98, one of the TAPESTRIES partners was responsible for the co-ordination of the Test Subgroup activities.

During this period the Test Subgroup performed a number of tests, including the second round of MPEG-4 competition tests and MPEG-4 verification tests.

The second round of MPEG-4 competition tests

In November 1996, MPEG issued a second call for proposals for audio, video, and combined audio-visual coding systems. This test was aimed at evaluating whether the MPEG-4 Verification Model (VM) could be improved using state-of-the-art coding algorithms that were able to provide new levels of functionality or to significantly outperform the existing VM. Following this call a number of audio and video proposals were submitted. Many of these were integrated in the Core Experiment process, and finally two of the video coding proposals were chosen to be tested against the MPEG-4 VM.

TAPESTRIES provided support to these tests by: co-ordinating the entire test process, defining the experimental design, providing test administrators and performing a statistical analysis on the results. Three tests were carried out, corresponding to different ranges of bit rate and criticality of the video material. In each test the video coding efficiency of the two proposals and the MPEG-4 VM was evaluated by using appropriate test procedures. The results of the tests indicated that there was no significant difference between the performance of the VM and the performance of the two video-coding proposals. Based on this conclusion, the Video Subgroup of MPEG decided that neither of the two proposed algorithms would be included in the video VM.

The MPEG-4 verification tests

MPEG carries out verification tests to check whether a developed standard delivers what it promises. These tests are therefore a key activity in the final phase of the development of an MPEG standard, and the results may also be used to provide National Bodies with additional information before they vote on the acceptance of a standard. Verification tests were planned for the MPEG-4 standard taking into account its potential applications.

MPEG-4 audio verification tests were completed in October '98. These addressed the following applications: narrow-band audio broadcasting, speech coding, and audio on the Internet. The formal tests for narrow-band audio broadcasting were carried out in collaboration with the NADIB (Narrowband Digital Audio Broadcasting) Group. These tests explored the performance of speech and music coders operating at bit rates in the range 6 kbit/s to 24 kbit/s, including scalable codec options. The results showed that a significant improvement in quality can be offered in relation to conventional analogue AM broadcasting and that scalable coders offer superior performance over simulcast networks [5].

The verification tests on speech coding evaluated the performance of MPEG-4 speech codecs against available standards over three ranges of bit rates from 2 kbit/s up to 18 kbit/s, including scalable options. The results showed that overall MPEG-4 codecs are competitive with existing standards and that at very low bit rates (up to 4 kbit/s) MPEG-4 demonstrated better performance [6].

The verification tests for audio Internet applications explored the performance of speech and music coders operating in the bit rate range 6 kbit/s to 24 kbit/s, including scalable codec options, over different ranges of bit rates and with different codecs. Due to the complexity of the test design it is difficult to summarise the results. The main conclusions of these tests, however, were that AAC audio coding provided significantly better audio quality than MP3 and that scalable AAC performed better than existing standards. More details on these tests can be found in [7].

In October '98 a complete plan for the MPEG-4 video verification tests was defined. It included testing of error robustness, content-based coding, and scalability [8, 9, 10].

New test methods for evaluating MPEG-4 video functionalities

The performance goals of the MPEG-4 standard present new challenges in terms of the design of effective subjective test methods. The majority of the MPEG-4 functionalities are new to audio-visual coding, and little prior experience existed for subjectively testing coding performance with respect to these functionalities.

From the point of view of subjective evaluation, the most critical MPEG-4 video functionalities were error robustness and content-based coding. Artefacts due to transmission errors are sparse, highly variable in terms of occurrence, duration and intensity, and at very low bit rates may be masked by compression impairments. Artefacts due to content-based coding may be concentrated on specific areas of the scene (e.g. object contours, texture of particular objects), and the impact of the impairment of an object depends on the displayed background. For these two functionalities TAPESTRIES and MoMuSys proposed two new testing methods [11, 12]: the Simultaneous Double Stimulus for a Continuous Evaluation (SDSCE) method and the object-based evaluation method. SDSCE is derived from the SSCQE method described in [1]. SSCQE is suitable for evaluating sparse impairments but, since no references are used, it is not suitable for evaluating fidelity or for distinguishing a particular source of artefacts.

An important requirement for the MPEG tests was to evaluate the fidelity of MPEG-4 coded sequences, as channel errors may cause whole objects to disappear without producing other appreciable artefacts. A further requirement was to evaluate the annoyance of residual transmission impairments without taking into account the annoyance of the coding impairments themselves. To meet these requirements it was decided to compare compressed sequences affected by transmission errors against the same compressed sequences without transmission errors. In these tests a panel of subjects watched the two sequences simultaneously on the same screen, as illustrated in the figure below. The observers were asked to identify the differences between the two sequences and to judge the fidelity of the video information using the slider on a hand-held voting device. When the fidelity was perfect, the slider was to be moved to the top of the scale (coded 100); when the fidelity was null, it was to be moved to the bottom of the scale (coded 0). During these tests the subjects were aware of which picture sequence was the reference and which was the one on which they needed to express an opinion.
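The kind of data such a test produces can be sketched as follows. This is a minimal illustration and not the analysis used in the MPEG tests: it assumes each subject's slider is sampled at the same instants, averages the votes into a mean fidelity trace, and flags the instants at which the mean falls below a hypothetical threshold. The votes and the threshold value are invented.

```python
from statistics import mean

def mean_fidelity_trace(votes):
    """Average the slider positions of all subjects at each sampling instant.

    votes: list of per-subject traces, each a list of values from
    0 (no fidelity) to 100 (perfect fidelity).
    """
    return [mean(samples) for samples in zip(*votes)]

def impaired_instants(trace, threshold=80.0):
    """Indices where mean fidelity drops below a hypothetical threshold."""
    return [i for i, value in enumerate(trace) if value < threshold]

# Invented votes from three subjects over six sampling instants; a
# transmission error is assumed to disturb the picture around instants 2-3.
votes = [
    [100, 100, 60, 40, 90, 100],
    [100, 95, 70, 50, 85, 100],
    [100, 100, 50, 30, 95, 100],
]

trace = mean_fidelity_trace(votes)
print(trace)
print(impaired_instants(trace))
```

Because the reference shown alongside is the same compressed sequence without transmission errors, a dip in the trace reflects the transmission impairment alone and not the coding impairments common to both pictures.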

Typical screen presentation during a SDSCE subjective test

Experiments were made by TAPESTRIES to confirm the validity of this new test procedure, which has now been adopted by MPEG for the evaluation of MPEG-4 systems and is expected to be included in ITU recommendations [2, 3].

The second modification proposed to existing subjective assessment methods for MPEG-4 applications is related to the evaluation of object-based functionalities. The reason for this modification is twofold. First, there is an interaction between the perceived quality of each object in a scene. Secondly, a content-based coded scene can be presented as it was composed by its author or modified by using different combinations of objects from the original scene to provide a new scene.

TAPESTRIES and MoMuSys proposed to evaluate content-based functionalities (object scalability and object-based quality scalability) in two test runs. In the first run the overall quality of the scene is evaluated, and in the second the quality of a single object in the scene is evaluated. In the first run standard ITU methods are used, whilst in the second run a new test method is used to evaluate the efficiency of the object texture and shape coding by displaying it on a grey background. This is illustrated in the figure below. This approach eliminates the interaction between the quality of the object under evaluation and the spectral characteristics of the other objects in the same scene.

A test to validate the proposed modifications was carried out by TAPESTRIES in the framework of WP4 Activity 2 'Evaluation of MPEG-4 applications' and this procedure is expected to be included in ITU-T recommendations.

Testing of content based functionalities

Other MPEG-4 related activities

TAPESTRIES worked with the MoMuSys ACTS project on a joint evaluation of the quality of mobile multimedia services using object based applications such as Traveller Information and Remote Security Surveillance. The tests included "end-user field trials" (evaluating the added value of MPEG-4 encoding processes in terms of functionality and quality for these applications) and "internal project tests" to investigate the capability of the system. A more detailed description of this work is provided in [13].

References

[1] ITU-R Rec. BT.500-8, Methodology for the subjective assessment of the quality of television pictures, 1998

[2] CSELT (Italy), France Télécom (France), Proposed modifications to Recommendation P.910, ITU-T SG12 Delayed Document 085, November 1998

[3] France, Italy, Draft proposal for modification of recommendation ITU-R BT.500 - A novel method for error robustness evaluation in video communication: the simultaneous double stimulus for a continuous evaluation, ITU-R Doc. 10-11Q/30, April 1999

[4] AC055/CSE/DS/I/010, Experimental results of MPEG-4 competition tests

[5] ISO/IEC JTC1/SC29/WG11/MPEG98/N2276, Report on the MPEG-4 audio NADIB verification tests, July 1998

[6] ISO/IEC JTC1/SC29/WG11/MPEG98/N2424, Report on the MPEG-4 speech codec verification tests, October 1998

[7] ISO/IEC JTC1/SC29/WG11/MPEG98/N2425, MPEG-4 Audio verification test results: Audio on Internet, October 1998

[8] ISO/IEC JTC1/SC29/WG11/MPEG98/N2488, Revised Test Conditions for Video Verification Test On Content-Based Coding, October 1998

[9] ISO/IEC JTC1/SC29/WG11/MPEG98/N2489, Revised test conditions for video verification test on scalability, October 1998

[10] ISO/IEC JTC1/SC29/WG11/MPEG98/N2490, Error Resilience Verification Test Plan, October 1998

[11] AC055/CSE/DS/R/012/b, Evaluation of selected applications provided by content-based coding schemes

[12] AC055/CSE/DS/R/009, Subjective assessment methodologies for use in MPEG-4 validation test

[13] AC055/EBU/DR/22, Evaluations of bit-rate reduced services and review of standardisation activities