
Video Assisted with Audio Coding and Representation

Exploitable results

The project approaches the problem of videophone coding from a joint audio-video point of view, for both analysis and synthesis. The motivating idea is that interpersonal audio-video communication is an easily modelled information source, characterized in audio by a human speaker's voice and in video by the same speaker's face.

Two demonstrators are being developed:
- a hardware platform with an H.324 coder/decoder integrated with a board for speech analysis and articulation estimation, lip extraction and tracking, and audio-assisted frame interpolation to increase the frame rate;
- a software demonstrator of a hybrid coding scheme compliant with Moving Picture Experts Group (MPEG)-4, in which speech analysis is used to apply suitable deformations to the modelled component of the scene (see the illustrative sketch below).

Results so far include:
- a synchronized audio-video corpus of 10 speakers, composed of single utterances of 700 English words, acquired and processed to allow bimodal multi-speaker speech processing;
- a set of tools for extracting the region of the speaker's mouth from QCIF H.324 images and for generating extrapolated frames in which the mouth movements are synthesized from parameters extracted by speech analysis;
- a real-time H.324 board, based on a Trimedia component, which has been partially integrated (video channel only);
- a 3-dimensional head model and a corresponding set of facial animation parameters (FAP), defined and supplied to the MPEG-4 Synthetic/Natural Hybrid Coding verification model.
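The summary does not specify how speech-derived parameters drive the synthesized mouth movements, so the following Python sketch only illustrates the general idea of mapping articulation estimates to a few MPEG-4-style FAPs and blending them for an audio-assisted in-between frame. All names, parameter ranges, the FAP subset, and the gain values are illustrative assumptions, not the project's actual interface.

from dataclasses import dataclass

@dataclass
class ArticulationParams:
    """Hypothetical per-frame parameters estimated from speech analysis."""
    mouth_opening: float   # 0.0 (closed) .. 1.0 (fully open)
    lip_rounding: float    # 0.0 (spread) .. 1.0 (rounded)

def articulation_to_faps(p: ArticulationParams) -> dict:
    """Map articulation estimates to a small, illustrative subset of FAPs.

    Real MPEG-4 FAPs are expressed in facial animation parameter units
    (FAPU); the gains used here are made-up placeholders.
    """
    return {
        "open_jaw": 512.0 * p.mouth_opening,             # vertical jaw opening
        "stretch_l_cornerlip": -256.0 * p.lip_rounding,   # pull corners inward
        "stretch_r_cornerlip": -256.0 * p.lip_rounding,   # when lips are rounded
    }

def interpolate_faps(faps_a: dict, faps_b: dict, t: float) -> dict:
    """Linearly blend two FAP sets for an audio-assisted in-between frame."""
    return {k: (1.0 - t) * faps_a[k] + t * faps_b[k] for k in faps_a}

if __name__ == "__main__":
    # FAPs for two decoded frames, derived from speech analysis of the
    # accompanying audio ...
    a = articulation_to_faps(ArticulationParams(mouth_opening=0.2, lip_rounding=0.1))
    b = articulation_to_faps(ArticulationParams(mouth_opening=0.8, lip_rounding=0.4))
    # ... and one synthesized in-between frame at the temporal midpoint.
    print(interpolate_faps(a, b, t=0.5))

In a full pipeline, the blended FAPs would deform the 3-dimensional head model's mouth region in the extrapolated frame, while the rest of the picture is carried over from the decoded frames.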
