Audiovisual to Articulatory Speech Inversion

Project description

FET - Open

ASPI is concerned with the recovery of vocal tract shape dynamics from an acoustic speech signal supplemented by image analysis of a speaker’s face. It is (i) developing inversion methods, with emphasis on audiovisual-to-articulatory inversion and on the investigation of additional constraints and optimization methods to reduce the under-determined nature of inversion, and (ii) constructing a multimodal articulatory database based on ultrasound, MRI, and facial motion capture.

ASPI may lead to a much-needed breakthrough in our understanding of speech and our approach to speech research, given the focus on multimodal data collection, the activities related to publicizing data collection protocols and technical specifications of data collection equipment, as well as the activities planned to exploit the data.

Audiovisual-to-articulatory inversion consists in recovering the dynamics of the vocal tract shape (from the vocal folds to the lips) from the acoustic speech signal, supplemented by image analysis of the speaker's face. Being able to recover this information automatically would be a major breakthrough in speech research and technology, as a vocal tract representation of a speech signal would be both beneficial from a theoretical point of view and practically useful in many speech processing applications (language learning, automatic speech processing, speech coding, speech therapy, film industry...).

The design of audiovisual-to-articulatory inversion involves two kinds of interdependent tasks. The first is the development of inversion methods that successfully address the main acknowledged difficulties (non-uniqueness of the inverse solution, lack of phonetic relevance of inverse solutions, impossibility of using standard spectral data). The second is the construction of an articulatory database that comprises dynamic images of the vocal tract together with the corresponding speech signal, for several male and female speakers.

For the inversion itself, the main objectives are:
1. Development of inversion methods.
2. Investigation of additional constraints to reduce the under-determination of the inversion.
3. Evaluation of the inversion methods on articulatory data.

For the construction of the articulatory database, the objectives are:
4. Design and acquisition of articulatory data that enable both the development of articulatory models and the assessment of inversion methods.
5. Design of a low-cost acquisition technology based on ultrasound and facial motion capture.
6. Exploitation of existing databases (mainly X-ray images previously acquired).

The consortium provides an outstanding blend of competences, mixing groups with theoretical backgrounds in speech production, acoustic-to-articulatory inversion, computer vision and medical imaging.

Coordinator

CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE

EU contribution

€ 612 112,40

Address

615 rue du jardin botanique
54600 Villers-lès-Nancy
France

Total cost

No data

Participants (4)

UNIVERSITE LIBRE DE BRUXELLES

Belgium

EU contribution

€ 265 272,00

Address

AVENUE FRANKLIN ROOSEVELT 50
1050 BRUXELLES

Activity type

Higher or Secondary Education Establishments

Total cost

No data

INSTITUTE OF COMMUNICATION AND COMPUTER SYSTEMS

Greece

EU contribution

€ 300 639,60

Address

PATISSION STREET 42
10682 ATHENS

Activity type

Research Organisations

Total cost

No data

KUNGLIGA TEKNISKA HOEGSKOLAN

Sweden

EU contribution

€ 421 976,00

Address

VALHALLAVAEGEN 79
100 44 STOCKHOLM

Activity type

Higher or Secondary Education Establishments

Total cost

No data

INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE

France

EU contribution

€ 0,00
