The Analysis and Synthesis of Speaker Characteristics

Objective

The VOX Working Group is an experienced multidisciplinary team of specialists in speech science, speech technology and experimental psychology, each already responsible for internationally recognised innovations in theory and practice in the different areas involved. The overall long-term objective of the Group is to describe inter-speaker and intra-speaker differences of speaker type, speaker state and speaker style. Identification of a conceptual framework for the global space of inter-speaker differences will allow the Group to address its key scientific question: what are the dimensions and limits of the speaker-space occupiable by the individual speaker, and how can intra-speaker variations of speaker type, speaker state and speaker style be modelled to suit implementation in speech synthesis?

The Group is investigating speech databases covering different types of speakers, different affective conditions of emotion and attitude, and casual versus careful styles of speaking; each is considered with reference to its acoustic, perceptual and physiological representation. Speech synthesis can then be used to test such characterisations empirically.

The first group workshop was held in February 1993, in the Centre for Speech Technology Research at the University of Edinburgh, where a course was taught on Vocal Profile Analysis, and on electropalatographic methods of analysis. This was attended by over 30 researchers from Consortium sites and external researchers invited by the partners.

Members of the Working Group also attended workshops of other basic research projects and working groups, for liaison and exchange of information. These included the SPEECHMAPS project workshop in Paris in April 1993, and the ACCOR workshop on articulography in Munich in April 1993. Researchers from LIMSI have visited the Stockholm and Sheffield partners, and the Dublin partner has also visited Stockholm for collaborative research. Papers have been given at relevant conferences (the British Institute of Acoustics; the International Association of Forensic Phonetics; the International Conference on Interdisciplinary Perspectives in Speech and Language Pathology; the Symposium on Natural Language Processing and Speech Technology (Bangkok)).

The activities of the Working Group centre on investigations of the speech of different types of speakers, with different affective conditions of emotion and attitude, and different casual versus careful styles of speech. The Group is considering the three categories of speaker type, speaker state and speaker style. Each of these is considered in three domains: acoustic, perceptual and physiological. Each of these domains also allows consideration of both laryngeal and supralaryngeal contributions to a speaker's voice. Speech synthesis affords an empirical means of testing the conclusions drawn from such investigations. Discussions at Group technical meetings and workshops draw results from these three domains together, as a preliminary to the development of an integrated descriptive model of speaker characterisation.

The Group works by holding a Consortium-wide Workshop every six months, at which researchers from each site learn a new analytic technique under the instruction of the host partner. Once a year, a Group plenary meeting discusses progress towards the goal of an integrated descriptive system.

POTENTIAL

Success in providing a unified representation of speaker characteristics would benefit industry by enabling speech synthesis products that sound more natural and are better able to project application-appropriate synthetic speaker attributes of identity, personality and affect. Speech recognition systems would also be improved through a better understanding of the basis for speaker independence and speaker adaptation. An adequate description of speaker characterisation would thus bring pervasive benefits to commercially oriented work in speech technology.

Fields of science

Programme(s)

FP3-ESPRIT 3 - Specific research and technological development programme (EEC) in the field of information technologies, 1990-1994

Topic(s)

Data not available

Call for proposal

Data not available

Funding Scheme

Data not available

Coordinator

University of Edinburgh

EU contribution

No data

Address

Old College South Bridge
EH1 1HN Edinburgh
United Kingdom

Total cost

No data

Participants (9)

Centre National de la Recherche Scientifique (CNRS)

France

EU contribution

No data

Address

Université de Provence 29 avenue Robert Schumann
13621 Aix-en-Provence

Total cost

No data

Centre National de la Recherche Scientifique (CNRS)

France

EU contribution

No data

ROYAL INSTITUTE OF TECHNOLOGY

Sweden

EU contribution

No data

Rheinische Friedrich-Wilhelms-Universität Bonn

Germany

EU contribution

No data

UNIVERSITE DE GENEVE

Switzerland

EU contribution

No data

UNIVERSITY OF DUBLIN

Ireland

EU contribution

No data

University of Cambridge

United Kingdom

EU contribution

No data

University of Reading

United Kingdom

EU contribution

No data

University of Sheffield

United Kingdom

EU contribution

No data
