The Analysis and Synthesis of Speaker Characteristics


The VOX Working Group is an experienced multidisciplinary team of specialists in speech science, speech technology and experimental psychology, each already responsible for internationally recognised innovations in theory and practice in the different areas involved. The overall long-term objective of the Group is to describe both inter-speaker and intra-speaker differences of speaker type, speaker state and speaker style. Identifying a conceptual framework for the global space of inter-speaker differences will allow the Group to address its key scientific question: what are the dimensions and limits of the speaker space that an individual speaker can occupy, and how can intra-speaker variations of speaker type, speaker state and speaker style be modelled for implementation in speech synthesis?
The Group is investigating speech databases covering different types of speakers, different affective conditions of emotion and attitude, and different speaking styles ranging from casual to careful. Each is considered in terms of its acoustic, perceptual and physiological representation. Speech synthesis can then be used to test such characterisations empirically.

The first Group workshop was held in February 1993 at the Centre for Speech Technology Research, University of Edinburgh, where courses were taught on Vocal Profile Analysis and on electropalatographic methods of analysis. The workshop was attended by more than 30 researchers from Consortium sites and by external researchers invited by the partners.

Members of the Working Group also attended workshops of other basic research projects and working groups for liaison and exchange of information. These included the SPEECHMAPS project workshop in Paris in April 1993 and the ACCOR workshop on articulography in Munich in April 1993. Researchers from LIMSI have visited the Stockholm and Sheffield partners, and the Dublin partner has also visited Stockholm for collaborative research. Papers have been presented at relevant conferences, including those of the British Institute of Acoustics and the International Association of Forensic Phonetics, the International Conference on Interdisciplinary Perspectives in Speech and Language Pathology, and the Symposium on Natural Language Processing and Speech Technology (Bangkok).
The activities of the Working Group centre on three categories of speaker characteristics: speaker type, speaker state and speaker style. Each category is examined at three levels (acoustic, perceptual and physiological), and each level allows both laryngeal and supralaryngeal contributions to a speaker's voice to be considered. Speech synthesis affords an empirical means of testing the conclusions drawn from these investigations. Discussions at the Group's technical meetings and workshops bring together results from the three levels as a preliminary to the development of an integrated descriptive model of speaker characterisation.

The Group works by holding a Consortium-wide workshop every six months, at which researchers from each site learn a new analytic technique under the instruction of the host partner. Once a year, a plenary meeting of the Group reviews progress towards the goal of an integrated descriptive system.


Success in providing a unified representation of speaker characteristics would be industrially exploitable in the production of speech synthesis products that sound more natural and are better able to project synthetic speaker attributes of identity, personality and affect appropriate to the application. The capabilities of speech recognition systems would also be improved through a better understanding of the basis for speaker independence and speaker adaptation. An adequate description of speaker characterisation would thus bring pervasive benefits to commercially oriented work in speech technology.


University of Edinburgh
Old College, South Bridge
EH1 1HN Edinburgh
United Kingdom

Participants (9)

Centre National de la Recherche Scientifique (CNRS)
Université De Provence
29 Avenue Robert Schumann
13621 Aix-en-Provence

Centre National de la Recherche Scientifique (CNRS)
91406 Orsay

100 44 Stockholm

Rheinische Friedrich-Wilhelms-Universität Bonn
Poppelsdorfer Allee 47
53115 Bonn

Rue De Saussure, 6
1004 Geneve

Trinity College
Dublin 2

University of Cambridge
Free School Lane
CB2 3RF Cambridge
United Kingdom

University of Reading
Earley Gate, Whiteknights
RG6 2AR Reading
United Kingdom

University of Sheffield
Western Bank
S10 2TN Sheffield
United Kingdom