Esprit Project 22760 - VeriVox
Open LTR - 1st phase
Voice Variability in Speaker Verification

Keywords: Automatic Speaker Verification (ASV), within-speaker variation, structured training

Project home page:


The main aim of VeriVox is to improve the reliability of automatic speaker verification (ASV), by developing novel, phonetically-informed methods for coping with the variation in a speaker's voice. Widespread deployment of ASV is hindered by unacceptable performance in real applications - in particular by 'false rejection' rates for genuine claims which are too high to be tolerated by customer-oriented end-users such as banks. Variation in the way a person speaks contributes significantly to this problem. Advances in signal processing and statistical methods are likely to produce gradual improvements, but VeriVox will exploit the phonetically-structured nature of much within-speaker variation. Sub-aims include the development of methods for eliciting such variation in a controlled way, and the analysis of its acoustic consequences in order to provide a better understanding of the structure of the variation.

The main approach used will be "Structured Training". This is a procedure for obtaining training data from each new speaker in a way structured to elicit different manners of speaking, so that the system becomes familiar with the variation in that person's voice that it is likely to encounter. Instead of merely repeating the 'passwords' (which may be short phrases), the speaker will be asked, and where necessary induced, to vary the way he or she speaks during training. The variation will include loudness and rate, low-level psychological stress, and (de-)nasalisation. Acoustic analysis will determine how successfully rate and loudness variation has been elicited. Alongside this 'Structured Training', training data will also be collected in the usual way to act as a control. The ASV system will learn two alternative models for each speaker, one from the structured training data and one from the conventionally collected (control) training data. Speakers will make identity claims under simulated real-life conditions which will induce variation, and the performance of the two models will be compared. A population of 50 speakers of Swedish will be used, and these will make both 'genuine' and 'imposter' identity claims.
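The comparison of the two models rests on two error rates: false rejections among genuine claims and false acceptances among imposter claims. The following is a minimal sketch of how such rates might be computed from a list of verification trials. The `Trial` record, the `error_rates` function, and the toy trial lists are assumptions made for illustration only; they are not part of the VeriVox system and the numbers are not project results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One identity claim (hypothetical record layout)."""
    genuine: bool   # True if the claimant really is the claimed speaker
    accepted: bool  # True if the verification system accepted the claim


def error_rates(trials):
    """Return (false_rejection_rate, false_acceptance_rate) as fractions."""
    genuine = [t for t in trials if t.genuine]
    imposter = [t for t in trials if not t.genuine]
    if not genuine or not imposter:
        raise ValueError("need at least one genuine and one imposter trial")
    # False rejection: a genuine claim that the system turned down.
    frr = sum(not t.accepted for t in genuine) / len(genuine)
    # False acceptance: an imposter claim that the system let through.
    far = sum(t.accepted for t in imposter) / len(imposter)
    return frr, far


# Made-up outcomes for the same claims scored against each model.
control_trials = (
    [Trial(True, a) for a in (True, True, False, False)]
    + [Trial(False, a) for a in (False, False, False, True)]
)
structured_trials = (
    [Trial(True, a) for a in (True, True, True, False)]
    + [Trial(False, a) for a in (False, False, False, True)]
)

control_frr, control_far = error_rates(control_trials)
structured_frr, structured_far = error_rates(structured_trials)
print("control model:   ", control_frr, control_far)
print("structured model:", structured_frr, structured_far)
```

Scoring both models on the same set of claims, as in this toy example, keeps the comparison paired, so any difference in the rates reflects the training data rather than the test material.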

The "Structured Training" strategy is predicted to bring a 25% relative reduction in false rejection errors without an increase in the false acceptance rate. That is, if we suppose a 'baseline' performance of the system, using the normally trained model, of 5% false rejections and 5% false acceptances, then false rejections are expected to fall to 3.75% (or lower) while false acceptances do not rise above 5%. Other deliverables will be a database of utterances produced with known types of speaking variation, and acoustic data on those variations.
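The arithmetic behind the 3.75% figure is a relative reduction applied to the baseline: 5% x (1 - 0.25) = 3.75%. A small sketch of that check follows, working in percentage points. The function names `target_frr` and `meets_target` are hypothetical and chosen only for this illustration.

```python
def target_frr(baseline_frr, relative_reduction=0.25):
    """False rejection rate implied by a relative reduction of the baseline."""
    return baseline_frr * (1.0 - relative_reduction)


def meets_target(baseline_frr, baseline_far, new_frr, new_far,
                 relative_reduction=0.25):
    """True if false rejections fall by at least the stated relative amount
    and false acceptances do not rise above the baseline."""
    frr_ok = new_frr <= target_frr(baseline_frr, relative_reduction)
    far_ok = new_far <= baseline_far
    return frr_ok and far_ok


# Baseline from the text: 5% false rejections, 5% false acceptances.
print(target_frr(5.0))                    # 3.75
print(meets_target(5.0, 5.0, 3.75, 5.0))  # True
print(meets_target(5.0, 5.0, 3.0, 5.5))   # False: false acceptances rose
```

Note that a 25% relative reduction is not the same as a reduction of 25 percentage points; only the relative reading is consistent with the 5% to 3.75% example.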

The work described here for the first phase of VeriVox will form the basis for a thorough development of strategies for reducing the problem of speaker variation in the second (main) phase, in which English, German, and French will be added, 'structured training' further refined, and an additional strategy of 'guided elicitation' introduced, as described in the project proposal.

Contact Point
Dr Inger Karlsson
Kungliga Tekniska Högskolan
Department of Speech, Music and Hearing
Box 70014
S-100 44 Stockholm

e-mail: (E-mail removed)

Start date: 1 April 1997
Duration: 6 months - COMPLETED

This document is located at /esprit/src/22760.htm
It was last updated on 1 July 1999, and is maintained by (E-mail removed)