CAVE will tighten the security of "calling-card" and telephone banking services using speaker verification technology. It will advance the state of the art in this field, delivering the technology on proven industry-standard platforms. The prototype system design will facilitate integration into a range of application environments. It will be scalable to handle a large number of clients and able to operate with varying speech quality, including mobile phone transmission. The pilot will show how this technology can provide the high-level safeguards against fraud that public telephone network services, including telephone banking, require.
User requirements and market studies, completed during the year, improved insight into many of the issues involved in speaker verification as a means to protect tele-services from unauthorised access. The technology currently available can perform reliably under the adverse conditions that prevail in actual operation. In a single month, the project reduced error rates on a very difficult speech database, collected over international telephone lines, by more than 50% (achieving error rates better than those specified in the user requirements). Continued collaboration between the partners in the project is expected to enable performance improvements on real telephone speech amounting to a further 50% reduction by the end of the project. Three successive phases of demonstrations have been designed, with usability experiments to be carried out at each phase.
The Markets
The telephone is the most widely available tool for all sorts of communication, information gathering, and transaction services. Millions of people take advantage of the convenience of the telephone every time they use a telephone calling card to place a long-distance call, check their account balance with a telephone banking service, or even arrange valuable business transactions. However, the introduction of these new services has raised new questions: how can the consumer be sure that no other person can gain access to his or her telephone calling card, or to the private details of his or her bank account? At the moment, consumers are unable to feel secure. A high level of telephone calling card fraud is reported every year (in the United States the estimated figure for 1995 was $1 billion, with 1 in 6 card users estimated to be a victim of fraud during the year). This hits both the telecommunications companies and the consumer.
The situation in direct banking is somewhat different. As yet, no direct banking-based frauds have come to light. This is because currently-available services are still relatively unattractive to fraudsters. Agent-assisted services, in which the caller speaks to an employee of the bank throughout the call, verify the caller on the basis of a lengthy 'security handshake' in which the caller answers personal questions to which only the genuine caller should know the answer. New business concepts and competition will change this situation dramatically.
The explosive growth in the use of tele-services and chip cards has led to a need to find reliable ways of preventing fraud, and biometric protection has become the subject of considerable interest. Speaker verification is the least obtrusive biometric device. It will enable service providers to develop new and user-friendly tele-business services and improve existing ones by adding improved security and easier user interfaces.
User Requirements and Profile Of Demonstrators
Most service managers are not aware of the state of the art in speech technology in general, and speech verification in particular. They also lack a clear understanding of the ways in which the technology can be integrated into service interfaces to the benefit of both the service provider and the consumer. In practice, the role of speech verification and the way it can be integrated into service interfaces depend on the type of service, the actual risks, and the way in which the fraud issue is presented to service customers. It is impossible, therefore, to provide a universally applicable set of guidelines for the deployment of speech verification.
The demonstrators that CAVE will develop and test are intended to show the ways in which speech verification can be used and to help educate service managers.
Speech Verification Technology And Evaluation
In CAVE, work has concentrated on fixed-vocabulary, text-dependent and text-prompted techniques. In practical terms, the utterances on which customers are verified consist only of sequences of digits. In some cases the digit sequences are fixed, as with card/account numbers and PIN codes. In other cases the sequences can be made 'unpredictable', i.e. the service generates a random sequence of four or five digits and prompts the customer to repeat it, using the response to verify the caller's identity.
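The text-prompted case described above can be illustrated with a minimal sketch. It only covers generating an unpredictable digit prompt and checking that the recognised digits match it; the function names are hypothetical, the digit string is assumed to come from some separate recogniser, and the speaker verification step itself is not shown. Nothing here is taken from the CAVE software.

```python
import secrets

DIGITS = "0123456789"

def make_digit_prompt(length: int = 5) -> str:
    """Return a random digit sequence for the caller to repeat.

    The length is restricted to four or five digits, as in the text.
    """
    if length not in (4, 5):
        raise ValueError("prompt length must be 4 or 5 digits")
    # secrets gives unpredictable choices, unlike the default random module
    return "".join(secrets.choice(DIGITS) for _ in range(length))

def prompt_matches(prompt: str, recognised_digits: str) -> bool:
    """Check that the caller repeated the prompted digits.

    This checks only *what* was said; deciding *who* said it is the
    job of the speaker verification system (not shown here).
    """
    return prompt == recognised_digits

if __name__ == "__main__":
    prompt = make_digit_prompt(4)
    print("Please repeat:", " ".join(prompt))
```

Because the sequence changes on every call, a recording of an earlier session would contain the wrong digits, which is the point of making the prompt unpredictable.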
All speech verification techniques investigated in CAVE have been cast as special cases of the Hidden Markov Model. This has allowed the establishment of a common experimental platform, based on commercially available software, so that very large scale experiments became feasible by distributing tasks among the partners. This common platform has put CAVE in a position to obtain the best performance scores known in the literature on the internationally used calibration database, YOHO. CAVE has developed scoring procedures and software for assessing the quality of speech verification systems, which are expected to gain general acceptance in the field.
In November 1996 CAVE started to develop a speech database comprising recordings collected from all over the world, in locations ranging from quiet hotel rooms to very noisy airports and train stations. Error rates on that database have been reduced to 2.5%, a figure well below the 5% threshold for deployment quoted by most service managers. Here too, the common, distributed experiment platform has been essential. The project expects that this platform will make it possible to reduce error rates to approximately 1% by the end of 1997.
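As an illustration of what assessing a verification system can involve, the sketch below computes two error rates commonly reported for such systems: the false rejection rate (genuine callers turned away) and the false acceptance rate (impostors let through) at a given decision threshold. This is an assumption about the kind of measure involved, not a description of the CAVE scoring procedures, and the scores and threshold in the example are invented for illustration.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_rejection_rate, false_acceptance_rate).

    A trial is accepted when its score is at or above the threshold.
    Higher scores are assumed to mean "more likely the claimed speaker".
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    # Genuine callers scoring below the threshold are wrongly rejected
    false_rejections = sum(1 for s in genuine_scores if s < threshold)
    # Impostors scoring at or above the threshold are wrongly accepted
    false_acceptances = sum(1 for s in impostor_scores if s >= threshold)
    frr = false_rejections / len(genuine_scores)
    far = false_acceptances / len(impostor_scores)
    return frr, far

if __name__ == "__main__":
    # Invented scores, for illustration only
    genuine = [0.9, 0.8, 0.4, 0.7]
    impostor = [0.1, 0.5, 0.3, 0.2]
    frr, far = error_rates(genuine, impostor, threshold=0.45)
    print(f"false rejection rate: {frr:.2f}, false acceptance rate: {far:.2f}")
```

Raising the threshold trades false acceptances for false rejections, so a single "error rate" figure only makes sense once the operating point has been stated.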
Implementation And Integration
If speech verification is to be deployed in actual applications, it must either compete effectively with other technical and operational procedures or be combined with them to improve security levels. Well-defined integration scenarios and standardised interfaces (APIs) are prerequisites for the successful introduction of speech verification. CAVE intends to contribute to other activities in this field. Recently a group in the USA has started work to define APIs for speech verification (the Speech Verification API-group).
The Way Ahead
CAVE expects to complete the 2nd and 3rd phase demonstrators for both business sectors addressed by the project (telecommunication and banking) in March and June, respectively.
Implementation and usability tests will be carried out in the Netherlands (telecommunication environment) and in Switzerland (banking applications). The existing reports on user requirements and marketing analysis will be updated in the light of results from these experiments. Development of the technology will continue throughout the year, as will investigations of better procedures to evaluate performance.
CAVE will present and demonstrate the results in a number of international conferences and exhibitions.
Security has long been a major concern in telematics information and transaction services. Large sums of money are lost to fraud, especially in transaction services. Major banks and telephone network operators providing telematics information and transaction services need a high level of protection against fraud.

Speaker Verification (SV) technology has not yet been validated for large scale applications with the general public. A complete understanding of where SV can be successfully deployed and under which conditions is still lacking. CAVE intends to shed more light on these issues by conducting R&TD work, trials and integration tests. Other aspects, such as user acceptance and ergonomics, as well as the identification of relevant legal and privacy issues, will also be addressed.

The CAVE consortium makes use of Text Dependent and Text Independent SV technology that is close-to-market. Text-prompted SV technology will only be addressed in order to define future R&D.

The first demonstrators will be evaluated in a research environment. Later versions will be tested in limited operational environments. Actual deployment in banking and telecommunication services will depend on the outcome of the experiments.

Progress and results

The envisaged results of CAVE are:
- taxonomy of SV technology in real life applications,
- real-time demos of SV in banking and telco applications,
- reports on field tests, including user acceptance and enrolment,
- improved SV technology and verification methodology,
- recommendations for future R&D.

Expected public reports will include:
- user requirements, markets and technology;
- evaluation of SV field trials in banking;
- evaluation of SV field trials in telecommunication.


Results of CAVE will be demonstrated in workshops, tutorials and exhibitions.

CAVE will identify further research that needs to be carried out and bring together a first set of best practice guidelines for the successful deployment of SV technology.

Funding Scheme

CSC - Cost-sharing contracts


PTT Telecom B.V.

2500 GD Den Haag

Participants (6)

6,Place D'alleray
75505 Paris
4,Avenue De Simplon
1920 Martigny
100 44 Stockholm
13680 Haninge
8021 Zurich
Vocalis Ltd
United Kingdom
Chaston House Mill Court Great Shelford
CB2 5LD Cambridge