
SPEECH RECOGNITION FOR DATA-ENTRY APPLICATIONS

Objective

SpeeData is developing a system for entering structured and textual land-register data into databases, with continuous speech as the main input medium. The tool will have general application wherever non-electronic information has to be interpreted and stored on computer, as in medical reporting, diagnostics and cataloguing. The user interface will adapt to speaker variation and dialect and will operate with different languages. Sound data entry will be possible without specialist intervention: users need not be concerned with the procedure or with the organisation of data, and will be supported in correcting and verifying it.
Progress
Activities so far on the project have covered:
- establishment of a user group and of the user requirements
- market analysis of existing speech products and the market potential of SpeeData
- preparation of a high-level design specification of the demonstrator, together with guidelines for the development phase
- a preliminary development phase both for speech recogniser and user interface components.
Software tools have been acquired for the development of the user interface. Preliminary data collection has been performed, involving Italian, German and bilingual speakers, and experiments on multi-lingual speech recognition have started.
Development of language engineering components follows two paths: on the one hand, the speech recognition module is being adapted to the data-entry architecture; on the other, the central manager of the application is being developed. A first version of the acoustic units for both languages has been trained and tested on part of the recorded data, and this has been integrated into the first prototype, which was completed at the end of December 1996.
User requirements
The user requirements analysis focused on the Land Register domain, refining a previous analysis carried out by Informatica Trentina for the development of a keyboard-based data-entry system. The main feature of traditional data entry is that it is a two-step process: the data stored in the existing 'Master Books' is first interpreted and then converted into data to be recorded in the database. This two-step process requires two people to record the data: a land register expert, who interprets and dictates the information written in the books, and a typist, who performs the data entry.
The possibility of entering data by voice in a natural way, i.e. by following the original organisation of data in the books and speaking the data, should permit a single person, the land registry expert, to carry out the task.
The data-entry requirements of the SpeeData demonstrator are that:
- the standard modes of interaction (keyboard and mouse), as well as speech, should be available in order to optimise response times and user comfort
- during dictation, data should be entered automatically into the correct fields minimising the need to navigate by speech commands
- recognition accuracy should be good enough to maintain efficiency and user satisfaction
- the working environment should be a quiet office environment.
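The requirement that dictated data should land in the correct fields without spoken navigation can be illustrated with a minimal sketch. It is not the SpeeData design: the field names, the patterns and the rule "place each item in the first empty field whose pattern accepts it" are all assumptions made for illustration.

```python
import re

# Hypothetical form layout (not from the SpeeData system): each field has a
# name and a pattern describing the values it accepts, listed in the order in
# which the data would appear in the source book.
FORM_FIELDS = [
    ("parcel_number", re.compile(r"\d+(/\d+)?")),
    ("owner_name", re.compile(r"[^\W\d_]+( [^\W\d_]+)+")),
    ("area_m2", re.compile(r"\d+")),
]


def fill_form(dictated_items):
    """Place each dictated item in the first empty field that accepts it.

    Returns the filled form and the list of items that fitted no field,
    which the user would then correct by keyboard, mouse or speech.
    """
    form = {name: None for name, _ in FORM_FIELDS}
    unplaced = []
    for item in dictated_items:
        for name, pattern in FORM_FIELDS:
            if form[name] is None and pattern.fullmatch(item):
                form[name] = item
                break
        else:
            # No empty field accepted the item.
            unplaced.append(item)
    return form, unplaced


form, unplaced = fill_form(["1234/2", "Mario Rossi", "850", "???"])
```

Because fields are tried in form order, the sketch relies on the user following the original organisation of the data in the books; a plain number dictated first would be taken as the parcel number.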
The main requirements for multi-linguality are that:
- the user will specify the language at the beginning of each session
- the recogniser will, in general, understand one language at a time, so that, in cases which require dictation of texts in the other language, the users need to switch to another language
- for all data fields other than free texts the system will provide automatic translation.
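The three multi-linguality requirements above can be sketched as a small session model. This is an illustrative assumption, not the project's implementation: the `Session` class, the language codes `"it"` and `"de"`, and the tiny translation table are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical bilingual lookup for structured (non-free-text) field values.
# The entries are illustrative only and are not taken from the SpeeData system.
TRANSLATIONS = {
    ("it", "de"): {"proprietà": "Eigentum", "ipoteca": "Hypothek"},
    ("de", "it"): {"Eigentum": "proprietà", "Hypothek": "ipoteca"},
}


@dataclass
class Session:
    language: str  # chosen by the user at the start of the session
    records: list = field(default_factory=list)

    def switch_language(self, language: str) -> None:
        # The recogniser handles one language at a time, so dictating in the
        # other language requires an explicit switch.
        if language not in ("it", "de"):
            raise ValueError(f"unsupported language: {language}")
        self.language = language

    def enter(self, field_name: str, value: str, free_text: bool = False) -> dict:
        other = "de" if self.language == "it" else "it"
        entry = {"field": field_name, self.language: value}
        if not free_text:
            # Structured fields are translated automatically; free texts are
            # stored only in the language in which they were dictated.
            entry[other] = TRANSLATIONS[(self.language, other)].get(value)
        self.records.append(entry)
        return entry


session = Session(language="it")
structured = session.enter("right_type", "ipoteca")
session.switch_language("de")
note = session.enter("remarks", "Freitext in deutscher Sprache", free_text=True)
```

In this sketch a structured value with no known translation is stored with `None` for the other language, leaving the gap visible for the user to correct.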
Market analysis
The market study focused on the Italian and German speech recognition market. Gartner Group's advanced technology survey was used to identify the most important recognition systems.
The European market is expected to grow from US$254 million (1992) to US$1.2 billion (1997). For Italian, six major products were identified; for German, eleven companies offer products in the field of speech. None of the products, however, is well suited to the type of task covered by SpeeData.
The quality of the voice technology developed for SpeeData, compared with the technologies available on the market, should not be critical. It is the overall, multi-modal nature of the user interface that will play the important role in positioning SpeeData in the market.
The SpeeData project has identified an important real market potential in the area of data-entry in public administration. The potential size of this market can be directly related to the need for improved efficiency in all office automation and system procedures throughout public administration.
Feasibility and cost/benefit appraisal
Feasibility of the system is assured for each functional aspect and no restriction is imposed other than a quiet, normal office environment and a high-end PC as the minimum hardware requirement.
A cost/benefit evaluation of the SpeeData technology against conventional data-entry has been carried out for the specific Land Registry application in Trentino: compared with conventional data-entry, SpeeData data-entry achieves a labour cost saving of some 4.5 MECU. Another positive aspect is that fewer people need to be selected, engaged and trained.
There are also psychological benefits due to the fact that there is no need for the Land Register experts to become data-entry clerks.
The User Group, Promotion and Awareness
Some organisations, with different users, domains, and problems, have been contacted by the consortium and separated into two different groups:
- the first is specifically interested in the land register domain and is composed of: RATAA and BMJ (public administration), the two sponsor partners of the SpeeData consortium, and Regione Autonoma Friuli-Venezia Giulia,
- the second group consists of organisations which are interested in the technology used in this project in order to apply it in other domains. These Italian and German organisations operate in different domains, including culture, health, banking, industry, etc.
The Way Ahead
Future development includes two independent steps:
- development of the user interface, which involves usability engineering issues
- development of the database interface, which will focus on the database structure and functional constraints.
Major activities scheduled are:
- completion and refinement of the prototype
- involvement of the user in evaluation
- improvement of the user interface set-up.
The needs addressed by the project arise in many data-entry situations: information must be extracted from some document or object, through an intelligent interpretation process, and easily entered into a database.

The computer will work like a secretary, typing what the user dictates and moving correctly through the data-entry forms. The user will not need to worry about how data is entered and organised in the database, but will have the tools to verify, correct and store it through a user-friendly interface.

The user involved in the project is the Land Register of the Italian bilingual region Trentino-Alto Adige/Südtirol. The Land Register is an institution that has been accumulating information about real estate since 1800. Data may be entered in two languages (German and Italian) and may consist of long texts, numbers, proper names, tables, etc. Other market sectors that could benefit from the project's results include all organisations that perform large-scale data entry, such as libraries, hospitals, museums, etc.

Progress and results

The intermediate results aimed at by the project are tools for the fast set-up of the speech recognition module, the data-entry interface and the database interface, starting from an existing database. In addition, techniques will be addressed for making the recogniser more robust with respect to dialect variations and changes of the accepted language. The final result will be a demonstrator running on commercial platforms that integrates all of the above components.

Exploitation

Demonstration of the project will be carried out at two offices (one of them bilingual) of the Land Register; several users will carry out data-entry work (36 user person-months over a 6-month period). Project results will also be demonstrated to the user interest group, which will include, in addition to the sponsoring partners, the Land Register office of the region Friuli-Venezia Giulia, some hospital administrations, and the most important Italian cataloguing office.

If the demonstrator confirms the expected results, a significant reduction in labour cost (of about 40%) will result in the target application area. Savings of the same order can be expected in other application domains (medical reporting, library and museum cataloguing, diagnostics, etc.).

Funding Scheme

CSC - Cost-sharing contracts

Coordinator

Informatica Trentina SpA
Address
Via Giuseppe Gilli 2
38100 Trento
Italy

Participants (2)

BAYERISCHES FORSCHUNGSZENTRUM FUER WISSENSBASIERTE SYSTEME
Germany
Address
Am Weichselgarten 7
91058 Erlangen
ISTITUTO TRENTINO DI CULTURA
Italy
Address
Via Santa Croce 77
38100 Trento