This project aims to use Example-Based (EB) techniques to develop Machine Translation (MT) and Language Understanding (LU) systems for limited-domain applications that require text and/or speech input. Compared with more traditional Knowledge-Based approaches, much lower development costs are expected.
However, some of the limitations of the current demonstration system must be overcome to enable the scaling-up needed to achieve useful performance in more complex tasks. Hence, additional research is required:
- to develop appropriate extensions or modifications to the presently adopted techniques;
- to explore alternative approaches to the underlying learning problem; and,
- to study adequate techniques for inexpensive data collection in specific applications.
This research constitutes the present project.
Important benefits are expected from the results. Machine Translation and multilingual Language Understanding systems will have a major impact on many fields of human activity, where users would naturally interact with both humans and computers in their native language (text or speech). This has particular relevance in the European Community due to its multilingual nature. In addition to these general benefits inherent to speech-input MT and LU systems, the following derive from the approaches chosen in this project:
- the use of EB techniques is expected to dramatically reduce the development costs in many specific domains;
- the tight integration of acoustic-phonetic, lexical, syntactic and translation models leads to more robust speech-input systems, compared with the more conventional, decoupled approach;
- these techniques make it possible to translate input sentences into adequate descriptions of the actions to be carried out by the computer, thus simplifying the development of LU systems.
In this direction, we have recently developed a basic demonstration system. The baseline techniques chosen for this prototype rely on EB techniques for learning a class of finite-state translation models known as Subsequential Transducers. These models lend themselves particularly well to integration with acoustic-phonetic, lexical and syntactic models in order to perform speech-input MT. This allows the building of systems in which all the models required for each new application are automatically learned from training data. The same techniques are used for LU, which, in limited-domain applications, is treated as a particular case of translation in which the target language is a formal language rather than a natural one. One application considered in this demonstration consists of the description and manipulation of simple visual scenes. Despite the simplicity of this experimental task, the system has clearly shown the interesting possibilities that such EB techniques open up for building low-cost MT and LU systems for many different applications.
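To make the notion of a Subsequential Transducer more concrete, the following minimal Python sketch shows how such a model can translate a sentence once it is available. It is only an illustration under stated assumptions: the class name `SubsequentialTransducer`, the toy Spanish-to-English vocabulary and the state numbering are all hypothetical, and the transducer is written by hand here, whereas in the project the models are learned automatically from training examples. The sketch relies on the standard definition: a deterministic finite-state machine whose transitions emit output strings and whose final states emit an additional output string at the end of the input.

```python
from typing import Dict, List, Optional, Tuple

State = int
Words = List[str]


class SubsequentialTransducer:
    """Deterministic transducer with per-transition and per-final-state outputs."""

    def __init__(
        self,
        initial: State,
        transitions: Dict[Tuple[State, str], Tuple[State, Words]],
        final_outputs: Dict[State, Words],
    ) -> None:
        self.initial = initial
        # (state, input word) -> (next state, words emitted on this transition)
        self.transitions = transitions
        # final state -> words emitted when the input ends in that state
        self.final_outputs = final_outputs

    def translate(self, words: Words) -> Optional[Words]:
        """Return the translation, or None if the input is not accepted."""
        state = self.initial
        output: Words = []
        for word in words:
            key = (state, word)
            if key not in self.transitions:
                return None  # no transition: sentence outside the toy domain
            state, emitted = self.transitions[key]
            output.extend(emitted)
        if state not in self.final_outputs:
            return None  # input ended in a non-final state
        output.extend(self.final_outputs[state])
        return output


# Hypothetical hand-written toy model (not learned from data).
# The noun is withheld until it is known whether an adjective follows,
# because English places the adjective before the noun.
toy = SubsequentialTransducer(
    initial=0,
    transitions={
        (0, "un"): (1, ["a"]),
        (1, "cuadrado"): (2, []),  # delay the output of "square"
        (2, "grande"): (3, ["large", "square"]),
        (2, "negro"): (3, ["black", "square"]),
    },
    final_outputs={2: ["square"], 3: []},
)

print(toy.translate(["un", "cuadrado", "grande"]))  # ['a', 'large', 'square']
print(toy.translate(["un", "cuadrado"]))            # ['a', 'square']
print(toy.translate(["cuadrado"]))                  # None
```

The same mechanism would cover the LU case by replacing the English output words with tokens of a formal command language; only the output alphabet changes, not the transducer machinery.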