This project aims to produce:
guidelines on how to structure new software packages and systems
methods and tools to help adapt existing packages and systems
so that the natural language of interaction can be readily changed. This work is essential if the research advances in natural language processing envisaged in the rest of the LRE programme are to be exploited.
The project will address the problem by examining existing examples of system conversions (e.g. Arabic, Greek and French) and particular case studies, notably a software tool and a self-tutoring system. From these it will produce guidelines, methods and tools for structuring and restructuring software, with a view to broad acceptance and eventual standardisation of the guidelines.
The consortium consists of a research establishment and a university, both of whom have expertise and experience in the area, and associated systems manufacturers who are interested in the conversion of software products for interlinguality.
The approach is novel in that, instead of looking at the reuse of linguistic resources within NLP systems, Linguasoft looks at the reuse of other software resources and their integration with linguistic resources. It will draw upon current work on software reuse in ESPRIT (reported in papers at ESPRIT Week), such as 1094 Practitioner, 5327 REBOOT and 5311 BUSINESS, on the Eureka project ESF ROSE, and on other work elsewhere. To convert existing packages and systems, it will need technology such as that developed in 2487 REDO. The intention is to reuse existing methods and tools wherever possible.
At first glance, changing the language of interaction may seem trivial, particularly with WIMP interfaces, where input is by menu selection and where menu headings, prompts, system messages and similar linguistic software resources are stored in files. Simply replacing the file changes the language of user interaction. But this works only if natural language characteristics are not deeply embedded in the software, as they are, for example, in matching algorithms for word search in a text editor or record look-up in a database system, or in assumptions about average word length. For general approaches to prompts and menus there is a need to capture meaning and to use specialised dictionaries of interaction terms to instantiate the interface for a particular language. String matching algorithms will need to be extended to incorporate current developments in computational linguistics and natural language processing, such as those developed under EUROTRA (Copeland et al 1991). For prompts and messages this will require a representation of the meaning in the code (a kind of interlingual key), which is then substituted by the appropriate words and phrases in the natural language concerned. To make these substitutions of words for meaning, collections of translation terminology for typical words of interaction, such as those found in existing technical and computer dictionaries and thesauri, will be essential.
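The two ideas above, an interlingual key standing for the meaning of a message and language-sensitive string matching, can be illustrated with a minimal sketch. It assumes Python and its standard `unicodedata` module; the keys (`FILE_SAVED`, `CONFIRM_DELETE`), the catalogue contents and the helper names `render`, `fold` and `word_search` are hypothetical, chosen only for illustration. The accent folding used for search is itself a language-specific assumption: it suits French search ("ecole" should find "École") but would be wrong for Swedish, where å is a separate letter, which is exactly why such matching cannot be left hard-coded.

```python
import unicodedata

# Hypothetical interlingual keys: the program refers only to these meaning
# identifiers; the words for each language live in separate catalogues.
CATALOGUES = {
    "en": {"FILE_SAVED": "File {name} saved.", "CONFIRM_DELETE": "Delete {name}?"},
    "fr": {"FILE_SAVED": "Fichier {name} enregistré.", "CONFIRM_DELETE": "Supprimer {name} ?"},
    "el": {"FILE_SAVED": "Το αρχείο {name} αποθηκεύτηκε.", "CONFIRM_DELETE": "Διαγραφή {name};"},
}
FALLBACK = "en"  # assumed default language when a catalogue or entry is missing


def render(key, lang, **params):
    """Substitute the words for a meaning key in the requested language."""
    catalogue = CATALOGUES.get(lang, CATALOGUES[FALLBACK])
    template = catalogue.get(key)
    if template is None:
        # Raises KeyError if the key is unknown in the fallback catalogue too.
        template = CATALOGUES[FALLBACK][key]
    return template.format(**params)


def fold(text):
    """Case- and accent-insensitive form of text (a language-specific choice)."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # Drop combining marks such as acute accents left by the decomposition.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def word_search(needle, haystack):
    """Substring search that ignores case and accents."""
    return fold(needle) in fold(haystack)


print(render("FILE_SAVED", "fr", name="notes.txt"))
print(word_search("ecole", "L'École est fermée"))
```

Changing the language of interaction then means supplying a new catalogue, while the program text keeps only the keys; the matching routine, by contrast, has to be chosen per language rather than swapped as a file.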
LRE is intended to move European software development towards NLP. Apart from free-standing systems or system components such as machine translation or spelling checkers, there is a pressing need to convert complete systems to work in other languages. This project will provide the enabling technology to do this.
The methods and guidelines to be produced are expected to feed into the standardisation process, in particular into any standardisation activity that follows on from the IAP recommendations for internationalization (ISO 1991). This will be of great benefit to the Community, since all parts of the Community, as currently constituted and as it may be enlarged to include eastern Europe, could benefit from European information technology with minimal conversion costs. With the conversion procedures the project aims to produce, even software that does not conform to an eventual standard could be deployed widely within the Community.
There will also be a consequential benefit to the Community in export markets. The ability to sell systems in the vernacular is important in some markets, such as the Middle East (Arabic), and will become important in others as they expand and open up, such as Russia, India and even China, as well as the African and American countries, where the extended Roman alphabet is mostly used.
The Interest Group will broaden our perspectives on the requirements for the localisation of software, helping us integrate with other activity in Europe (notably the LISA group), and providing an important channel for dissemination. This Group will be established during June and July 1992. Results will also be disseminated through presentations at conferences and seminars.