The GIST project addresses the construction of a multilingual generation system for texts describing bureaucratic procedures (e.g. how to apply for a social security facility or how to obtain a visa), starting from language-independent specifications. Three languages are covered: English, German and Italian. The final prototype is expected to produce good-quality drafts, which are then revised and post-edited by professional writers and/or translators.
The system is meant to significantly shorten and improve the production of instructional texts. The ability to store and reuse previously edited, language-independent descriptions will further improve its effectiveness. The prototype will be the starting point for a generation of systems supporting work on procedures in Public Administrations and in large companies. The idea is to embed the drafting functionality in an integrated system that also offers functions for storing, retrieving and analysing structured representations of procedures.
Approach and Methodology
A language-independent specification will be provided by a user by means of a graphical interface, built on top of the knowledge representation system where the relevant domain knowledge is stored. The interface will offer facilities for checking the consistency of the description, and for storing and retrieving it. The language-independent description serves as input for a module that, for each language, produces a text plan respecting language- and domain-specific requirements. The text plan is translated into the format of a Sentence Plan Language, which serves as the input specification for the tactical generators.
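The data flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GIST implementation: the names Step, Procedure, check_consistency, plan_text, to_sentence_plans and draft are hypothetical, the dictionaries are a stand-in for (not an example of) Sentence Plan Language notation, and the tactical generators that would turn sentence plans into text are left out.

```python
from dataclasses import dataclass

# Target languages named in the project description.
LANGUAGES = ("en", "de", "it")


@dataclass(frozen=True)
class Step:
    """One language-independent step: concept identifiers, not words."""
    action: str
    obj: str


@dataclass(frozen=True)
class Procedure:
    """Hypothetical language-independent specification of a procedure."""
    goal: str
    steps: tuple


def check_consistency(proc):
    """Stand-in for the interface's consistency checks; returns problems found."""
    problems = []
    if not proc.steps:
        problems.append("procedure has no steps")
    seen = set()
    for step in proc.steps:
        if step in seen:
            problems.append(f"duplicate step: {step.action} {step.obj}")
        seen.add(step)
    return problems


def plan_text(proc, lang):
    """Build a per-language text plan (here: just the ordered steps)."""
    return {"lang": lang, "goal": proc.goal, "sequence": list(proc.steps)}


def to_sentence_plans(text_plan):
    """Translate a text plan into one placeholder sentence plan per step."""
    return [
        {
            "lang": text_plan["lang"],
            "mood": "imperative",
            "process": step.action,
            "object": step.obj,
        }
        for step in text_plan["sequence"]
    ]


def draft(proc):
    """Run the pipeline for every language; reject inconsistent input."""
    problems = check_consistency(proc)
    if problems:
        raise ValueError("; ".join(problems))
    return {lang: to_sentence_plans(plan_text(proc, lang)) for lang in LANGUAGES}


if __name__ == "__main__":
    visa = Procedure(
        goal="obtain-visa",
        steps=(
            Step("fill-in", "application-form"),
            Step("submit", "application-form"),
        ),
    )
    for lang, plans in draft(visa).items():
        print(lang, plans)
```

The point of the sketch is the separation of concerns: a single specification is checked once, planned once per language, and only the final stage would need a language-specific generator.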
The project is largely based on the re-use of existing technologies, mainly developed by the partners in the framework of national and international research programmes. In addition, the development will be based on extensive empirical research. Empirical investigations will address the occurrence and interdependence of discourse phenomena in the selected corpus of texts. They will also determine the needs of the text producers (domain experts, technical writers and translators) in their everyday work. These investigations will be carried out for all three languages: English, German and Italian. Their results will be stated in formal terms and represented in knowledge sources. Another issue dealt with in the project is the identification of textual phenomena that are common to several languages and therefore allow a common representation. Resource sharing is emphasized and redundancy avoided.
For the development of the final drafter prototype, the project is divided into the following phases: requirements analysis, adaptation of tools, theoretical specification and practical implementation of the drafter components, integration, and evaluation. Users will participate in various stages of the project by defining assessment criteria and evaluating the prototypes developed throughout the project.
Exploitation and Future Prospects
The development of the GIST system advances the state-of-the-art with respect to the following points:
multilinguality: different text-linguistic means can be used to express the same message in various languages; these findings are encoded in the design of the GIST drafter, which produces texts that account for language-specific differences.
interrelationships between various textual phenomena: the project examines and formalizes the interactions between text structure and the textual phenomena of anaphoric reference and thematic progression. These interactions are represented in the system by means of language-dependent and language-independent components.
user-participation: the potential users are involved starting from the first phases of the prototype development. The needs of technical writers and translators are therefore encoded in the system.
reusable knowledge sources: the knowledge sources developed in the course of the project will be represented in a modular and declarative way; this facilitates the extension and adaptation of the prototype to new applications and new domains, and also makes the components reusable in other NLP systems, such as text understanding, machine translation or automatic abstracting systems.
relevant industrial application of prototype: the drafter prototype can be directly applied and easily extended to the generation of multilingual procedural texts of the kind produced daily in Public Administrations and large companies.
The use of the GIST system by the selected user groups (INPS, PAB) will play an important role in showing the benefits of applied Natural Language Processing to other users, especially Public Administrations in multilingual areas. The industrial partner (Quinary) will use the prototype to develop a marketable product for the drafting of multilingual texts, and will promote it in areas concerned with multilingual documentation; new application areas and functionalities will be explored, as well as the integration of the prototype, or its subsequent developments, into more complex systems. The partners with a background in applied research (IRST, ITRI, OFAI) will be responsible for promoting, distributing and licensing the GIST system as a research prototype in the scientific community.
With further technological development, different classes of systems for dealing with more complex procedural prescriptions can be derived from the GIST results. Two extensions are envisaged: the drafting of text integrated with pictures, which would allow the automatic drafting of user manuals, and the integration of functional descriptions of components and devices with procedural descriptions, which would allow manuals to be drafted automatically from a formal model of the device concerned.