Objective
EAGLES will build on its previous work on identifying best practice and defining guidelines in language engineering. It will both consolidate existing work and carry forward new work in key areas dealing with written and spoken language resources and the evaluation of NLP systems. In addition to guidelines definition, emphasis will be placed on field testing, wider dissemination in customised forms, extended industry involvement, and revision and maintenance issues. Close interaction with EU RTD and industry projects will be sought to enhance user feedback and acceptance. The results, based on proven methodologies and organisational structures, will be a set of handbooks to be disseminated widely through the Web. They are expected to facilitate the integration and reuse of language engineering components and resources. This, in turn, will lower development costs and give system builders the opportunity to construct more robust, flexible integrated applications in a shorter time.
EAGLES is developing a set of guidelines and de facto standards for the language engineering field. It will add to and build on the standardisation work already carried out in the previous (LRE) project of the same name.
In the past, progress in producing Natural Language (NL) applications has been hampered by the high cost of developing language resources, since it has seldom been possible to build on the results of past work. Users have also often found that they cannot easily integrate these applications, even different ones from the same company. Evaluation of products is almost impossible due to the lack of standards against which to compare them. Coupled with this is the increasing demand for NL services, especially speech-based products with their costly, high-quality corpora. EAGLES will ameliorate this situation by providing standards so that language engineering components can be reused, reducing system costs and speeding up their time to market.
Market Situation
The language engineering market offers many applications, often developed by SMEs and often supplying lower-level products such as spell checkers and dictionary-based packages. The effort required to build these and more sophisticated applications is great, and their reusability is by no means guaranteed. Users find that they cannot easily integrate language engineering applications, even different ones from the same company, especially since almost all of them are stand-alone. Evaluation of products is also almost impossible due to the lack of standards against which to compare them.
Pioneering work to produce language engineering resources which are compatible across language boundaries has been done by projects such as SAM (Multi-lingual Speech Input/Output: Assessment, Methodology and Standardisation, Esprit 289). These resources have been exploited by resource production projects such as SPEECHDAT, which will lay the basis for commercial activities in the spoken language field until the next decade. This success has been due to effective collaboration with the previous LRE-EAGLES project, where standards-related work in the EU has been largely concentrated since its inception.
Objectives
The main goals of EAGLES are to accelerate the provision of standards for:
I) Very large scale language resources such as text corpora, computational lexicons and speech corpora,
II) The means of manipulating such knowledge via computational linguistic formalisms, mark up languages and various software tools,
III) The means of assessing and evaluating resources, tools and products.
Past EAGLES work was based on five key areas, each with an expert working group developing guidelines:
I) Text Corpora
II) Computational Lexicons
III) Grammar Formalisms
IV) Evaluation
V) Spoken Language
The EAGLES guidelines currently cover aspects of text corpora, computational lexicons, evaluation of natural language processing systems, computational linguistic formalisms and spoken language systems. These are available online from the project web site, where the new project's results will also be made available.
The EAGLES project, furthering the LRE-EAGLES objectives, will extend the thematic coverage of work to encompass: lexicon encoding, including semantic encoding and labelling, encoding of multi-word expressions, and bi- and multi-lingual aspects; and the integration of spoken and written language resources.
Further work will also be carried out to broaden and deepen the existing guidelines, taking into account feedback from guideline users and advancing areas of language engineering which can now move on to standardisation. The existing handbooks will also be upgraded to provide more flexible access, both through less technical overviews and through hypermedia presentation on the web. A range of dissemination and awareness activities will ensure that both past and new results are better utilised by a wider audience.
The work of EAGLES overall should be seen as a long-term standardisation initiative. The basic specifications have now been completed; they will be disseminated and maintained using the same methodology, while the standards are extended according to feedback and new advances in the field.
Results
The results will take the form of a set of handbooks providing guidelines on good practice and de facto standards. Once these are widely accepted and implemented, language engineering for the end user will become second nature rather than the present obstacle course.
This will be due to the sharing of expensive resources, reuse of components, and the rapid construction of integrated, robust, multi-lingual language processing environments.
As these standards become stable and mature, the documentation will be customised towards different groups of readers. Currently the documentation assumes a technically minded reader, but it will also need to serve decision makers, research scientists and application providers.
Field of science
Topic(s)
Call for proposal
Data not available
Funding scheme
Data not available
Coordinator
56127 PISA
Italy