DIAGNOSTIC AND EVALUATION TOOLS FOR NATURAL LANGUAGE APPLICATIONS

Objective

DiET addresses requirements for the assessment of natural language processing (NLP) components in adequacy evaluation and quality assurance. Effective and efficient assessment is often hampered by the lack of suitable test material and technology. DiET will develop methods and tools for the glass box evaluation of NLP components, building on the results of previous projects covering different aspects of assessment and evaluation. It will extend and develop test suites with annotated test items for grammar, morphology and discourse, for English, French and German. DiET will provide user support in database technology, test suite construction tools and graphical interfaces. The project results will be used by the industrial partners for in-house and external quality assurance and evaluation. They will also be made available in the public domain through appropriate channels.

Effective and efficient assessment of Natural Language (NL) processing components is often severely hampered by the lack of suitable test material and technology, which are costly to develop in both time and money. A large variety of evaluation tools has been produced, but these are mainly specialised toward testing specific pieces of software. Developing reusable components is normally outside the interests of individual companies. However, the emerging language technology industry needs these tools: industrial developers of NL products need them to monitor quality, and end users need them to evaluate the suitability of different products.
DiET will address this by developing methods and tools for 'glass box' evaluation of NL components. The tools will be reusable and customisable, and will be accompanied by reference data organised in test suites of annotated items for grammar, morphology and discourse in English, French and German. They will also provide support for standard databases, test suite construction covering specific domains, and graphical interfaces.
Market Situation
Practically every organisation involved in the development or use of NL products has produced its own specialised, ad hoc test material and procedures. A few more general resources exist, such as the monolingual test suites developed by Hewlett Packard (1230 English sentences), the Alvey test suite (1500 English sentences) and the Systran test suite (853 German test items with French translations). However, none of these comes with highly structured annotations or elaborate database technology; they are mostly organised as flat ASCII files. Moreover, since the majority of the diagnostic tools and reference data have sprung from research sponsored in the US, American English is mainly used.
Despite this, the foundations for a generalised tool set have already been laid in the EU projects TSNLP, FraCaS and TEMAA as well as in the EAGLES standardisation initiative. The TSNLP project in particular has shown that industry needs comprehensive, ready made test data, as well as tools for customising it to specific domains and applications. In addition, syntactic and semantic annotation schemes, namely Penn Treebank, ParsEval and SemEval, have been developed in the US but have not been extended to any language beyond American English.
Objectives
The main goal is to develop the methods and tools for the glass box evaluation of NL components. This includes the following:
- checking the performance of an NL system against well defined linguistic phenomena identified in real corpora, such as maintenance manuals.
- measurement of the evolution of a system under development through different releases to identify improvements and detect possible degradation.
- enhancement of resources developed in previous projects such as their annotation schema and test suites.
- introduction of techniques which utilise morphology, semantics and discourse as well as tools for domain and corpus based customisation.
- testing the syntactic and semantic competence of an NL system. This would include using test data which exhibits not just a single phenomenon but several at once.
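The measurement of a system across releases, listed among the objectives above, can be illustrated with a small sketch. It is not DiET's actual tooling: the `TestResult` record, the phenomenon labels and the `compare_releases` function are hypothetical names chosen for this example, and the per-phenomenon pass rate is only one of many possible measures.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TestResult:
    """Outcome of running one annotated test item (hypothetical layout)."""
    item_id: str
    phenomenon: str  # e.g. "agreement", "coordination"
    passed: bool


def pass_rates(results):
    """Return the fraction of passed test items for each phenomenon."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for result in results:
        totals[result.phenomenon] += 1
        if result.passed:
            passes[result.phenomenon] += 1
    return {p: passes[p] / totals[p] for p in totals}


def compare_releases(old_results, new_results):
    """Label each phenomenon seen in both releases as improved, degraded or unchanged."""
    old_rates = pass_rates(old_results)
    new_rates = pass_rates(new_results)
    report = {}
    for phenomenon in sorted(old_rates.keys() & new_rates.keys()):
        delta = new_rates[phenomenon] - old_rates[phenomenon]
        if delta > 0:
            report[phenomenon] = "improved"
        elif delta < 0:
            report[phenomenon] = "degraded"
        else:
            report[phenomenon] = "unchanged"
    return report
```

Running the same test suite against two releases and passing both result lists to `compare_releases` gives a per-phenomenon view of where a new release gained or lost ground, which is the kind of information a glass box evaluation is meant to expose.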
Technology Base
The technology involves the construction, annotation and application of systematic NL test suites. These use database and evaluation technologies, as well as statistical and corpus annotation methods, and will be heavily based on the results of the EU funded projects TSNLP, FraCaS, TEMAA and EAGLES, and of the commercially funded project SLT (Spoken Language Translator).
DiET will then extend the state of the art by using:
- several levels of linguistic analysis, including some dialogue and discourse phenomena.
- application specific performance data such as the frequency and relevance of patterns and phenomena in specific domains and applications.
The project will be broken down into:
- Data construction: the existing core test suites for English, French and German developed in TSNLP will be extended along several dimensions:
  - Syntactic construction: gaps in the existing TSNLP test suites will be filled and their coverage deepened.
  - Morphological construction: inflectional morphology for the three languages involved will be covered by instances of morpho-syntactic equivalence classes as training material.
  - Discourse construction: semantic and especially contextual phenomena will be dealt with to an acceptable level of accuracy.
- Database tools: both commercially and freely available tool kits for graphical interfaces and SQL compliant database servers will be surveyed.
- Customisation tools: these will cover corpus related lexical replacement and frequency mapping of test items to corpora.
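The two customisation steps named above, frequency mapping and lexical replacement, could look roughly like the following sketch. Everything in it is an assumption made for illustration: the input format (one list of phenomenon labels per corpus sentence), the function names, and the simple token-by-token substitution are not taken from the project.

```python
from collections import Counter


def phenomenon_frequencies(corpus_annotations):
    """Relative frequency of each phenomenon label in an annotated corpus.

    `corpus_annotations` holds one list of labels per corpus sentence
    (an assumed format; real corpus annotation would be richer).
    """
    counts = Counter(label for labels in corpus_annotations for label in labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def weight_test_items(test_items, frequencies):
    """Attach a corpus-derived weight to each (sentence, phenomenon) pair.

    Phenomena that never occur in the corpus receive a weight of 0.0.
    """
    return [
        (sentence, phenomenon, frequencies.get(phenomenon, 0.0))
        for sentence, phenomenon in test_items
    ]


def replace_lexicon(sentence, replacements):
    """Swap general-purpose words for domain terms, token by token."""
    return " ".join(replacements.get(tok, tok) for tok in sentence.split())
```

With weights of this kind, test items for phenomena that are frequent in a given domain (for example imperatives in maintenance manuals) can be given more prominence than items for phenomena that rarely occur there.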
Results
DiET will provide a system consisting of a core of comprehensive diagnostic data together with suitable tools for the testing of NL products. They will be:
- affordable for a broad variety of users.
- augmentable to integrate existing resources.
- adaptable to the specific requirements of individual users.
- widely acceptable as a pre-standard benchmark.
Demonstration
Three of the project partners will assess the DiET tools at their own sites in a series of continuous iterations. The applications against which they will be tested will be taken from the following set:
- tools and components for machine translation.
- controlled language and grammar checkers.
- translation memory based computer-aided translation systems.
Particular attention will be paid to the following functionality: extraction of subsets of test items from general purpose test data, construction and integration of new test data, customisation of test data for specific applications or corpora, and lexical tailoring of the vocabulary used.
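As an illustration of the first item, extracting a subset of test items from general purpose test data, the sketch below stores a few annotated items in an SQL database and selects them by language and phenomenon. It uses Python's standard `sqlite3` module with an in-memory database. The table layout, the column names, the well-formedness flag and the sample sentences are assumptions made for this example, not the project's schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a few hypothetical test items."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test_item ("
        " id INTEGER PRIMARY KEY,"
        " language TEXT NOT NULL,"
        " phenomenon TEXT NOT NULL,"
        " wellformed INTEGER NOT NULL,"
        " text TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO test_item (language, phenomenon, wellformed, text)"
        " VALUES (?, ?, ?, ?)",
        [
            ("en", "agreement", 1, "The engineer works."),
            ("en", "agreement", 0, "The engineer work."),
            ("de", "agreement", 1, "Der Ingenieur arbeitet."),
            ("en", "coordination", 1, "The pump and the valve work."),
        ],
    )
    conn.commit()
    return conn


def extract_subset(conn, language, phenomenon):
    """Return (text, wellformed) pairs for one language and phenomenon."""
    rows = conn.execute(
        "SELECT text, wellformed FROM test_item"
        " WHERE language = ? AND phenomenon = ? ORDER BY id",
        (language, phenomenon),
    )
    return [(text, bool(wellformed)) for text, wellformed in rows]
```

Keeping the annotations in queryable columns rather than in flat files is what makes this kind of selection a one-line query; the same approach would extend to further annotation fields.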
One of the partners will use DiET in producing high quality, multilingual technical documentation, since it will provide a common reference platform for evaluating NL systems. Another partner will use it in its own activities servicing the localisation industry.
Benefits and Users
There are three different types of users: those who wish to assess a commercial NL component or product, those who need measurement methods for quality assurance in industrial development, and professional users who test NL applications on behalf of other companies or user groups.
End users will profit from better and less expensive NL products which have been evaluated and verified against widely recognised quality standards. Small and medium enterprises will benefit as well as large ones, since DiET will make its results widely available. This will enable NL products to reach the market faster and at higher quality.
Those parts of the package whose distribution is not restricted will be distributed through the European Language Resources Association (ELRA) and made available at a nominal fee.

Programme(s)

FP4-TELEMATICS 2C - Specific programme of research and technological development and demonstration in the area of telematic applications of common interest, 1994-1998

Topic(s)

D.12 - Language Engineering

Call for proposal

Data not available

Funding Scheme

Data not available

Coordinator

DFKI GmbH
Address: Stuhlsatzenhausweg 3, 66123 Saarbruecken, Germany
EU contribution: No data
Total cost: No data

Participants (3)

Aerospatiale
Country: France
Address: Suresnes
EU contribution: No data
Total cost: No data

IBM
Country: Germany
Address: Heidelberg
EU contribution: No data
Total cost: No data

Localisation Resources Centre
Country: Ireland
Address: Dublin
EU contribution: No data
Total cost: No data
