CORDIS
EU research results


Analysis of Natural Language for Real World Applications

Objective

Natural language understanding, information extraction, and machine translation are critical for human-machine interaction and key technologies in many everyday applications (e.g. search engines, mobile devices, robots). Natural language understanding systems transform spoken language or written texts into syntactic and semantic structures. Critically, these systems need to be able to work flexibly on many different text genres.
A critical challenge for current syntactic analyzers in real-world applications is adapting flexibly to different language domains. This problem arises because current syntactic analyzers are trained primarily on syntactically annotated newspaper texts. In particular, the major syntactic resource for training syntactic analyzers for English is an annotated text collection called the Penn Treebank. The Penn Treebank contains texts from only one genre: economic news. However, syntactic analyzers are applied to a wide range of text genres, such as emails, newsgroups, blogs, consumer reviews, newspapers with mostly non-economic text, and spoken language. When applied to these texts, the error rate doubles. As a result, the syntactic analyzer assigns the wrong syntactic structures to more input sentences; for example, it may confuse the subject and object of a sentence. It is then no longer able to answer the critical questions in natural language understanding: who does what to whom, why, and when. For real-world applications, this means that a robot, for instance, may fail to understand the instructions or commands given by the customer.
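
To illustrate what a subject/object confusion looks like when a parse is scored, the following is a minimal sketch and not part of the project itself. It assumes a dependency-style analysis in which each word is given a head index and a relation label, and it computes two commonly used scores: the fraction of words with the correct head (unlabeled attachment score) and the fraction with both the correct head and the correct label (labeled attachment score). The example sentence, the label names (`nsubj`, `obj`, `det`, `root`), and the function name `attachment_scores` are assumptions chosen for illustration.

```python
# Toy illustration (hypothetical data): scoring a parse in which the
# analyzer has swapped the subject and the object of the sentence.

def attachment_scores(gold, predicted):
    """Return (unlabeled, labeled) attachment scores for one sentence.

    Each analysis is a list of (head_index, relation_label) pairs,
    one per word; head index 0 marks the root of the sentence.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("analyses must be non-empty and of equal length")
    # A word counts for the unlabeled score if its head is correct.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # It counts for the labeled score only if head and label are both correct.
    label_hits = sum(g == p for g, p in zip(gold, predicted))
    n = len(gold)
    return head_hits / n, label_hits / n


# Sentence: "The robot grabs the cup" (words numbered 1 to 5).
gold = [(2, "det"), (3, "nsubj"), (0, "root"), (5, "det"), (3, "obj")]
# Hypothetical wrong analysis: "robot" is labeled object, "cup" subject.
predicted = [(2, "det"), (3, "obj"), (0, "root"), (5, "det"), (3, "nsubj")]

uas, las = attachment_scores(gold, predicted)
print(f"unlabeled: {uas:.2f}, labeled: {las:.2f}")
```

In this toy case every word is attached to the right head, yet the two swapped labels reverse who does what to whom, which is why the labeled score is the one that drops.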

The aim of this proposal is to reduce this gap and to provide techniques that achieve higher accuracy and allow adaptation to out-of-domain genres in an easy and economically acceptable way. Further, in an interdisciplinary and innovative fashion, we will combine syntactic analysis with related analysis techniques from the field of speech recognition.

Coordinator

THE UNIVERSITY OF BIRMINGHAM

Address

Edgbaston
B15 2TT Birmingham

United Kingdom

Activity type

Higher or Secondary Education Establishments

EU Contribution

€ 100 000

Administrative Contact

Xavier Rodde (Mr.)

Project information

Grant agreement ID: 618143

Status

Closed project

  • Start date

    1 September 2013

  • End date

    31 August 2018

Funded under:

FP7-PEOPLE

  • Overall budget:

    € 100 000

  • EU contribution

    € 100 000

Coordinated by:

THE UNIVERSITY OF BIRMINGHAM

United Kingdom