Natural language understanding, information extraction, and machine translation are critical for human-machine interaction and are key technologies in many everyday applications (e.g. search engines, mobile devices, robots). Natural language understanding systems transform spoken language or written text into syntactic and semantic structures. To be useful, these systems must work flexibly across many different text genres.
A critical challenge for current syntactic analyzers in real-world applications is adapting flexibly to different language domains. The problem arises because current syntactic analyzers are trained primarily on syntactically annotated newspaper text. In particular, the major resource for training syntactic analyzers for English is an annotated text collection called the Penn Treebank, which contains text from only one genre: economic news. The analyzers, however, are applied to a wide range of genres, such as emails, newsgroups, blogs, consumer reviews, newspapers with mostly non-economic text, and spoken language. On these texts the error rate doubles, so the analyzer assigns wrong syntactic structures to many more input sentences. For example, it may confuse the subject and the object of a sentence. It is then no longer able to answer the critical questions in natural language understanding: who does what to whom, why, and when? For real-world applications this means that a system such as a robot may fail to understand the instructions or commands given by the customer.
The aim of this proposal is to reduce this gap in accuracy between economic news and other genres, and to provide techniques that achieve higher accuracy and allow adaptation to out-of-domain genres in an easy and economically acceptable way. Further, in an interdisciplinary and innovative fashion, we will combine syntactic analysis with related analysis techniques from the field of speech recognition.