CORDIS - EU research results

Lexical Acquisition Across Languages

Objective

Due to the growing volume of textual information available in multiple languages, there is a great demand for Natural Language Processing (NLP) techniques that can automatically process and manage multi-lingual texts, supporting information access and communication in core areas of society (e.g. healthcare, business, science). Many NLP tasks and applications rely on task-specific lexicons (e.g. dictionaries, word classifications) for optimal performance. Recently, automatic acquisition of lexicons from relevant texts has proved a promising, cost-effective alternative to manual lexicography. It has the potential to considerably enhance the viability and portability of NLP technology both within and across languages. However, this approach has been explored only for a very small number of resource-rich languages, leaving the vast majority of the world's languages without useful technology. The ambitious goal of this project is to take research in lexical acquisition to the level where it can support multi-lingual NLP, including for languages for which no parallel language resources (e.g. corpora, knowledge resources) are available. Building on an emerging line of research that uses mainly naturally occurring supervision (connections between languages) to guide cross-lingual NLP, we will develop a radically novel approach to lexical acquisition. This approach will transfer lexical knowledge from one language to another and will also learn it simultaneously for a diverse set of languages, using a new methodology that guides joint learning and inference with rich knowledge about cross-lingual connections. We aim not only to create next-generation lexical acquisition technology but also to take cross-lingual NLP a big step toward the point where it no longer depends on parallel resources. We will use our approach to support fundamental tasks and applications aimed at broadening the global reach of NLP to areas where it is now critically needed.

Host institution

THE CHANCELLOR MASTERS AND SCHOLARS OF THE UNIVERSITY OF CAMBRIDGE
Net EU contribution
€ 1 989 203,00
Address
TRINITY LANE THE OLD SCHOOLS
CB2 1TN Cambridge
United Kingdom


Region
East of England, East Anglia, Cambridgeshire CC
Activity type
Higher or Secondary Education Establishments
Total cost
€ 1 989 203,00

Beneficiaries (1)