Objective
There are noticeable asymmetries in the availability of high-quality natural language processing (NLP). We can adequately summarize English newspapers and translate them into Korean, but we cannot translate Korean newspaper articles into English, and summarizing micro-blogs is much more difficult than summarizing newspaper articles. This is a fundamental problem for modern societies, their development and their democracy, and perhaps the most important research problem in NLP right now.
Most NLP technologies rely on highly accurate syntactic parsing. Reliable parsing models can be induced from large collections of manually annotated data, but such collections are typically limited to sampled newswire in major languages. Highly accurate parsing is therefore not available for other languages and other domains.
The NLP community is well aware of this problem. However, unsupervised techniques, which do not rely on manually annotated data, cannot be used for real-world applications that need highly accurate parsing, and sample bias correction methods, which automatically correct for the newswire bias when parsing, say, micro-blogs, do not yet lead to robust improvements across the board.
The objective of this project is to develop new learning methods for parsing natural language for which no unbiased labeled data exists. To do so, we need to fundamentally rethink the unsupervised parsing problem, including how we evaluate unsupervised parsers, and we also need to supplement unsupervised learning techniques with robust methods for automatically correcting sample selection biases in related data. Such methods will be applicable to both cross-domain and cross-language syntactic parsing and will pave the way toward robust and scalable NLP. The societal impact of robust and scalable NLP is hard to foresee, but it is comparable to how efficient information retrieval techniques have revolutionized modern societies.
Fields of science (EuroSciVoc)
CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: https://op.europa.eu/en/web/eu-vocabularies/euroscivoc.
- natural sciences > computer and information sciences > data science > natural language processing
- natural sciences > computer and information sciences > artificial intelligence > machine learning > unsupervised learning
- social sciences > political sciences > government systems > democracy
Programme(s)
Multi-annual funding programmes that define the EU’s priorities for research and innovation.
Topic(s)
Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.
Call for proposal
Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.
ERC-2012-StG_20111124
Funding Scheme
Funding scheme (or “Type of Action”) inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; specific evaluation criteria to qualify for funding; and the use of simplified forms of costs like lump sums.
Host institution
1165 Kobenhavn
Denmark