Skip to main content
Go to the home page of the European Commission (opens in new window)
English English
CORDIS - EU research results
CORDIS

Information Extraction for Everyone

Project description

Computers learn about human’s natural language

Computers are pretty smart but they have their limitations, particularly when it comes to natural language processing (NLP). Language processing involves high-level and abstract rules to convey information, making it difficult for computers to decipher and understand human languages. The EU-funded iEXTRACT project will review the rule-based information extraction methods in view of advances in NLP and machine learning. Information extraction is a collaborative human-computer effort, in which the user provides domain-specific knowledge, and the system solves various domain-independent linguistic complexities, ultimately allowing the user to query unstructured texts. The main goal is to assist domain experts such as lawyers and scientists by empowering them to process large volumes of data and advance their profession.

Objective

Staggering amounts of information are stored in natural language documents, rendering them unavailable to data-science techniques. Information Extraction (IE), a subfield of Natural Language Processing (NLP), aims to automate the extraction of structured information from text, yielding datasets that can be queried, analyzed and combined to provide new insights and drive research forward.

Despite tremendous progress in NLP, IE systems remain mostly inaccessible to non-NLP-experts who can greatly benefit from them. This stems from the current methods for creating IE systems: the dominant machine-learning (ML) approach requires technical expertise and large amounts of annotated data, and does not provide the user control over the extraction process. The previously dominant rule-based approach unrealistically requires the user to anticipate and deal with the nuances of natural language.

I aim to remedy this situation by revisiting rule-based IE in light of advances in NLP and ML. The key idea is to cast IE as a collaborative human-computer effort, in which the user provides domain-specific knowledge, and the system is in charge of solving various domain-independent linguistic complexities, ultimately allowing the user to query
unstructured texts via easily structured forms.

More specifically, I aim develop:
(a) a novel structured representation that abstracts much of the complexity of natural language;
(b) algorithms that derive these representations from texts;
(c) an accessible rule language to query this representation;
(d) AI components that infer the user extraction intents, and based on them promote relevant examples and highlight extraction cases that require special attention.

The ultimate goal of this project is to democratize NLP and bring advanced IE capabilities directly to the hands of
domain-experts: doctors, lawyers, researchers and scientists, empowering them to process large volumes of data and
advance their profession.

Keywords

Project’s keywords as indicated by the project coordinator. Not to be confused with the EuroSciVoc taxonomy (Fields of science)

Programme(s)

Multi-annual funding programmes that define the EU’s priorities for research and innovation.

Topic(s)

Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.

Funding Scheme

Funding scheme (or “Type of Action”) inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; specific evaluation criteria to qualify for funding; and the use of simplified forms of costs like lump sums.

ERC-STG - Starting Grant

See all projects funded under this funding scheme

Call for proposal

Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.

(opens in new window) ERC-2018-STG

See all projects funded under this call

Host institution

BAR ILAN UNIVERSITY
Net EU contribution

Net EU financial contribution. The sum of money that the participant receives, deducted by the EU contribution to its linked third party. It considers the distribution of the EU financial contribution between direct beneficiaries of the project and other types of participants, like third-party participants.

€ 1 499 354,00
Address
BAR ILAN UNIVERSITY CAMPUS
52900 Ramat Gan
Israel

See on map

Activity type
Higher or Secondary Education Establishments
Links
Total cost

The total costs incurred by this organisation to participate in the project, including direct and indirect costs. This amount is a subset of the overall project budget.

€ 1 499 354,00

Beneficiaries (1)

My booklet 0 0