Fast Natural Language Parsing for Large-Scale NLP

Objective

The popularization of information technology and the Internet has resulted in an unprecedented growth in the scale at which individuals and institutions generate, communicate and access information. In this context, the effective leveraging of the vast amounts of available data to discover and address people's needs is a fundamental problem of modern societies.

Since most of this circulating information is in the form of written or spoken human language, natural language processing (NLP) technologies are a key asset for this crucial goal. NLP can be used to break language barriers (machine translation), find required information (search engines, question answering), monitor public opinion (opinion mining), or digest large amounts of unstructured text into more convenient forms (information extraction, summarization), among other applications.

These and other NLP technologies rely on accurate syntactic parsing to extract or analyze the meaning of sentences. Unfortunately, current state-of-the-art parsing algorithms have high computational costs, processing fewer than a hundred sentences per second on standard hardware. While this is acceptable for working on small sets of documents, it is clearly prohibitive for large-scale processing, and thus constitutes a major roadblock for the widespread application of NLP.

The goal of this project is to eliminate this bottleneck by developing fast parsers that are suitable for web-scale processing. To do so, FASTPARSE will improve the speed of parsers on several fronts: by avoiding redundant calculations through the reuse of intermediate results from previous sentences; by applying a cognitively-inspired model to compress and recode linguistic information; and by exploiting regularities in human language to find patterns that the parsers can take for granted, avoiding their explicit calculation. The joint application of these techniques will result in much faster parsers that can power all kinds of web-scale NLP applications.
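As a rough illustration of the first idea, reusing intermediate results from previous sentences, the following minimal Python sketch memoizes a per-chunk analysis so that word sequences already seen in earlier sentences are not analyzed again. It is not FASTPARSE's method: the names `ChunkCache`, `toy_analyze` and `analyze_sentence`, the fixed two-word chunking, and the bracketed output are all assumptions made purely for illustration, with `toy_analyze` standing in for an expensive parsing step.

```python
from typing import Callable, Dict, List, Tuple

Chunk = Tuple[str, ...]


class ChunkCache:
    """Memoizes an expensive per-chunk analysis across sentences."""

    def __init__(self, analyze: Callable[[Chunk], str]) -> None:
        self._analyze = analyze
        self._cache: Dict[Chunk, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, chunk: Chunk) -> str:
        if chunk in self._cache:
            self.hits += 1
        else:
            # Only chunks never seen before pay the analysis cost.
            self.misses += 1
            self._cache[chunk] = self._analyze(chunk)
        return self._cache[chunk]


def toy_analyze(chunk: Chunk) -> str:
    # Hypothetical stand-in for a costly parsing step (assumption).
    return "(" + " ".join(chunk) + ")"


def analyze_sentence(sentence: str, cache: ChunkCache, n: int = 2) -> List[str]:
    # Split the sentence into consecutive n-word chunks and look each one up.
    words = sentence.lower().split()
    chunks = [tuple(words[i:i + n]) for i in range(0, len(words), n)]
    return [cache.get(chunk) for chunk in chunks]


cache = ChunkCache(toy_analyze)
first = analyze_sentence("the cat sat on the mat", cache)
second = analyze_sentence("the cat slept on the mat", cache)
print(cache.hits, cache.misses)
```

In this toy run the second sentence reuses two of its three chunks from the first, so only one new chunk has to be analyzed. A real parser would cache much richer intermediate structures, but the trade-off is the same: memory spent on stored results in exchange for skipped computation.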

Field of science

  • /humanities/languages and literature/general language studies
  • /natural sciences/computer and information sciences/data science/natural language processing
  • /natural sciences/computer and information sciences/artificial intelligence/computational intelligence

Call for proposal

ERC-2016-STG

Funding Scheme

ERC-STG - Starting Grant

Host institution

UNIVERSIDADE DA CORUNA
Address
Calle De La Maestranza 9
15001 La Coruna
Spain
Activity type
Higher or Secondary Education Establishments
EU contribution
€ 1 481 747

Beneficiaries (1)

UNIVERSIDADE DA CORUNA
Spain
EU contribution
€ 1 481 747
Address
Calle De La Maestranza 9
15001 La Coruna
Activity type
Higher or Secondary Education Establishments