Objectif
The popularization of information technology and the Internet has resulted in an unprecedented growth in the scale at which individuals and institutions generate, communicate and access information. In this context, the effective leveraging of the vast amounts of available data to discover and address people's needs is a fundamental problem of modern societies.
Since most of this circulating information is in the form of written or spoken human language, natural language processing (NLP) technologies are a key asset for this crucial goal. NLP can be used to break language barriers (machine translation), find required information (search engines, question answering), monitor public opinion (opinion mining), or digest large amounts of unstructured text into more convenient forms (information extraction, summarization), among other applications.
These and other NLP technologies rely on accurate syntactic parsing to extract or analyze the meaning of sentences. Unfortunately, current state-of-the-art parsing algorithms have high computational costs, processing less than a hundred sentences per second on standard hardware. While this is acceptable for working on small sets of documents, it is clearly prohibitive for large-scale processing, and thus constitutes a major roadblock for the widespread application of NLP.
The goal of this project is to eliminate this bottleneck by developing fast parsers that are suitable for web-scale processing. To do so, FASTPARSE will improve the speed of parsers on several fronts: by avoiding redundant calculations through the reuse of intermediate results from previous sentences; by applying a cognitively-inspired model to compress and recode linguistic information; and by exploiting regularities in human language to find patterns that the parsers can take for granted, avoiding their explicit calculation. The joint application of these techniques will result in much faster parsers that can power all kinds of web-scale NLP applications.
Champ scientifique (EuroSciVoc)
CORDIS classe les projets avec EuroSciVoc, une taxonomie multilingue des domaines scientifiques, grâce à un processus semi-automatique basé sur des techniques TLN.
CORDIS classe les projets avec EuroSciVoc, une taxonomie multilingue des domaines scientifiques, grâce à un processus semi-automatique basé sur des techniques TLN.
- lettreslangues et littératureétude générale des langues
- sciences naturellesinformatique et science de l'informationinternet
- sciences naturellesinformatique et science de l'informationscience des donnéestraitement du langage naturel
- sciences naturellesinformatique et science de l'informationintelligence artificielleapprentissage automatique
- sciences naturellesinformatique et science de l'informationintelligence artificielleintelligence de calcul
Vous devez vous identifier ou vous inscrire pour utiliser cette fonction
Mots‑clés
Programme(s)
Thème(s)
Régime de financement
ERC-STG - Starting GrantInstitution d’accueil
15001 La Coruna
Espagne