Analyzing and Recognizing Time, Factuality, and Opinion in Text

Final Report Summary - ARTIFACTO (Analyzing and Recognizing Time, Factuality, and Opinion in Text)

ARTiFactO rationale. The overall goal of this research is two-fold: on the one hand, to explore methodologies for porting language processing technology from one language to another; on the other, to contribute specific technological infrastructure to the languages targeted here, Catalan (CA) and Spanish (SP). The project focuses on technology for processing three semantic domains: time, event factuality, and opinion information in natural language text. The choice of these three areas is motivated, firstly, by methodological reasons: conceptual systems such as time, factuality, and opinion are expressed by means of well-delimited fragments of the general grammar and lexicon of any given language. At the same time, the properties and basic relations among elements in each of these systems (e.g. temporal relations of ordering: before, after, simultaneous; factuality degrees: possible, probable, certain; perspective attitudes: be in favor, against, etc.) are shared across languages. These factors make these systems well-suited test-beds for exploring the cross-linguistic porting of Natural Language Processing (NLP) technology. Secondly, the choice is also motivated by the fact that most of the research and technology developed in these areas had been carried out mainly for English. There is therefore a need for technological resources of this type in other languages as well.
Project objectives. The objectives of the present project are established along two different dimensions:
Methodological. Furthering knowledge of methods and techniques for cross-lingual porting of NLP technology. It involves the following tasks:
Task 1.a Exploring techniques for porting NLP technology considering language pairs of different degrees of similarity (EN – SP, SP – CA)
Task 1.b Exploring alternatives to standard NLP components in order to cover technological gaps in less-resourced languages (e.g. dependency parsing).
Practical. Building the necessary infrastructure for recognizing and analyzing time, factuality, and opinion information in Catalan and Spanish text, which will be incorporated as an active component in the information extraction system developed for these two languages at the hosting institution. It involves the following tasks:
Task 2.a Setting the description model for each system (in CA & SP).
Task 2.b Creating corpora annotated with time, factuality, and opinion information (CA & SP).
Task 2.c Building event recognizers (for CA & SP).
Task 2.d Building analyzers of temporal information: time expression extractors/normalizers and temporal relation analyzers (for CA & SP).
Task 2.e Building factuality profilers (for CA & SP).
Task 2.f Building opinion analyzers (for CA & SP).
These methodological and practical dimensions expand into the following objectives:
Objective 1. Defining description models for the systems of time, factuality, and opinion as expressed in Catalan and Spanish, the two languages targeted by the project (Task 2.a)
Objective 2. Corpus building (Task 2.b)
Objective 3. Creating an event recognizer for Catalan and Spanish (Task 2.c)
Objective 4. Creating time analyzers for Catalan and Spanish (Task 2.d)
Objective 5. Creating factuality and opinion analyzers for Catalan and Spanish (Tasks 2.e-f)
Objective 6. Final wrap-up of the system (Tasks 2.c-f)

The results of the current research were expected to include:
• The ARTiFactO system. A program that recognizes events, sorts them along the temporal axis, and identifies the factuality degrees and opinions assigned to them by relevant sources.
• A set of description models (specification scheme and annotation guidelines) for the addressed systems on Catalan and Spanish, to contribute towards future cross-lingual standards.
• A set of corpora for the systems of time, factuality, and opinion in Catalan and Spanish.
• Analysis of techniques for cross-lingual transport of lexicons and grammars, taking into account language pairs with different degrees of similarity (e.g. CA–SP, and SP–EN).
• Analysis of NLP techniques and tools to be used as alternatives in the case of less-resourced languages (for example, the case of dependency parsing).
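
As an illustration of the first expected result, the kind of record such a system might produce for each event can be sketched as below. This is only a minimal sketch under stated assumptions, not the ARTiFactO implementation: the class names, the `position` field, and the helper functions are all hypothetical, and the value sets simply mirror the examples given in the rationale (before/after/simultaneous; possible/probable/certain; in favor/against).

```python
from dataclasses import dataclass
from enum import Enum


class TemporalRelation(Enum):
    BEFORE = "before"
    AFTER = "after"
    SIMULTANEOUS = "simultaneous"


class Factuality(Enum):
    POSSIBLE = "possible"
    PROBABLE = "probable"
    CERTAIN = "certain"


class Stance(Enum):
    # The report lists "in favor, against, etc."; only these two are modeled here.
    IN_FAVOR = "in favor"
    AGAINST = "against"


@dataclass(frozen=True)
class Event:
    text: str               # surface form of the event mention
    position: int           # hypothetical index on the temporal axis
    source: str             # who the factuality and stance are attributed to
    factuality: Factuality
    stance: Stance


def relation(a: Event, b: Event) -> TemporalRelation:
    """Temporal relation of event a with respect to event b."""
    if a.position < b.position:
        return TemporalRelation.BEFORE
    if a.position > b.position:
        return TemporalRelation.AFTER
    return TemporalRelation.SIMULTANEOUS


def sort_on_time_axis(events: list[Event]) -> list[Event]:
    """Order events along the temporal axis, earliest first."""
    return sorted(events, key=lambda e: e.position)


# Hypothetical Spanish example: "anunció" (announced) precedes "aprobará" (will approve).
events = [
    Event("aprobará", 2, "gobierno", Factuality.PROBABLE, Stance.IN_FAVOR),
    Event("anunció", 1, "autor", Factuality.CERTAIN, Stance.IN_FAVOR),
]
ordered = sort_on_time_axis(events)
```

Because the relation and degree inventories are described as shared across languages, a representation of this kind would stay the same for Catalan and Spanish; only the recognizers that populate it would be language-specific.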

These achievements were expected to benefit:
• The host institution, by enhancing its information extraction (IE) systems.
• Research on semantic IE, regarding topics such as designing adequate cross-lingual descriptive models for NLP applications, linking equivalent corpus resources, and expanding NLP technology to other languages.
• The EU as a multilingual community. Investigations on techniques for building NLP resources through cross-lingual porting help obviate linguistic borders, enhance communication among different linguistic communities, and ultimately contribute towards granting information access to all citizens.