CORDIS - EU research results

Forests and Trees: the Formal Semantics of Collective Categorization

Periodic Reporting for period 4 - ROCKY (Forests and Trees: the Formal Semantics of Collective Categorization)

Reporting period: 2022-05-01 to 2023-07-31

Languages have various ways of referring to collections like families, herds and forests. The grammatical properties of collective expressions critically determine how we understand them. The sentences “this forest is old” and “these trees are old” categorize an arboreal collection using a concept (“old”), while conveying different meanings. This semantic difference correlates with the difference in grammatical number between the sentences: singular vs. plural. Such effects of *collective categorization* in language are crucial for understanding the connections between grammar and the mind, as well as for artificial intelligence.

This project aims to develop a novel linguistic theory of this ability, applied to a wide range of empirical phenomena and interdisciplinary challenges in computational semantics and comparative linguistics. The project benefited from the recent synergy between linguistics and the psychology of concepts. The main idea is that when classifying a collection, speakers rely on two inferential principles with mental concepts: (i) geometric inferences: a forest is considered “far away” if all of its trees are far; (ii) symmetric inferences: two trees are “similar” if each of them is similar to the other. The leading hypothesis is that uniform interactions between these inferential principles and the grammar of collective expressions account for collective categorization in language.

This hypothesis was explored in three work packages. Each work package has developed the semantic theory and evaluated it on a different interdisciplinary domain: human interaction with geographic data, behavioral linguistic experiments, and comparative linguistics. Together, the three components of the project have led to substantial theoretical developments in semantic theory and enriched its interdisciplinary connections with neighboring disciplines.
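
As a purely illustrative aid, the two inferential principles named above (geometric and symmetric inferences) can be sketched as simple checks over a collection. This is a toy model under stated assumptions, not the project's formal theory or software: the function names, the tree data, and the numeric thresholds are all hypothetical, and the symmetric check is deliberately strict even though the experiments reported here suggest that such inferences are preferential rather than obligatory.

```python
from itertools import permutations

# Hypothetical toy data: tree name -> (distance from observer in km, height in m).
TREES = {"oak": (12.0, 20.0), "pine": (15.0, 21.0), "birch": (11.0, 8.0)}


def holds_of_all_members(members, predicate):
    """Geometric-style inference: the collection counts as having the
    property if every one of its members has it."""
    return all(predicate(m) for m in members)


def holds_symmetrically(members, relation):
    """Symmetric inference: the relation must hold in both directions
    between every pair of distinct members."""
    return all(relation(a, b) for a, b in permutations(members, 2))


def is_far(tree, threshold_km=10.0):
    # Assumed threshold; "far" is context dependent in natural language.
    return TREES[tree][0] > threshold_km


def is_similar_to(a, b, tolerance_m=2.0):
    # Assumed similarity measure: heights differ by at most tolerance_m.
    return abs(TREES[a][1] - TREES[b][1]) <= tolerance_m


# "This forest is far away": true if all of its trees are far.
forest_is_far = holds_of_all_members(TREES, is_far)

# "The oak and the pine are similar": each must be similar to the other.
oak_pine_similar = holds_symmetrically(["oak", "pine"], is_similar_to)

# "These trees are similar": fails here because the birch is much shorter.
all_trees_similar = holds_symmetrically(TREES, is_similar_to)
```

With this toy data, `forest_is_far` and `oak_pine_similar` evaluate to `True`, while `all_trees_similar` evaluates to `False`. A graded or probabilistic version of `holds_symmetrically` would be closer to the preferential behavior that the experiments describe.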
The project has developed a computational engine that uses geographic information to generate linguistic descriptions of images. This system was compared to other caption generation systems that do not use geographic information. The comparison showed that using geographic information for language generation substantially improves the quality of linguistic descriptions of images for practical purposes.

Two large-scale experiments (about 500 participants each) were carried out to test symmetric inferences with collective expressions. With each of the two major collective constructions that were tested, the results showed that symmetric inferences are preferential but not obligatory. This is a novel finding, which informs all theories of collective categorization.

An extensive comparative study has been carried out between collective-reciprocal sentences in English and four Romance languages (Italian, Spanish, Catalan and Brazilian Portuguese), as well as two major African languages (Swahili and Wolof). The results show that although collective reciprocity is encoded differently in surface forms, its semantic behavior across different language families is surprisingly similar.

The study of acceptability and truth with collective expressions has also led to a systematic general revision of current semantic theories of presupposition - what a speaker assumes to be true as a precondition for her utterance. An additional development has been a clear demarcation of the semantic differences between count nouns (e.g. "bags") that refer to objects and corresponding mass terms that refer to their collections (e.g. "baggage").
The five results mentioned above all mark progress beyond the state of the art: a caption generation system that profitably uses geographic information for linguistic description, an experimental characterization of symmetry and collectivity that reveals new facts about them, a new finding on collectivity in different languages, a new theory of presuppositions, and a clearer understanding of the mass-count distinction.

The computational methods that were developed in the ROCKY project are currently being exploited for medical data (ultrasound photos and their descriptions) within a Proof-of-Concept follow-up project (ROCAP). The results will be disseminated as a stand-alone system for image captioning in this medical context. Data that have been acquired on spatial reasoning are curated in an open-access database. Experimental software that was developed is also available as an open-access tool.