Forests and Trees: the Formal Semantics of Collective Categorization

Reporting period: 2019-05-01 to 2020-10-31

Languages have various ways of referring to collections like families, herds and forests. The grammatical properties of collective expressions critically determine how we understand them. The sentences “this forest is old” and “these trees are old” categorize an arboreal collection using a concept (“old”), while conveying different meanings. This semantic difference correlates with the difference in grammatical number between the sentences: singular vs. plural. Such effects of *collective categorization* in language are crucial for understanding the connections between grammar and the mind, as well as for artificial intelligence. However, currently little is known about the mechanisms underlying our linguistic ability to conceptualize collections. This project aims to develop a novel linguistic theory of this ability, applied to a wide range of empirical phenomena and interdisciplinary challenges in computational semantics and comparative linguistics, benefiting from the recent synergy between linguistics and the psychology of concepts. The idea is that when classifying a collection, speakers rely on two inferential principles with mental concepts: (i) geometric inferences: a forest is considered “far away” if all of its trees are far; (ii) symmetric inferences: two trees are “similar” if each of them is similar to the other. The leading hypothesis is that uniform interactions between these inferential principles and the grammar of collective expressions account for collective categorization in language. This hypothesis is explored in three work packages, each of which develops the semantic theory and evaluates it on a different interdisciplinary domain: human interaction with geographic information systems, behavioral linguistic experiments, and comparative linguistics. Together, the three components of the project are expected to lead to a theoretical breakthrough in semantic theory and to enrich its interdisciplinary connections with neighboring disciplines.
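The two inferential principles described above can be illustrated with a small sketch. This is only a toy rendering under stated assumptions, not the project's actual formalization: the `Tree` record, the distance threshold, and the direction-sensitive similarity relation are all hypothetical, chosen so that the symmetric check has something to do. The sketch models the strict reading of each principle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    # Hypothetical attributes, used only for illustration.
    name: str
    distance_km: float  # distance from the observer
    height_m: float


def is_far(tree: Tree, threshold_km: float = 5.0) -> bool:
    """Individual-level concept: a single tree counts as 'far'."""
    return tree.distance_km > threshold_km


def collection_is_far(trees: list[Tree], threshold_km: float = 5.0) -> bool:
    """Principle (i): the collection is 'far away' if all of its members are far."""
    return all(is_far(t, threshold_km) for t in trees)


def is_similar_to(a: Tree, b: Tree, tolerance: float = 0.25) -> bool:
    """Assumed directed relation: a's height differs from b's by at most
    `tolerance`, measured relative to b. Deliberately not symmetric."""
    return abs(a.height_m - b.height_m) / b.height_m <= tolerance


def are_similar(a: Tree, b: Tree) -> bool:
    """Principle (ii): the pair is 'similar' if each member is similar to the other."""
    return is_similar_to(a, b) and is_similar_to(b, a)


oak = Tree("oak", 7.0, 10.5)
birch = Tree("birch", 6.0, 8.0)
pine = Tree("pine", 9.0, 10.0)
forest = [oak, birch, pine]
```

With these made-up values, `collection_is_far(forest)` holds because every tree lies beyond the threshold, and `are_similar(pine, birch)` holds because the relation goes through in both directions. `are_similar(oak, birch)` fails even though `is_similar_to(birch, oak)` holds, since the relation does not hold in the other direction.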
The project has developed a computational engine that uses geographic information to generate linguistic descriptions of images. This system is currently being compared with other caption generation systems that do not use geographic information. So far, it seems that when geographic information is available, using it can substantially improve the linguistic description of images for practical purposes.

Two major experiments (about 500 participants each) were carried out to test symmetric inferences with collective expressions. With each of the two major collective constructions that have been tested, the results have shown that symmetric inferences are preferential but not obligatory. This is a novel finding, which informs all theories of lexical knowledge about collective categorization.

An extensive comparative study has been carried out between collective-reciprocal sentences in English and some Romance languages (Italian, Spanish, Catalan and Brazilian Portuguese). The results show that although collective reciprocity is encoded differently in surface forms, its semantic behavior in English and Romance languages is surprisingly similar.

The study of acceptability and truth with collective expressions has also led to a systematic general revision of current semantic theories of presupposition, that is, what a speaker assumes to be true as a precondition for her utterance.
The four results mentioned above mark progress beyond the state of the art: a caption generation system that profitably uses geographic information for linguistic description, an experimental characterization of symmetry and collectivity that reveals new facts about them, a new finding on collectivity in Romance languages, and a new theory of presupposition.

These four achievements are expected to be strengthened by the end of the project:
1- the caption generation system will be further improved and evaluated;
2- the experimental results will be extended to other constructions, theoretically grounded, and reported in a series of publications;
3- the cross-linguistic characterization of reciprocity will be extended to languages from non-European families and to other linguistic strategies;
4- the new theory of presupposition will be extended further and wrapped up in a series of publications.