Modeling causes of language change and conservatism

Project Information

CAUSALITY

Grant agreement ID: 101042427

DOI

10.3030/101042427

EC signature date 2 February 2022

Start date 1 September 2022

End date 31 March 2028

Funded under

European Research Council (ERC)

Total cost

€ 1 172 500,00

EU contribution

€ 1 172 500,00

Coordinated by

UNIVERSITEIT GENT
Belgium

Periodic Reporting for period 1 - CAUSALITY (Modeling causes of language change and conservatism)

Reporting period: 2022-09-01 to 2025-02-28

Human language universally changes over time. In this respect it resembles a biological organism that keeps mutating and adapting to its environment, with some features rising in frequency and others declining. In the case of language, the origin of a change can be seen as a shift in the frequency of a given variant of linguistic expression (for instance, a given word order) in the speech of an individual. Why do such adaptations ever become necessary, that is, why do some features decline and others rise? And why do the adaptations take so long (often centuries), given that humans are extremely efficient learners? The goal of the CAUSALITY project is to investigate the factors that push language change forward and those that militate against it.
This investigation involves retrieving actual historical patterns of change from annotated databases, or treebanks, of Dutch, English, and Low German, as well as using game theory to model the pragmatic reasoning that speakers engage in when evaluating their communicative choices. Modeling such reasoning is, by hypothesis, the key to uncovering causal relations between various grammatical changes. It is communicative pressure that ensures that the decline of one feature (such as the informativeness of verbal endings) is “compensated” by an increase in an alternative feature (such as the frequency of pronominal subjects). By comparing the results of the game-theoretic simulations with the data retrieved from historical treebanks, we can validate the communicative assumptions on which the models are based. Finally, the game-theoretic models will be incorporated into simulations involving multiple “generations” of artificial agents communicating with each other, in order to simulate a timeline and to pit communicative efficiency against factors of group conservatism.
An important sub-project of CAUSALITY is creating a large treebank of historical Dutch. This will enable deeper insights into the evolution of Dutch and the way it gave rise to the modern dialectal landscape.

It is widely accepted that languages with “poor” verbal endings (such as English) enforce the use of pronominal subjects, while languages with “rich” endings (such as Italian) can do without them. Until now, measuring ending “richness” has been based on examining textbook-like paradigms (1st person singular: ending X; 2nd person singular: ending Y, etc.) and checking whether X=Y for some cells. In actual language use, however, a particular paradigm cell can host more than one ending, and these endings are not necessarily used with the same frequency (e.g. 1st person singular: ending X (30%) & Z (70%); 2nd person singular: ending Y (20%) & X (80%)). This means that the ambiguity of an ending is a scalar rather than a categorical phenomenon. The Verbal Agreement Syncretism Score (VASS) we developed is an information-theoretic metric of ending ambiguity which considers all attested endings and their relative frequencies for a given subject type. Applying the VASS to historical treebanks of English has shown that the ambiguity of the verbal endings has been increasing (Figure 1), with the VASS correlating significantly with the frequency of null subjects until the latter disappeared. We have also calculated the metric for historical stages of Old and Low German, discovering a cross-linguistic correlation.
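The scalar view of ending ambiguity can be illustrated with a small information-theoretic calculation. The sketch below is not the VASS itself, whose exact definition this summary does not give. It assumes, purely for illustration, that ambiguity is measured as the conditional entropy H(subject | ending) in bits and that the two subject types are equally frequent. The function name `subject_entropy_given_ending` is hypothetical, and the toy distribution reuses the percentages from the example above.

```python
import math
from collections import defaultdict

def subject_entropy_given_ending(p_ending_given_subject, p_subject):
    """Conditional entropy H(subject | ending) in bits.

    0 bits: every ending identifies its subject type uniquely.
    Higher values: endings are less informative about the subject.
    NOTE: an illustrative stand-in, not the project's VASS formula.
    """
    # Build the joint P(subject, ending) and the marginal P(ending).
    joint = {}
    p_ending = defaultdict(float)
    for subj, endings in p_ending_given_subject.items():
        for ending, p in endings.items():
            joint[(subj, ending)] = p_subject[subj] * p
            p_ending[ending] += p_subject[subj] * p
    # H(S|E) = -sum over (s, e) of P(s, e) * log2 P(s | e)
    h = 0.0
    for (subj, ending), p_se in joint.items():
        if p_se > 0:
            h -= p_se * math.log2(p_se / p_ending[ending])
    return h

# Assumption: both subject types occur equally often.
uniform = {"1sg": 0.5, "2sg": 0.5}

# Frequencies from the example in the text.
usage = {"1sg": {"X": 0.3, "Z": 0.7}, "2sg": {"Y": 0.2, "X": 0.8}}
graded = subject_entropy_given_ending(usage, uniform)

# Textbook-style paradigms for comparison: fully distinct vs. fully syncretic.
distinct = subject_entropy_given_ending(
    {"1sg": {"X": 1.0}, "2sg": {"Y": 1.0}}, uniform)
syncretic = subject_entropy_given_ending(
    {"1sg": {"X": 1.0}, "2sg": {"X": 1.0}}, uniform)

print(f"distinct: {distinct:.3f}  graded: {graded:.3f}  syncretic: {syncretic:.3f}")
```

Under these assumptions the graded paradigm falls between the two categorical extremes (0 bits for fully distinct endings, 1 bit for a single shared ending), which is the kind of intermediate value a binary “rich”/“poor” classification cannot express.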

Language speakers make use of various grammatical signals to relate what they are saying to the background information. Since such signalling strategies can change with time, we carried out a quantitative comparison based on historical corpora of English, Middle Low and High German, and Icelandic, focusing on the evolution of the "prefield", a preverbal position hypothesized to host elements relating the interpretation to what has just been said. Even though Icelandic, in line with the traditional view, comes out as the most conservative system, there, too, we find statistically significant shifts across time in the same direction as in the other languages in the study.

Determiners form an important subgroup of the elements that relate meaning to the background information. We used statistical methods to examine their use in historical treebanks of English, reaching the novel conclusion that while the morphophonological profile of the determiner system has changed significantly over time, its semantic profile has remained largely intact. This finding reveals that determiner systems are semantically more stable than previously assumed.

An often discussed but so far unproven cause of word order change is change in the morphological marking of the syntactic roles of verbal arguments, that is, change in case marking. We performed the first large-scale quantitative investigation of the decline of morphological case marking in the documented history of English and French, as well as of changes in direct object placement in the two languages. We showed that the two processes are strongly correlated in both languages. Moreover, the empirical frequencies correlate with the frequencies obtained from game-theoretic simulations. These simulations are based on the assumption that morphological case and argument placement are two strategies for disambiguating syntactic structure, and that historical phonological changes make the case marking option increasingly “costly” for the speakers.
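The logic of two competing disambiguation strategies with shifting costs can be sketched in a few lines. This is a toy illustration, not the project's game-theoretic model, which the summary does not specify. It assumes that both strategies disambiguate equally well, that the speaker picks between them with a soft-max (logit) choice rule, and that the cost of case marking rises linearly across periods. The function `order_share`, the `rationality` parameter, and all numeric values are hypothetical.

```python
import math

def order_share(case_cost, order_cost=0.5, rationality=4.0):
    """Probability that a speaker disambiguates via object placement
    rather than case marking, under a soft-max choice rule.

    Assumption: both strategies succeed equally often (benefit 1.0),
    so their utilities differ only in production cost.
    """
    u_case = 1.0 - case_cost
    u_order = 1.0 - order_cost
    w_case = math.exp(rationality * u_case)
    w_order = math.exp(rationality * u_order)
    return w_order / (w_case + w_order)

periods = 10
# Hypothetical trajectory: phonological erosion makes case marking
# costlier in every period, from 0.1 up to 1.0.
case_costs = [0.1 + 0.1 * t for t in range(periods)]
shares = [order_share(c) for c in case_costs]

for t, (cost, share) in enumerate(zip(case_costs, shares)):
    print(f"period {t}: case cost {cost:.1f}, placement share {share:.2f}")
```

In this sketch the share of the placement strategy grows steadily as case marking becomes more expensive. Simulated trajectories of this kind are what one would compare against frequencies retrieved from treebanks.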

Project results will be appearing on this page: https://research.flw.ugent.be/en/projects/causality

The VASS metric is a fundamentally novel approach to measuring verbal agreement syncretism. Up to now, the debate around the best way to qualify the so-called “richness” of verbal agreement has focused exclusively on static abstractions, such as the paradigms typically given in language-learning manuals and descriptive grammars. While such abstract schemes can be useful for detecting ambiguity in categorical terms (e.g. the ending -e is found with both 1st and 3rd person subjects in the present indicative), they a) usually contain no information about alternative endings appearing with a particular combination of subject features (e.g. endings -e and -s are both found with 1st and 3rd person subjects in the present indicative) and b) by definition, never contain information about how likely a particular ending is to be used with a particular subject type. However, parameters a) and b) are what determine the informativeness of a given ending for identifying the subject in actual use, and this is what the VASS measures. While the categorical qualification of a paradigm as “rich” or “poor” cannot be meaningfully related to the scale of pronominal subject frequencies, the VASS can, just as it can be compared to VASS values calculated for other languages and historical periods. This opens multiple new research directions.

The statistical approach to Old English article semantics led to a breakthrough in the understanding of the article system of historical English. For the first time there is strong quantitative evidence that Old English had a definite article whose quantitative signature and, by hypothesis, semantics are identical to those of its Present-Day English counterpart. These results point to a much greater stability of article systems across time than previously assumed.

The project is also groundbreaking in identifying strong correlations between changes in morphological case expression and word orders, finally lending statistical support to an age-old intuition.

[Figure: VASS]
