CORDIS - EU research results

Cross-Linguistic statistical inference using hierarchical Bayesian models

Project description

Mathematical methods expose novel relationships driving language change and evolution

As early as Darwin, the "curious parallels" between biological and linguistic evolution were clear. Phylogenetic trees, which show how lineages diversify and spread from common "ancestors", are as relevant to modern linguistics as they are to biology. Likewise, computational and statistical methods are used to understand the sources of patterns of variation across languages (cross-linguistic variation). The EU-funded CrossLingference project is applying well-established mathematical, statistical, modelling and simulation methods to cross-linguistic variation, with the goal of significantly improving our ability to explain the relationships behind language change and what drives it.


Historical linguistics and linguistic typology share the objective of explaining cross-linguistic variation. Their traditional research agendas have been largely disjoint, though, since historical linguistics strives for depth and typology for breadth. This tension has been replicated in current statistical and computational renderings of the two sub-disciplines. Computational models of language change generally focus on individual language families, while statistical typology pays little attention to diachronic processes. CrossLingference will bridge this gap. Using Bayesian hierarchical models, the reach of modern phylogenetic linguistics will be extended to cross-family models, in which each lineage is assumed to follow its own dynamics, but cross-family variation is constrained, so that data from one family inform inference about the processes in other families. At the same time, state-of-the-art generalized linear mixed models will be extended to control for both genealogical history and language contact. These model-based approaches will be complemented by agent-based simulations.
CrossLingference will implement this general programme for the following domains of application, securing a lasting impact both on statistical typology and on computational historical linguistics:
- Sound laws in language change, enabling automatic reconstruction of proto-language vocabulary.
- Causal relationships between typological variables.
- Factoring of universal tendencies, historical contingencies and language contact in explaining variation in word-order types and inflectional paradigms.
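
The idea behind the cross-family hierarchical models can be illustrated with a minimal partial-pooling sketch. It is not the project's implementation: it assumes a simple normal-normal model with known variances, and the function name, the parameter names and the numbers in the usage note are hypothetical. Each family has its own parameter (for example, a rate of change), the family parameters are drawn from a shared distribution, and the estimate for a sparsely attested family is pulled towards the cross-family mean.

```python
def partial_pool(family_means, family_sizes, sigma2, tau2):
    """Posterior means of per-family parameters in a normal-normal
    hierarchical model (illustrative assumptions, not the project's model).

    family_means: observed mean of the quantity in each family
    family_sizes: number of observations (e.g. languages) per family
    sigma2:       within-family observation variance (assumed known)
    tau2:         between-family variance (assumed known)
    """
    # Each observed family mean varies around the shared mean with
    # variance tau2 + sigma2 / n, so weight it by the inverse of that.
    weights = [1.0 / (tau2 + sigma2 / n) for n in family_sizes]
    mu = sum(w * m for w, m in zip(weights, family_means)) / sum(weights)

    pooled = []
    for m, n in zip(family_means, family_sizes):
        # Shrinkage factor: close to 1 for large families (trust the
        # family's own data), close to 0 for small ones (borrow strength).
        b = tau2 / (tau2 + sigma2 / n)
        pooled.append(b * m + (1.0 - b) * mu)
    return mu, pooled
```

Calling `partial_pool([0.2, 0.8], [100, 2], sigma2=1.0, tau2=0.1)` with a large and a very small family leaves the large family's estimate close to its observed mean, while the small family's estimate moves most of the way towards the shared mean. A full Bayesian treatment would also place priors on the variances and replace the scalar parameter with a model of change along each family's phylogeny.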

Host institution

Net EU contribution
€ 2 500 000,00
72074 Tuebingen

Baden-Württemberg Tübingen Tübingen, Landkreis
Activity type
Higher or Secondary Education Establishments
Total cost
€ 2 500 000,00

Beneficiaries (1)