CORDIS - EU research results

Quantitative modeling of historical-comparative linguistics: Unraveling the phylogeny of native South American languages

Final Report Summary - QUANTHISTLING (Quantitative modeling of historical-comparative linguistics: Unraveling the phylogeny of native South American languages)

The ERC project "Quantitative modeling of historical-comparative linguistics" has redefined how historical-comparative linguistic research is carried out in practice. Historical-comparative linguistics is a field with a long tradition (its modern form dates back at least 200 years) and has achieved great results, such as the reconstruction of Proto-Indo-European, the ancestral Indo-European language spoken about 6,000 to 8,000 years ago. However, precisely because of this long and successful tradition, the field is currently hampered in its methodological innovation.

Since the beginning of the third millennium, historical linguistics has undergone a quantitative turn, reflected in a large body of literature on topics as diverse as phonetic alignment, automatic detection of historically related words (cognates), and phylogenetic reconstruction. Most of these computational methods, however, were not introduced by historical linguists themselves but by scholars from other disciplines, such as biologists or mathematicians. This created a gap between the “new and innovative” quantitative methods and the traditional approaches. Traditional historical linguists are often very skeptical of the new approaches, partly because their results do not always agree with those obtained by traditional methods, and partly because many of the new approaches rely on large datasets that often contain numerous errors. Quantitative historical linguists, on the other hand, complain about traditional historical linguists' lack of interest in the many opportunities that quantitative and digital approaches offer.
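Phonetic alignment, the first of the topics named above, can be illustrated with a minimal sketch: a plain Needleman-Wunsch global alignment over lists of sound segments. The function name `align`, the unit scores (match +1, mismatch -1, gap -1) and the simplified transcriptions of the cognates German *Tochter* and English *daughter* are assumptions chosen for illustration. This is not the project's own method; dedicated tools typically score segments with linguistically informed similarity rather than simple identity.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two segment lists (Needleman-Wunsch, illustrative unit scores)."""
    n, m = len(a), len(b)

    def sub(x, y):
        # Identity-based substitution score (a simplifying assumption)
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # pair two segments
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1], score[n][m]


# Simplified transcriptions of German "Tochter" and English "daughter"
top, bottom, total = align(list("tɔxtər"), list("dɔtər"))
print(" ".join(top))     # t ɔ x t ə r
print(" ".join(bottom))  # d ɔ - t ə r
print(total)             # 2
```

The alignment pairs corresponding sounds (t/d, ɔ/ɔ, t/t, ə/ə, r/r) and places a gap opposite the German *x*, which is the kind of correspondence a historical linguist would then inspect by hand.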

There are many good reasons to formalize and digitize the current practice of historical linguistics. The drawback of the successful tradition is that it depends on highly laborious manual interpretation, often built on decades of individual dedication, which few scholars are still willing to invest today (and current research funding often no longer accommodates long-term research). The aim of the project "Quantitative modeling of historical-comparative linguistics" was to develop computational methods that assist scholars in their work and make it less time-consuming, while also attracting computationally minded scholars to the field. A central achievement of the project is the development of numerous open-source computational tools that could have a lasting impact on scientific work in historical linguistics. The main focus was to bridge the gap between traditional and quantitative historical linguistics, especially by offering tools that encourage scholars to produce data that are both computer- and human-readable.

Using and refining our own toolset, we have performed historical-comparative analyses on various groups of languages and datasets, often digitizing the available knowledge about the respective languages, since much of it is still available only in print (another barrier to computer-assisted methods).