The Tocharian Trek: A linguistic reconstruction of the migration of the Tocharians from Europe to China

Periodic Reporting for period 2 - TheTocharianTrek (The Tocharian Trek: A linguistic reconstruction of the migration of the Tocharians from Europe to China)

Reporting period: 2019-10-01 to 2021-03-31

The long trek of the Tocharians from Europe to China is one of the most disputed issues in the migration history of Eurasia. Tocharian is an extinct branch of the Indo‐European language family, which includes, among others, English, Latin, Greek and Sanskrit. The Indo‐European languages stretch in one uninterrupted belt from Ireland to the Bay of Bengal, but Tocharian, discovered in manuscripts from the Tarim Basin in Northwest China dating from c. 500–1000 CE, is a notorious exception to this geographic distribution.

The common ancestor of the Indo‐European languages, Proto‐Indo‐European, can be hypothetically reconstructed and is often located in the east of present‐day Ukraine. Therefore, speakers of early Tocharian must have made a long trek eastward before they settled in the Tarim Basin. Archaeological and genetic evidence suggests that they first moved east to southern Siberia around 3500 BCE and then south to the Tarim Basin in China, where they may have arrived as early as 2000 BCE. The arrival of the Tocharians in the Tarim Basin is possibly linked to ancient corpses found there: the so‐called Tarim Mummies.

Curiously, linguistic evidence has mostly been neglected. Therefore, the proposed project aims to provide an integrated linguistic assessment of the hypothesised migration route of the Tocharians.

Languages preserve precious information about their prehistory through the effects of language contact. Through close scrutiny and periodisation of the different layers of contact between Tocharian and its prehistoric neighbours, the project will reconstruct the migration route of the Tocharians from the Proto‐Indo‐European homeland all the way to China.

The crucial Siberian phase of the migration shows the groundbreaking nature of the approach. Genetic evidence points to influence from local Siberian populations on the early Tocharians. Likewise, the Tocharian language shows such a heavy impact from local Siberian languages that this phase may be called the “birth of Tocharian”.

The linguistic assessment of The Tocharian Trek makes use of the following linguistic approaches: 1) phylogeny; 2) language contact; 3) linguistic palaeontology.

In the phylogeny approach, the position of Tocharian in the Indo-European language family is investigated: when did the Tocharian branch split off from the protolanguage, and are there any closer connections to other branches? A relatively early split-off of Tocharian seems needed in view of the archaeological evidence from southern Siberia, which starts early compared to the archaeological cultures associated with Proto-Indo-European. The phylogenetic position of Tocharian within Indo-European is investigated by Louise Friis MA, who is focusing on morphological evidence, and by Stefan Norbruis (almost) PhD, who is focusing on lexical evidence.

In the language contact approach, the prehistory of the Tocharian language is investigated on the basis of its contacts with Uralic, Turkic, Chinese and Niya Prakrit. Contacts with Uralic appear to have taken place in southern Siberia, originally home to the Samoyedic branch of the Uralic language family, which provides the crucial link between the Tocharian language and southern Siberia. The impact of Uralic has most probably caused drastic changes in the sound system and nominal morphology of Tocharian. Contact between Tocharian and Uralic is investigated by Abel Warries MA. In the Tarim Basin, Tocharian has been claimed to have influenced the Middle Indian language Niya Prakrit or Niya Gāndhārī. This claim, and foreign influence on Niya Prakrit in general, is investigated by Niels Schoubben MA. Contacts between Tocharian and Chinese and Turkic are investigated by the Principal Investigator.

Tocharian has also been in heavy contact with Iranian languages. These contacts are not the topic of "The Tocharian Trek", but of a related project, "Tracking the Tocharians", funded by the Dutch Science Fund (NWO). In this project, Chams Bernard MA and Federico Dragoni MA investigate the contacts between Tocharian and Old Iranian and Middle Iranian languages.

The Tocharian Trek aims to write, and partly rewrite, the linguistic prehistory of the Tarim Basin in Northwest China and southern Central Siberia.

To date, the most important results are the following:

Peyrot, Michaël. 2019. "The deviant typological profile of the Tocharian branch of Indo-European may be due to Uralic substrate influence". Indo-European Linguistics 7: 72–121.

Dragoni, Federico, Niels Schoubben & Michaël Peyrot. 2020. "The Formal Kharoṣṭhī script from the Northern Tarim Basin in Northwest China may write an Iranian language". Acta Orientalia Academiae Scientiarum Hungaricae 73: 335–373.

Sikora, Martin et al. 2019. "The population history of northeastern Siberia since the Pleistocene". Nature 570: 182–188.