Periodic Reporting for period 4 - PapyGreek (Digital Grammar of Greek Documentary Papyri)
Reporting period: 2022-09-01 to 2023-08-31
For linguists, Greek is unique in its chronological scope. Greek documentary papyri, dating from ca. 300 BCE to 700 CE, differ from literary texts: they preserve the language as the ancient writer composed it and bring us close to the colloquial language of the time. The nonstandard variation in documentary texts is where language change can first be detected, making the papyrological corpus an important source for the diachronic study of Greek. The new Grammar of Greek papyri can answer such questions as how much bilingualism affected the Greek language in Egypt and when and where it was a dominant feature of society. The papyri will partly be treated as big data; both phonological and morphological analyses are applied to the whole corpus of around 60,000 texts with the help of AI-generated morphological analyses. This will enable, for example, phonological analyses to be performed with greater accuracy than has previously been possible, by eliminating the confusion between inflectional morphology and phonological variation. The Greek language has a rich morphology, and several historical changes in the language affected the inflection. The phonological developments (both language-internal and language-external, i.e. contact-induced) were possible causes of the morphological merging of case endings and verbal inflection. This, in part, led to confusion of certain syntactic structures.
As a result, the Digital Grammar is making the language used in the Greek papyri openly available to the scholarly community in an unprecedented manner. The Grammar includes new, more exact analyses of the phonology and morphology of Greek in Egypt, and it gives users the possibility to search all levels of language: phonology (through editorial corrections), morphological forms, syntactic relations and word order, either in combination or separately. The syntactically annotated corpus is a smaller but constantly expanding selection of papyri, which allows a wider range of searches on morphosyntax.
The first stable data release of the PapyGreek Treebanks, together with its description in an open-access article, was published in 2021, and the latest version, 3.0, was released in the fall of 2023. The treebanks have been used throughout the project's duration to train the AI-based model for the morphological tagger and to carry out some smaller-scale studies on developments in the Greek language.
The project members have presented many preliminary studies on different linguistic aspects at conferences and in publications. The project has organised workshops and online teaching events (e.g. on EpiDoc and treebanking), a research seminar series (HelRAW), colloquia, a conference and public outreach events, some of these in collaboration with other research projects.
We have shown that significantly more precise information on the diachronic, diatopic and diastratic developments of the Greek language can be gained by studying the linguistic variation available in the papyrological material. This requires adjusting the existing digital corpus of documentary papyri so that it is amenable to computational linguistic methods and to our cutting-edge combination of tools for querying different levels of linguistic information.