CORDIS - EU research results

Digital Grammar of Greek Documentary Papyri

Periodic Reporting for period 4 - PapyGreek (Digital Grammar of Greek Documentary Papyri)

Reporting period: 2022-09-01 to 2023-08-31

The project created a new Digital Grammar of Greek Documentary Papyri. It fills a gap in Greek scholarship: the papyrological corpus represents Post-Classical Greek, the bridge between Classical and Medieval Greek, yet it has hitherto been very difficult to use as a source for historical linguistics. The project has developed new digital methods and tools for studying this fragmentary but vast corpus.

For linguists, Greek is unique in its chronological scope. Documentary Greek papyri, ranging from ca. 300 BCE to 700 CE, can be contrasted with literary texts: they preserve the language as the ancient writer composed it and bring us close to the colloquial language of the time. Nonstandard variation in documentary texts is where language change can first be detected, which makes the papyrological corpus an important source for the diachronic study of Greek. The new Grammar of Greek papyri can answer questions such as how much bilingualism affected the Greek language in Egypt, and when and where bilingualism was a dominant feature of society.

The papyri are partly treated as big data: phonological and morphological analyses are applied to the whole corpus of around 60,000 texts with the help of AI-generated morphological analyses. This makes it possible, for example, to perform phonological analyses more accurately than before, because variation in inflectional morphology is no longer confused with phonological variation. Greek has a rich morphology, and several historical changes in the language affected its inflection. Phonological developments, both language-internal and language-external (i.e. contact-induced), were possible causes of the merging of case endings and of verbal inflection. This in turn partly led to confusion between certain syntactic structures.
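To illustrate why morphological analyses help separate inflection from phonology, the sketch below compares the analysis of an original spelling with that of its editorial regularization. It is a minimal, hypothetical heuristic, not the project's method, and it assumes 9-character positional tags in the style of the Ancient Greek Dependency Treebank (part of speech, person, number, tense, mood, voice, gender, case, degree). The names `decode_tag` and `classify_variant` are illustrative only.

```python
# Feature order of a 9-character positional tag (assumed AGLDT-style layout).
FEATURES = ("pos", "person", "number", "tense", "mood",
            "voice", "gender", "case", "degree")


def decode_tag(tag: str) -> dict[str, str]:
    """Map each filled position of a positional tag to its feature name."""
    if len(tag) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} characters, got {tag!r}")
    # "-" marks a position that does not apply to this word.
    return {name: value for name, value in zip(FEATURES, tag) if value != "-"}


def classify_variant(orig_tag: str, reg_tag: str) -> str:
    """Hypothetical heuristic: compare the analyses of the original spelling
    and of the editor's regularized form."""
    orig, reg = decode_tag(orig_tag), decode_tag(reg_tag)
    differing = {k for k in orig.keys() | reg.keys() if orig.get(k) != reg.get(k)}
    if not differing:
        # Same analysis for both spellings: a purely orthographic deviation,
        # i.e. a candidate for phonological interpretation.
        return "orthographic"
    # Different inflectional categories: the deviation may be morphological.
    return "morphological:" + ",".join(sorted(differing))


# τον written where the editor reads τῶν (article: acc. sg. vs gen. pl.)
print(classify_variant("l-s---ma-", "l-p---mg-"))  # morphological:case,number
```

An article written τον where the editor reads τῶν is ambiguous: it could reflect the loss of the distinction between ο and ω in pronunciation, or a different case and number. Because the two tags differ in case and number, the heuristic flags the token for morphological rather than purely phonological treatment.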

As a result, the Digital Grammar makes the language of the Greek papyri openly available to the scholarly community in an unprecedented way. The Grammar includes new, more exact analyses of the phonology and morphology of Greek in Egypt, and it lets users search all levels of language (phonology through editorial corrections, morphological forms, syntactic relations and word order), either in combination or separately. The syntactically annotated corpus is a smaller but constantly expanding corpus of selected papyri, which supports a wider range of morphosyntactic searches.
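A search that combines orthographic, morphological, syntactic and word-order criteria can be sketched as a filter over token records that carry all annotation layers at once. The `Token` fields, the `search` function and the AGLDT-style relation labels (`PRED`, `ATR`, `OBJ`) are assumptions for illustration, not the portal's actual data model, and the phrase is a toy example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    """Hypothetical token record holding several annotation layers at once."""
    idx: int                   # 1-based position in the sentence
    orig: str                  # form as written on the papyrus
    lemma: str
    postag: str                # 9-character positional tag; index 7 = case (assumed)
    relation: str              # dependency label (AGLDT-style, assumed)
    head: int                  # idx of the syntactic head, 0 for the root
    reg: Optional[str] = None  # editorial regularization, None if unchanged


def search(sentence, *, variant: Optional[bool] = None, case: Optional[str] = None,
           relation: Optional[str] = None, precedes_head: Optional[bool] = None):
    """Return the tokens that satisfy every given criterion (None = any)."""
    hits = []
    for t in sentence:
        is_variant = t.reg is not None and t.reg != t.orig
        if variant is not None and is_variant != variant:
            continue  # orthographic level
        if case is not None and t.postag[7] != case:
            continue  # morphological level
        if relation is not None and t.relation != relation:
            continue  # syntactic level
        if precedes_head is not None and (t.head == 0 or (t.idx < t.head) != precedes_head):
            continue  # word-order level (the root has no head to compare with)
        hits.append(t)
    return hits


# Toy sentence: ἀσπάζομαι τὸν τιμιώτατον ἀδελφόν, with an itacistic spelling.
SENTENCE = [
    Token(1, "ἀσπάζομαι", "ἀσπάζομαι", "v1spie---", "PRED", 0),
    Token(2, "τὸν", "ὁ", "l-s---ma-", "ATR", 4),
    Token(3, "τειμιώτατον", "τίμιος", "a-s---mas", "ATR", 4, reg="τιμιώτατον"),
    Token(4, "ἀδελφόν", "ἀδελφός", "n-s---ma-", "OBJ", 1),
]

for t in search(SENTENCE, variant=True, case="a", relation="ATR", precedes_head=True):
    print(t.idx, t.orig, "->", t.reg)
```

The query asks for accusative attributive modifiers that precede their head and deviate from the regularized text, so only the spelling τειμιώτατον (for τιμιώτατον) matches. Dropping `variant=True` returns both the article and the adjective, showing how each level can be switched on or off independently.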

The PI began working on the project in March 2018, and the research group was established during the first year. The group consisted of a postdoctoral researcher, Dr. Sonja Dahlgren, who worked on the project for its first 36 months and for four further months at its end; a PhD candidate, MA Polina Yordanova, who was hired in September 2018 and worked on the project for its full 66 months; and Erik Henriksson, hired for the second planned postdoctoral position, who joined in January 2019, first as a PhD candidate and, after obtaining his PhD in spring 2022, as a postdoctoral researcher. Henriksson worked on the project for approximately 42 months in total (including some gaps for parental leave and similar absences).

Dahlgren was responsible for studying and describing the phonological level of Egyptian Greek and for identifying features that lend themselves to queries combined with the morphological data. She was in charge of writing the Grammar chapters on phonology. Yordanova's field was word order in the noun phrase. She selected suitable papyrus archives for this study, annotated a large portion of them, and built an external method for querying the corpus (Kiln for Treebanking). She also contributed to writing the annotation guidelines, to teaching the student assistants and, together with the PI, to reviewing and correcting their annotations, and to writing the Grammar chapters on word order. Henriksson was in charge of technical development, data preprocessing, and the design of the online portal (front end and back end) and its queries; he also applied the AI-based BERT model for Ancient Greek to tag the whole corpus morphologically. Research assistants (MA students) were employed on a short-term basis for treebanking (i.e. morphosyntactic annotation) and for building the corpus.

The first stable data release of the PapyGreek Treebanks, together with an open-access article describing it, was published in 2021, and the latest version, 3.0, was released in autumn 2023. The treebanks have been used throughout the project to train the AI-based morphological tagger and to carry out several smaller-scale studies of developments in the Greek language.
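For smaller-scale studies of this kind, a treebank file can be loaded with a short script. The sketch below assumes the XML layout of the Ancient Greek Dependency Treebank (`<sentence>` elements containing `<word>` elements with `form`, `lemma`, `postag`, `relation` and `head` attributes); the actual PapyGreek release files may use additional or different attributes, so they should be checked before relying on this. `SAMPLE` is a toy document, not project data.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Toy document in the assumed AGLDT-style layout (not actual project data).
SAMPLE = """<treebank>
  <sentence id="1">
    <word id="1" form="ἀσπάζομαι" lemma="ἀσπάζομαι" postag="v1spie---" relation="PRED" head="0"/>
    <word id="2" form="τὸν" lemma="ὁ" postag="l-s---ma-" relation="ATR" head="4"/>
    <word id="3" form="τιμιώτατον" lemma="τίμιος" postag="a-s---mas" relation="ATR" head="4"/>
    <word id="4" form="ἀδελφόν" lemma="ἀδελφός" postag="n-s---ma-" relation="OBJ" head="1"/>
  </sentence>
</treebank>"""


def read_sentences(xml_text: str):
    """Yield each sentence as a list of word dictionaries."""
    root = ET.fromstring(xml_text)
    for sentence in root.iter("sentence"):
        yield [
            {
                "id": int(w.get("id")),
                "form": w.get("form"),
                "lemma": w.get("lemma"),
                "postag": w.get("postag", ""),
                "relation": w.get("relation"),
                "head": int(w.get("head", "0")),
            }
            for w in sentence.iter("word")
        ]


def pos_counts(xml_text: str) -> Counter:
    """Count words by part of speech (first character of the positional tag)."""
    return Counter(
        word["postag"][0]
        for sentence in read_sentences(xml_text)
        for word in sentence
        if word["postag"]  # skip words without a tag
    )


print(pos_counts(SAMPLE))
```

A generator keeps memory use low when iterating over many treebank files, and the same word dictionaries can feed frequency counts or training data for a tagger.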

Project members have presented many preliminary studies on various linguistic topics at conferences and in publications. The project has organised workshops and online teaching events (e.g. on EpiDoc and treebanking), a research seminar series (HelRAW), colloquia, a conference and public outreach events, some of them in collaboration with other research projects.

The main outcome of the project is the online portal Digital Grammar of Greek Documentary Papyri, openly available at https://papygreek.com/, which replaces the earlier version at https://papygreek.hum.helsinki.fi. Its combination of methodologies makes it unique. The interface has two sides: a public side and a restricted side for collaborators. The public side has three basic components: the Treebanks, the Grammar and the Search. The Treebanks, i.e. the syntactically annotated data, can be downloaded or browsed in different modes. The Grammar includes links to the data, example trees and saved searches, which let users run the queries the Grammar guides them through, see the results, and modify the queries to better reflect their own interests. The Search can be used for the phonological (i.e. orthographic), morphological and syntactic levels separately or in combination, an entirely novel feature for querying Greek language materials. In addition to orthographic and morphosyntactic features, the Search supports lemma searches, word-order queries and more complex regular-expression (regex) queries. Searches can also be restricted by the metadata of the texts (place of origin, date, text type, and the writers and authors of the texts).

The restricted side of the interface (for collaborators) makes it possible to create treebanks for any chosen (papyrus) text; a review process ensures that submitted annotations follow our guidelines and standards. Users can also add metadata on text type and writers in the preset fields. The Grammar chapters are also written within this side of the portal in a Markdown editing window, which makes it convenient to correct and extend them.
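Restricting a regex search by text metadata can be approximated as follows. This is a hypothetical sketch: the `Text` record, its field names and the two sample texts are invented for illustration. Diacritics are stripped before matching so that one pattern covers accented and unaccented spellings; this is a design choice for the sketch, not necessarily what the portal does.

```python
import re
import unicodedata
from dataclasses import dataclass, field


def strip_diacritics(s: str) -> str:
    """Remove accents, breathings and other combining marks (via NFD)."""
    return "".join(c for c in unicodedata.normalize("NFD", s) if not unicodedata.combining(c))


@dataclass
class Text:
    """Hypothetical text record: metadata plus the word forms of the text."""
    text_id: int
    place: str
    date_from: int          # year; negative for BCE
    date_to: int
    text_type: str
    forms: list[str] = field(default_factory=list)


def regex_search(texts, pattern, *, place=None, text_type=None, date_range=None):
    """Yield (text_id, form) for forms fully matching `pattern` after
    diacritics are stripped, in texts passing the metadata filters."""
    rx = re.compile(pattern)
    for t in texts:
        if place is not None and t.place != place:
            continue
        if text_type is not None and t.text_type != text_type:
            continue
        if date_range is not None:
            lo, hi = date_range
            if t.date_to < lo or t.date_from > hi:
                continue  # the text's date span does not overlap the range
        for form in t.forms:
            if rx.fullmatch(strip_diacritics(form)):
                yield t.text_id, form


TEXTS = [
    Text(1, "Oxyrhynchus", 100, 199, "letter", ["ἀσπάζομαι", "τὸν", "ἀδελφόν"]),
    Text(2, "Arsinoites", -250, -200, "contract", ["τῶν", "δραχμῶν"]),
]

for text_id, form in regex_search(TEXTS, r".*[οω]ν", date_range=(1, 300)):
    print(text_id, form)
```

With the pattern `.*[οω]ν` and a date range of 1 to 300 CE, only the two matching forms of text 1 are returned; without filters, the genitive plurals of text 2 match as well. Treating dates as overlapping spans reflects the fact that papyri are often dated only approximately.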

We have shown that significantly more precise information on the diachronic, diatopic and diastratic developments of the Greek language can be gained by studying the linguistic variation in the papyrological material, once the existing digital corpus of documentary papyri is adapted to computational linguistic methods and queried with our cutting-edge combination of tools for different levels of linguistic information.