CORDIS - EU research results

Machine Learning for the Study of Ancient Epigraphic Cultures

Periodic Reporting for period 1 - PythiaPlus (Machine Learning for the Study of Ancient Epigraphic Cultures)

Reporting period: 2021-11-15 to 2023-11-14

The PythiaPlus research project aims to investigate ancient Mediterranean written (epigraphic) cultures using Machine Learning (ML), revolutionising our ability to access, analyse and interpret epigraphic data. State-of-the-art ML models will be built to analyse Greek and Roman epigraphic habits on an unprecedented scale, revealing new insights into linguistic and cultural interactions.

Context
Computational approaches have come to feature prominently in the Humanities, creating a unique opportunity to write an interdisciplinary history of the Graeco-Roman world in the Digital Age. In particular, ML's transformative role in data-driven research can reshape how historical data is collected, analysed, and interpreted.
Inscribed texts (inscriptions) are primary evidence for reconstructing the history and thought of the Ancient World. ML models can now reveal patterns in this data that historians were previously unable to identify in such detail and on such a scale: PythiaPlus will enable the first "big data" study of epigraphic cultures across circa 1,500 years of ancient Mediterranean history using ML.

Objectives
a) Develop educational and research tools for tracking textual connections and making machine-readable data accessible for future research.
b) Advance contextualisation of written evidence and reconstruction of ancient epigraphic habits.
c) Pioneer a machine learning approach to analysing textual material cultures, applying technological advances to ancient inscriptions.

Action plan
1) Dataset building: Gather, sample and prepare the epigraphic digital data to be used by ML models.
2) Model training: Train models, evaluate performance, and tune parameters for improved statistical performance and explainability.
3) Result interpretation: Interpret new patterns discovered by models in line with scholarly approaches to distinctive epigraphic habits.

Impact
The PythiaPlus project has introduced cutting-edge ML tools for analysing ancient Greek inscriptions, with a focus on collaboration and interpretability. It significantly advances the burgeoning field of ML for the study of ancient languages, a field the project has also documented in detail as one of its outcomes. Through real-world epigraphic case studies, PythiaPlus unlocks the cooperative potential between Artificial Intelligence and Ancient History. The project also implements a robust communication strategy and promotes the integration of its tools into the education and industry sectors.
1) ITHACA: Creation of the ML model ‘Ithaca’
Ancient texts may be damaged to the point of illegibility, and their place and date of writing are often uncertain, posing challenges to the experts who study them. Ithaca is the first deep neural network for the textual restoration, geographical attribution and chronological attribution of ancient Greek inscriptions. It is designed to assist and expand the historian's workflow: while Ithaca alone achieves 62% accuracy when restoring damaged texts, historians using Ithaca improved their accuracy from 25% to 72%, confirming the synergistic effect of this research tool. Ithaca can attribute inscriptions to their original location with 71% accuracy and can date them to within 30 years of their ground-truth date ranges.
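The dating result above is expressed as a distance in years between a predicted date and an inscription's ground-truth date range. The following is a minimal sketch of one plausible way to compute such a metric, assuming the distance is zero when the prediction falls inside the range and otherwise equals the gap to the nearest bound. The function names and sample years are hypothetical and are not taken from the Ithaca codebase.

```python
from statistics import mean


def distance_to_range(pred_year: float, lo: float, hi: float) -> float:
    """Years between a predicted date and a ground-truth range [lo, hi].

    Assumption (not confirmed by the report): a prediction inside the
    range counts as 0 years off; otherwise the distance is the gap to
    the nearest bound. Negative years denote BCE dates.
    """
    if lo > hi:
        raise ValueError("range lower bound must not exceed upper bound")
    if pred_year < lo:
        return float(lo - pred_year)
    if pred_year > hi:
        return float(pred_year - hi)
    return 0.0


def mean_date_distance(preds, ranges):
    """Average distance in years over a set of (hypothetical) inscriptions."""
    if len(preds) != len(ranges):
        raise ValueError("preds and ranges must have the same length")
    return mean(distance_to_range(p, lo, hi) for p, (lo, hi) in zip(preds, ranges))


# Hypothetical example: three inscriptions, BCE dates as negative years.
# Per-inscription distances are 0, 20 and 50 years.
preds = [-420, -350, -250]
ranges = [(-450, -400), (-330, -300), (-200, -150)]
print(mean_date_distance(preds, ranges))
```

Measuring the distance to a range rather than to a single year reflects the fact that many inscriptions can only be dated by scholars to an interval.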

This work was published in the journal Nature, featured on its cover, and received widespread media attention. We released a free online interface for scholars to use Ithaca in their own research, and facilitated Ithaca's adoption as a teaching aid in European classrooms. We also open-sourced the pretrained model, including its weights, and the training dataset to facilitate future work.

2) SURVEY: Survey the state-of-the-art in ML for ancient languages
The study of ancient texts and languages is fraught with difficulties, and experts must tackle a range of challenging text-based tasks. We provide a comprehensive survey of published research using ML for the study of ancient texts written in any language, script, and medium across the ancient world. The survey offers three major contributions: a) mapping the interdisciplinary field carved out by the synergy between the Humanities and ML; b) highlighting how active collaboration between specialists from both fields is key to producing impactful and compelling scholarship; c) indicating promising directions for future work in this field. Our discussion also extended to ancient data bias, the risks of digital colonialism, and the need for standardised metrics and datasets.

This work was published in Computational Linguistics and is set to become the point of reference for future work in this flourishing field. We also created an open repository hosting the taxonomy of the reviewed research works.

3) REAL-WORLD EPIGRAPHY: Analysis of several epigraphic case studies
I have studied the inscriptions from several sites in central-western Sicily during the Archaic and Classical periods. This work was published in specialist journals on ancient Sicily.
Owing to the strongly interdisciplinary nature of the PythiaPlus project, strategic cross-sectoral collaborations were successfully implemented throughout my MSCA fellowship with Google DeepMind, the University of Oxford, the Athens University of Economics and Business, Brown University, Google Cloud, Google Arts & Culture, the University of Vienna, and the Soprintendenza Archeologica di Palermo.
I have also collaborated with six EU-funded projects (ERC- and MSCA-funded initiatives).

These collaborations were effectively nurtured through my secondments, networking opportunities with key stakeholders and through the engagement of key opinion formers at dissemination events. In so doing, the PythiaPlus project has progressed beyond the state of the art in terms of the impact and significance of its delivered results:

- Collaborative articles, including publications in Nature and Computational Linguistics, garnered significant attention, with the Ithaca article featured on the cover of Nature accumulating 89,000 views and 92 citations to date.

- Digital outputs, comprising datasets, taxonomies, and interfaces, were strategically disseminated through open-source channels. The Ithaca interface receives approximately 300 unique queries per week, a measure of its uptake by users.

- Academic contributions encompassed presentations at 3 conferences and 20 invited lectures and keynotes, along with the organisation of 11 outreach events and the Ithaca launch event in collaboration with the Epigraphic Museum of Athens and the National Hellenic Research Foundation.

- The project garnered international media coverage from outlets such as The Times, El País, The Guardian, MIT Technology Review, La Repubblica, Financial Times, Wired, New Scientist and Kathimerini, and gave rise to 83 public engagement activities (interviews, press releases, blog posts, podcasts, TV appearances). Promotional videos by Nature and Google DeepMind amassed over 250,000 views.

- The integration of Ithaca into European school curricula by over 80 teachers exemplifies its broader impact, bridging the gap between Computer Science and the Humanities through AI tools.
Image: Nature cover, March 2022, featuring the Ithaca project