CORDIS - EU research results

Exploring Cultural Memory in the Pre-Modern Islamic World (700–1500): Knowledge, Information Technology, and the Arabic Book

Periodic Reporting for period 3 - KITAB (Exploring Cultural Memory in the Pre-Modern Islamic World (700–1500): Knowledge, Information Technology, and the Arabic Book)

Reporting period: 2021-05-01 to 2022-10-31

One of the most spectacularly prolific traditions in human history, the medieval Islamic world witnessed a literary outpouring surpassing the textual output of classical Antiquity and medieval Europe. The goal of the KITAB project is to develop and use technology to study the history of the Arabic book (700–1500) and its role in shaping cultural memory in the Islamic world. Authors in this period frequently reused earlier works, repurposing them to suit an evolving present and imagined future. We can now detect such reuse across a corpus of 1.75 billion words. Whereas the history of the Arabic book previously proceeded mostly along anecdotal lines, we can now see major patterns, including, for example, the role that a small group of highly prolific authors played in preserving and passing on many earlier works.

Through our work, we are advocating more generally for a new way of working on the history and historiography of the Middle East and, by analogy, of any large historical textual tradition. This approach involves computational textual analysis of corpora, or collections of texts, that are vetted and prepared by scholars and readable by a computer. Specifically, it involves using algorithms to detect common passages between pairs of such texts and then finding patterns among them. If this work is done forensically and critically, academics can aspire to, and work towards, a map that represents the intertextuality of an entire surviving written tradition.
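The idea of detecting common passages between pairs of texts can be sketched in a few lines. The example below is an illustration only and is not the project's own method: it assumes a simple word n-gram ("shingle") comparison, and the function names, the toy corpus, and the n-gram length of 3 are all hypothetical choices made for this sketch.

```python
from itertools import combinations


def shingles(text, n=3):
    """Return the set of word n-grams (shingles) found in a text."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_passages(corpus, n=3):
    """Map each pair of text ids to the n-grams that both texts contain."""
    grams = {text_id: shingles(text, n) for text_id, text in corpus.items()}
    result = {}
    for a, b in combinations(sorted(grams), 2):
        common = grams[a] & grams[b]
        if common:  # keep only pairs that actually share something
            result[(a, b)] = common
    return result


# Toy corpus of invented English sentences, for illustration only.
toy_corpus = {
    "text_a": "the caliph sent a letter to the governor of the city",
    "text_b": "it is said that the caliph sent a letter to his brother",
    "text_c": "rain fell on the mountains for seven days",
}

pairs = shared_passages(toy_corpus)
for (a, b), common in pairs.items():
    print(a, b, len(common))
```

In this toy corpus only text_a and text_b share any three-word sequences. Counting such overlaps for every pair is the raw material from which larger patterns of reuse could then be studied.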
The first 30 months of the project have involved, first, the development of our corpus of texts and technical infrastructure. We now have a corpus of 1.75 billion words, which we have built up significantly since starting the project in 2018 (with a nearly 40% increase in titles from repositories previously unknown to us). Our work on curation is important for developing the emerging field of Arabic/Islamicate Digital Humanities. So far, on a subset of 850 works, it has involved metadata verification, comparison of machine-readable files with print versions of the texts, edition quality assessment, and recording of issues. We have annotated these works so that their editorial structure can be easily read, displayed, and analysed, and we have begun to add metadata layers pertaining to genre and geography.

The creation of the corpus represents an important step for the field. A team member is focused on mapping the corpus to printed editions and manuscripts in catalogues today. This mapping is important for considerations of cultural memory, for understanding what is in our corpus and what is not, and indeed for understanding any historical text, whether for digitally or non-digitally informed research. That is because it puts into sharp focus the partiality of our access to the past, and helps to show how later periods profoundly mediated our access to earlier ones.

In terms of technical method and platform, we have adapted for Arabic a text reuse algorithm called passim, which finds and aligns common parts of texts from across the corpus. We have also developed machine learning methods to automatically detect transmissive chains (isnads), a characteristic feature of Arabic texts. We are working to produce a platform that will feature applications allowing researchers to access and interact with our data. Currently, Arabic does not enjoy the same functionality as English and other European languages when it comes to text readers. Our work on text reuse, search, and named entity recognition is important for our research goals (understanding how texts were copied and recopied across time) but is also relevant for reading interfaces. Through complementary funding, we have created a beta-version text reader in Arabic that uses our data.
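To give a sense of what "finding and aligning common parts of texts" means in practice, here is a minimal sketch. It does not reproduce passim or the project's adaptation of it: as a stand-in it uses `SequenceMatcher` from Python's standard `difflib` module, and the function name, the `min_len` threshold, and the invented English sentences are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def aligned_runs(text_a, text_b, min_len=3):
    """Return (start_in_a, start_in_b, words) for each shared run of words.

    Runs shorter than min_len words are ignored as likely coincidences.
    """
    words_a, words_b = text_a.split(), text_b.split()
    # autojunk=False stops difflib from discarding very frequent words.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    runs = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_len:
            run = words_a[block.a:block.a + block.size]
            runs.append((block.a, block.b, run))
    return runs


# Invented example: the second text reuses the first with small changes.
earlier = "the caliph sent a long letter to the governor"
later = "it is said the caliph sent a letter to the governor"

for start_a, start_b, run in aligned_runs(earlier, later):
    print(start_a, start_b, " ".join(run))
```

The later sentence drops the word "long", so the reuse shows up as two aligned runs rather than one. Tolerating small omissions and substitutions of this kind is what distinguishes alignment from exact string search.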

In terms of book history, our ongoing case studies focus on authorial practices, changing forms of the book, and narrative adaptations. Team members have given 43 research presentations, published 22 blog posts, and submitted 5 papers for publication that relate to book history.
At least 4 books that rely on our data are planned. The first, written by the PI, will be submitted in 2021. It will accompany a release and explanation (with documentation) of the data underpinning the project. The book provides a schematic definition of text reuse, an analysis of the corpus and its intertextuality, and an in-depth discussion of authorial practices among a subset of authors who each produced a work of a million words or more. The book compares authors’ working practices, as detected through the passim algorithm, with what authors say they are doing. Because reuse is so extensive, and manners of citation so complex and varied, such an analysis profits enormously from digital methods.

A second book, co-written by the project's research team (including the PI), explores the workings of cultural memory. We work both with the dataset as a whole (including analysis of the corpus itself) and with case studies that link books and authorial practices to specific memory communities.

Additional books, book chapters, and articles are based on team members' own case studies. They rely on our corpus, data, and methods, and examine particular cases in greater detail.

A key outcome of the project will be not just our scientific findings, but a community of scholars working in Arabic who want to contribute to, and use, our corpus, data, and methods. The past 5 years have witnessed major advances in machine learning. Because of our ERC funding, the KITAB project has been able to partner with computer scientists to advance machine learning for historical Arabic, build the corpus, and initiate networks that can develop further after the project ends. We have begun working closely with a user group of external, early career scholars whom we are training to access and contribute to our work. Both the humanities scholars and the computer scientists profit from working together, and a key achievement will be working out how to create such successful partnerships. We are now also advising and collaborating closely with scholars working in Persian who wish to undertake work similar to ours.