Periodic Reporting for period 1 - InterText (Modelling Text as a Living Object in Cross-Document Context)
Reporting period: 2023-04-01 to 2025-09-30
InterText breaks new ground by proposing the first framework for the computational study of intertextuality in the age of LLMs. We shift the NLP paradigm from the analysis of isolated texts to the analysis of evolving digital documents interpreted in context, and explore the next frontier in LLM research: understanding long documents in context. We advance NLP for living texts along three dimensions: linking investigates the relationships between related texts; versioning focuses on the relationships between texts and their revisions; and implicit commentary studies texts in relation to the annotations made on them. We create foundational datasets and models for the computational analysis of these relationships, and develop robust formalisms and modular representations to incorporate cross-document context into natural language processing. To anchor our work in real-life applications, we apply our findings in two critical domains: academic peer review and conspiracy theory debunking. The ground-breaking research of InterText creates a foundational platform for intertextuality-aware NLP, crucial for managing the dynamic, interconnected digital discourse of today.