Computational Literary Studies Infrastructure

Project Information

CLS INFRA

Grant agreement ID: 101004984

DOI

10.3030/101004984

Project closed

EC signature date 12 March 2021

Start date 1 March 2021

End date 31 August 2025

Funded under

EXCELLENT SCIENCE - Research Infrastructures

Total cost

€ 4 999 942,50

EU contribution

€ 4 999 941,00


Coordinated by

INSTYTUT JEZYKA POLSKIEGO POLSKIEJ AKADEMII NAUK
Poland

Periodic Reporting for period 3 - CLS INFRA (Computational Literary Studies Infrastructure)

Reporting period: 2024-03-01 to 2025-08-31

The overall aim of CLS INFRA, an Integrating Activities for Starting Communities (IASC) project, was to create unified and easy access to the best European and national infrastructures for the CLS community, which had previously not been fully supported in benefiting from existing infrastructures and data resources.

The project therefore aimed to consolidate, integrate and further develop institutional, national and regional efforts to build shared and sustainable access to high-quality data, tools and knowledge in the field of literary studies in general, and Computational Literary Studies (CLS) in particular. CLS INFRA used a balanced set of Networking, Joint Research, and Transnational Access activities to instigate the transformation of the CLS community from one based on informal exchanges of knowledge and resources to one based on a shared infrastructure, while at the same time creating the necessary conditions for the wider adoption of digital technologies in traditional literary studies and other related disciplines. To achieve its aims, CLS INFRA pursued the following specific objectives:

- Bridging knowledge-based resources for the CLS community.

- Mapping and matching the specific requirements of the CLS community.

- Developing new tools and services for CLS users.

- Mainstreaming new tools and services.

- Strengthening a culture of cooperation.

The project’s deliverables—ranging from data catalogues and methodological reports to new tools, workshops, and training materials—have laid the groundwork for long-term integration with major European infrastructures such as DARIAH and CLARIN. CLS INFRA thus achieved its overall aim of enabling open, FAIR, and sustainable access to Europe’s literary heritage and equipping researchers with the digital competencies needed to explore it.

The CLS INFRA project divided the work on its objectives into nine work packages:

Work Package 1 revolved around the project’s management and coordination. WP1 delivered two updated versions of the Data Management Plan (D1.1).

Work Package 2 was responsible for the project’s communication and dissemination. Apart from disseminating the project's activities on social media, WP2 prepared a Communication Plan that described communication and dissemination strategies for the CLS INFRA project.

The objective of WP3 was to identify, document and showcase current shared practices in CLS research. To this end, WP3 published the Baseline Methodological User Needs Analysis (D3.1), explored new means of dissemination by publishing an interactive Survey of Methods (D3.2), and completed a series of survey papers on methodological concerns (D3.3).

WP4 completed the Skills Support Gap Analysis (D4.1). The main task of this deliverable was to explore current gaps in the teaching of research skills for computational literary studies. WP4 also organised two Training Schools and two workshops in CLS.

WP5 focused its work on documenting the state of literary data. The landscape review (D5.1) focused on intellectual access—providing guidance for finding and sharing literary data—and consisted of collecting and analysing literary corpora, available formats, tools, and metadata. Subsequent deliverables included Case Studies in Data Preparation and Sharing (D5.2) and Toolkit Report for Data Sharing (D5.3).

The main objective of WP6 was to create a catalogue of existing literary corpora in Europe. The resulting inventory (D6.1) informed the development of the data model underlying the catalogue. The deliverable Extended Transformation Matrix Including Alternative Formats (D6.3) was also completed and published.

The work in WP7 focused on the conceptualisation and technical prototyping of a Programmable Corpus. This included the development of a domain-specific ontology for transnational drama corpora, which was tested under the name DraCorOn. Reports on programmable corpora (D7.1) and on versioning living corpora (D7.3) were published, along with a set of tools in R and Python to query DraCor (D7.2).
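To illustrate what querying DraCor programmatically can look like, here is a minimal Python sketch. It is not the project's D7.2 tooling. The base URL, the `/corpora/{corpus}/plays/{play}` path layout, and the helper names `play_url` and `fetch_json` are assumptions made for this example and should be checked against the current DraCor API documentation.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: public DraCor API base URL and v1 path layout.
DRACOR_API = "https://dracor.org/api/v1"


def play_url(corpus: str, play: str, base: str = DRACOR_API) -> str:
    """Build the URL for one play's metadata (hypothetical helper).

    Both identifiers are percent-encoded so that unusual characters
    cannot change the structure of the path.
    """
    return f"{base}/corpora/{quote(corpus, safe='')}/plays/{quote(play, safe='')}"


def fetch_json(url: str, timeout: float = 30.0):
    """Download a URL and parse the response body as JSON (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

A call such as `fetch_json(play_url("ger", "lessing-emilia-galotti"))` would then request the metadata of a single play. The corpus and play identifiers here are illustrative only.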

WP8 focused its efforts on optimising the availability of fundamental NLP tools within a workflow for literary texts. The Report of the Tools (D8.1) and the Report on Annotation as Enrichment (D8.2) were published.

WP9 covered the management and oversight of the Transnational Access (TNA) selection process, as well as the recruitment of the External Advisory Board and the administration of its activities. It completed two calls for TNA research stays, with 31 successful applicants and 187 weeks awarded.

CLS INFRA has decisively advanced the field of Computational Literary Studies by creating the first integrated European infrastructure dedicated to literary data, tools, and research workflows. Before this project, the CLS landscape was fragmented: tools and corpora were dispersed across national initiatives, often lacking interoperability and accessibility. CLS INFRA overcame these challenges by unifying previously isolated resources and establishing common technical and methodological standards aligned with the FAIR and Open Science principles.

CLS INFRA’s innovations have redefined best practices in data management and computational research in the humanities. By producing open, interoperable, and reusable datasets, the project has made it possible for researchers to perform large-scale, multilingual, and cross-genre analyses that were previously unfeasible. Its technical developments have been adopted or cited by other infrastructure initiatives, amplifying its influence beyond the CLS domain.

The project has strengthened interdisciplinary collaboration between literary scholars, linguists, data scientists, and computer scientists—bridging communities that traditionally operated in parallel. This interdisciplinary infrastructure lays the foundation for the next generation of research on European cultural heritage.

The impact of CLS INFRA's Training Schools should not be underestimated. They offered young researchers from different disciplinary backgrounds crash courses in essential skills needed for textual analysis.

Progress in developing, maintaining and testing methods, tools and workflows should also be mentioned. CLS INFRA pushed the state of the art as it sought to develop multilingual toolchains for scholars working on historical literary materials. All these tools are available for free and open use.

In addition, the project’s training ecosystem and TNA programme transformed community practice by embedding digital methods in the core of literary research training, thus institutionalising computational approaches within traditional humanities frameworks.

Finally, CLS INFRA contributes to the preservation, accessibility, and understanding of Europe’s literary and linguistic diversity. By making literary data available in a structured, multilingual, and reusable form, it enables new comparative and cross-cultural analyses that strengthen Europe’s cultural cohesion and historical self-understanding. The project’s emphasis on Open Science and inclusive participation also advances democratic access to knowledge and reinforces the public value of humanities research.

Moreover, by engaging with non-academic user communities (identified in WP3), CLS INFRA has broadened the relevance of literary data and methodologies to domains such as policy analysis, education, and cultural mediation—thus ensuring that its societal benefits extend well beyond the academic sphere.
