CORDIS - EU research results

Computational Literary Studies Infrastructure

Periodic Reporting for period 1 - CLS INFRA (Computational Literary Studies Infrastructure)

Reporting period: 2021-03-01 to 2022-08-31

The overall aim of CLS INFRA, an Integrating Activities for Starting Communities (IASC) project, is to create unified and easy access to the best European and national infrastructures for the CLS community, which has not yet been fully supported in benefiting from existing infrastructures and data resources.

The project therefore aims to consolidate, integrate and further develop institutional, national and regional efforts to build shared and sustainable access to high-quality data, tools and knowledge in the field of literary studies in general, and Computational Literary Studies (CLS) in particular. CLS INFRA uses a balanced set of Networking, Joint Research, and Transnational Access activities to instigate the transformation of the CLS community from one based on informal exchanges of knowledge and resources to one based on a shared infrastructure, while at the same time creating the necessary conditions for the wider adoption of digital technologies in traditional literary studies and other related disciplines. To achieve its aims, CLS INFRA will pursue the following specific objectives:

Objective 1: Bridging knowledge-based resources for CLS community.

Objective 2: Mapping and matching specific requirements of CLS community.

Objective 3: Developing new tools and services for CLS users.

Objective 4: Mainstreaming of new tools/services.

Objective 5: Strengthening culture of cooperation.
The CLS INFRA project divides the work on its objectives into nine work packages, each focused on specific tasks. The following description covers their activities during this reporting period.

Work Package 1 revolves around the project’s management and coordination. WP1 delivered a first draft of the project Data Management Plan (Deliverable 1.1). Other deliverables include a document on personal data management and informed consent forms (D 10.1) as well as a protocol on processing personal data in accordance with the GDPR (D 10.2).

Work Package 2 has been responsible for the project’s communication and dissemination. Apart from setting up the project's website and disseminating the project's activities on social media, WP2 has prepared a detailed Communication Plan (Deliverable 2.1) that describes communication and dissemination strategies for the CLS INFRA project, and ensures that the results of CLS INFRA are exploited across all appropriate user communities.

The overarching objective of WP3 is to identify, document and showcase current shared practices in CLS research. To this end, WP3 has prepared a full-text corpus of CLS research publications, analysed this corpus with a focus on data formats, methods and tools, and published the report Baseline Methodological User Needs Analysis (Deliverable 3.1).

WP4 completed Deliverable 4.1, Skills support gap analysis for the computational analysis of literary texts. The main task of this deliverable was to explore current gaps in the teaching of research skills for computational literary studies, both to inform the CLS INFRA project’s own approach to training schools and to chart the territory in order to gain broader insight into current CLS teaching practices.

WP5 has focused its work on Deliverable 5.1, Review report documenting the state of literary data. The landscape review focuses on intellectual access, i.e. providing guidance for finding and sharing literary data, and consists of collecting and analysing literary corpora, available formats, tools, and metadata.

In order to achieve one of the main objectives of WP6, namely to create a catalogue of existing literary corpora in Europe, special attention was paid to compiling and consolidating an extensive, manually curated collection of literary corpora and their metadata in close cooperation with WP5. Besides serving as the initial dataset for the catalogue, this inventory has informed the development of the data model underlying the catalogue.

The work in WP7 focuses on the conceptualisation and technical prototyping of a Programmable Corpus that can address the primary challenges of data homogenisation and of integrating CLS research data, in line with the FAIR data principles, into an emerging ecosystem of CLS research. First, in exchange with the partners in CLS INFRA, work has been done on developing a domain-specific ontology for transnational drama corpora, which is currently being tested under the name DraCorOn; work has also been done on various schemas (including an ODD file). Second, tools were developed to support users (with and without CLS expertise) in homogenising corpora in the sense of the Programmable Corpora approach, thus making them findable and usable in the supported infrastructure.

WP8 focuses its efforts on optimising the availability of fundamental NLP tools within a workflow for literary texts. Due to the complex nature of the task, the work carried out by this Work Package is still in the toolchain design stage.

WP9 is concerned with the management and oversight of the Transnational Access (TNA) selection process, and revolves around the recruitment and administration of the activities of the External Advisory Board. It has completed two calls for TNA research stays, with 14 successful applicants and 125 weeks awarded.
In WP3, progress beyond the state of the art has been primarily in a better understanding of the current (best) practices in CLS research. This will be further developed and broadened (to include specific showcases) until the end of the project. The expected impact concerns not only a better understanding of the practices and requirements of the CLS community within and beyond the project itself, but also a higher visibility of the added value of CLS research in the Humanities more generally.

As far as WP4 is concerned, the potential impact of its activities lies primarily in the Training Schools. As outlined in the GA, they provide young researchers from different disciplinary backgrounds with a crash course in the essential skills needed for textual analysis. The Training Schools are designed to give them competencies that should improve their position on the labor market and potentially propel forward research and knowledge dissemination in their respective home countries.

In WP5, the preliminary results of the study of the narratives of terrorism will be presented in November 2022 at the congress of the Association for Eurasian and Eastern European Studies in Chicago. Aiming to examine the tradition of Slovak novels, WP5 directed a cooperation proposal to a team based in the Slovak Academy of Sciences, preparing a collection of novels that will comply with the standards set by the ELTeC collection.

WP8 is making progress in developing, maintaining and testing the methods, tools and workflows that will be provided in this project (i.e. D8.2-5). This includes work on TEI, NER and sentiment analysis in various languages. WP8 has ongoing discussions on how best to make the materials and documentation of these tools available, as well as which standards to strive for in providing them. The work of WP8 thus pushes the state of the art, as it seeks to develop multilingual toolchains for scholars working on historical literary materials. Typically, these workflows have only been tested on contemporary sources, so every release of a new tool or its evaluation is a step forward for this field. All these tools will be available for free and open use.