Community Research and Development Information Service - CORDIS


POSTDATA Report Summary

Project ID: 679528
Funded under: H2020-EU.1.1.

Periodic Reporting for period 2 - POSTDATA (Poetry Standardization and Linked Open Data)

Reporting period: 2017-09-01 to 2018-01-31

Summary of the context and overall objectives of the project

POSTDATA (Poetry Standardization and Linked Open Data) aims to bridge the digital gap between traditional cultural assets and the growing world of data. It focuses on poetry analysis, classification and publication, applying Digital Humanities methods. The goal is both standardization and innovation: using semantic web technologies to link and publish literary datasets in a structured way in the linked data cloud.
The advantages of making poetry available online as machine-readable linked data are threefold. First, the academic community will have an accessible digital platform to work with poetic corpora and to enrich them with their own texts. Second, this way of encoding and standardizing poetic information will help preserve poems published only in manuscripts or even transmitted orally, as the texts will be digitized and stored. Third, datasets and corpora will be available in open access, so the community can use the data for other purposes, such as education, cultural dissemination or entertainment.

On the one hand, interoperability problems between the different poetry collections will be solved by using semantic web technologies to link and publish literary datasets in a structured way in the linked data cloud. For this purpose, a metadata application profile (MAP), a semantic model in the Linked Open Data (LOD) ecosystem, will be built. The MAP will make it possible to exchange existing data that could not be shared before. In this way, the project will provide a large amount of information (open science) to expand the frontiers of knowledge and research.
On the other hand, automation problems will be solved by the creation of a Poetry Lab. Thanks to this lab, researchers will be able to apply the most up-to-date language technologies and computational methods to process poetry data. Since no set of tools addressing basic poetry issues existed before, the Poetry Lab will make life easier for researchers and users by democratizing technology and user experience.
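
The interoperability goal described above can be illustrated with a minimal sketch. It assumes two hypothetical repertoires that record the same information under different field names and maps both onto a shared vocabulary, emitting N-Triples-style statements. The field names, the `example.org` namespace and the `to_triples` helper are all invented for illustration; they are not POSTDATA's actual model or vocabulary.

```python
# Hypothetical field names from two repertoires, mapped to shared property
# names. Everything here is an illustrative assumption, not the project's MAP.
FIELD_MAP = {
    "repertoire_a": {"titulo": "title", "esquema_rimas": "rhymeScheme"},
    "repertoire_b": {"name": "title", "rhymes": "rhymeScheme"},
}

BASE = "http://example.org/poetry/"  # placeholder namespace


def to_triples(source, record_id, record):
    """Translate one local record into statements that use the shared vocabulary."""
    subject = f"<{BASE}{source}/{record_id}>"
    triples = []
    for field, value in record.items():
        shared = FIELD_MAP[source].get(field)
        if shared is None:
            continue  # field has no counterpart in the shared model
        # Minimal escaping only (backslash and double quote).
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} <{BASE}vocab#{shared}> "{escaped}" .')
    return triples


if __name__ == "__main__":
    for line in to_triples("repertoire_a", "1", {"titulo": "Soneto", "esquema_rimas": "ABBA"}):
        print(line)
    for line in to_triples("repertoire_b", "7", {"name": "Ballade", "rhymes": "ABAB"}):
        print(line)
```

Once both records use the same property names, a single query over `title` or `rhymeScheme` covers both collections, which is the kind of sharing a common model is meant to enable.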

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

Activities performed during the second period of the project focused mainly on building a metadata application profile (MAP) as a common semantic model to be used by the European poetry community, and on creating tools for the automatic annotation of poetic features, largely based on Natural Language Processing research.


Regarding the building of a metadata application profile (MAP), the project focused on finishing the domain model (DM) of the aforementioned MAP. To that end, the final revision of the concepts of the domain model was carried out through the following tasks:
1. Analysis of the model of sixteen poetic repertoires.
2. Analysis of a survey addressed to the final users of poetic resources in order to understand the data needs of the users of poetry databases.
3. Analysis of the graphical user interface on the Web of Documents of five repertoires to retrieve the informational needs of specific poetic repertoires.
4. Analysis of six poems from different traditions to create use cases applying the data model.
5. Identification of the properties of the data model that need to be defined with a controlled vocabulary.
6. Querying of multiple databases for LOD vocabularies whose terms could be incorporated into the MAP.

In total, twenty-three repertoires were analysed, covering sixteen different languages.

In addition, the team has begun to prepare the validation process of the DM. This process consists of providing the means for any expert not familiar with POSTDATA’s DM to analyse a poetic resource in a manner compatible with our data model.


Regarding the creation of tools for the automatic annotation of poetic features, the project carried out Natural Language Processing (NLP) research, applied it to tool development, and also worked on the creation of corpora.
Tools developed by the project:
- HisMeTag (Hispanic Medieval Tagger): A Named Entity Recognition tool for Medieval Spanish that covers a basic content analysis process for a language variety for which no such tools existed until now. A manually annotated reference corpus was created in order to evaluate the automatic tagging.
- ANJA (Automatic eNJambment detection): A pipeline for enjambment detection in Spanish had been developed in an earlier quarter of the project. A user interface was added to this pipeline.
- SKAS (Scansion in Spanish): A reference data set was selected for contemporary poetry and Medieval poetry. This corpus is necessary to test the scansion module that will be further developed.
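
To give a rough idea of what automatic enjambment detection involves, the following is a deliberately naive sketch. It is not ANJA's method, which the report does not describe; it only assumes, as a simplification, that a verse line ending without punctuation and followed by another line is a candidate for enjambment. The function name, the punctuation set and the sample lines are invented for illustration.

```python
# Characters treated as closing a syntactic unit at the end of a verse line.
# This set is an assumption made for the sketch.
TERMINAL = set(".,;:!?»…")


def candidate_enjambments(lines):
    """Return indexes of non-final lines that do not end in punctuation."""
    hits = []
    for i, line in enumerate(lines[:-1]):  # the last line cannot run on
        text = line.rstrip()
        if text and text[-1] not in TERMINAL:
            hits.append(i)
    return hits


if __name__ == "__main__":
    stanza = [
        "El viento lleva",        # no closing punctuation: candidate
        "palabras sin dueño,",    # ends with a comma: not a candidate
        "y la tarde calla.",
    ]
    print(candidate_enjambments(stanza))
```

A real detector would need syntactic information, since punctuation alone cannot tell whether a phrase is actually split across the line break.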

In terms of impact, the project also succeeded in creating and reinforcing strategic synergies with related projects and initiatives, as well as with the research community. By the end of this period, POSTDATA had established considerable synergies with a number of initiatives, such as DARIAH-EU, CLARIN, DESIR and EUROPEANA, among others.

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

As mentioned above, POSTDATA will take the form of a digital, semantic web-based platform for poetry analysis and editing, used to study, publish and share digital collections in a virtual research environment that combines digital humanities open standards with traditional philological academic analysis. The environment will be open to any language and type of poetry and accessible to multiple users with different profiles, and it will provide access to digital resources on poetry linked together through data repositories. POSTDATA is based on three pillars.

1. Semantic Modelling and Linked Open Data (LOD). Poetical repertoires originated in the effort to gather data in an encyclopaedic spirit. Interoperability among poetic repertoires is not simple, as it involves not only technical issues but also conceptual and terminological problems: each repertoire belongs to its own poetical tradition, and each tradition has independently developed its own idiosyncratic analytical terminology over the years. As no model of such a poetic conceptualization existed before, the common data model created will be one of the main and most innovative contributions of the project. Regarding progress beyond the state of the art in this respect, more poetic resources than those initially stated in the project proposal were found and analysed. This has made the modelling process more difficult, but as a result POSTDATA's model has become more comprehensive and, especially, more rigorous.

2. Poetry Lab. It will include different levels of poetry scholarship, from the most formal processes to the most cognitive and subjective ones involving Artificial Intelligence techniques. In terms of Natural Language Processing, on the one hand, the project worked on geolocation (with the support of the Pelagios project), implementing a tool for Spanish Medieval texts that improves on the current state of the art. On the other hand, the project has made great progress in the automatic detection of enjambment, a prosodic phenomenon that is difficult to analyse.

3. Research Infrastructures: Social impact and user perception. The project will propose a single platform devoted to poetry analysis, editing, visualization and publication that is user-friendly and based on a linked open data system. The third pillar focuses on the creation of a digital platform for poetry editing aimed at different kinds of users: scholars with academic purposes who want to work on critical digital editions, non-expert users who want to read, share and learn more about poetic traditions, and also companies that will use this resource for different applications in fields such as education, psychology, tourism or culture. Its interoperability will allow us to “recycle” and integrate existing tools developed by other research teams in previous projects. The innovation lies in the application context for this combination of tools, which is specifically oriented to poetry analysis.
