In a digital world, everything comes down to data – and poetry is no exception. “By making poetry available online as machine-readable data, we open the door to new possibilities for linking, indexing, and extracting new information,” says Elena González-Blanco, research director at IE University and director and founder of LINHD, the Digital Humanities Innovation Lab at Spain’s Universidad Nacional de Educación a Distancia. With the support of the EU-funded POSTDATA project, González-Blanco is working to close the digital gap between poetry and technology. “By combining cutting-edge philological and computational research, we are constructing a virtual world of semantically linked poetry,” she explains. “In doing so, we aim to transform traditional scholarship on poetry into a digital humanities research environment.”
An innovative ontological model
With a focus on poetry analysis, classification, and publication, this European Research Council-supported project built an innovative ontological model to study the interoperability of different poetry collections. The model is fully aligned with FRBRoo, a formal ontology “intended to capture and represent the underlying semantics of bibliographic information and to facilitate the integration, mediation, and interchange of bibliographic and museum information,” González-Blanco explains. “We’re using semantic web technologies to link and publish literary datasets in a structured way and linking this to the data cloud,” she says. But POSTDATA goes one step further and applies artificial intelligence to poetry. “To further help scholars analyse Spanish poetry in an automated way, we are also building a number of tools using natural language processing.”
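To give a rough sense of what linking literary datasets in a structured way can look like, here is a minimal sketch in Python that describes poems from two collections as subject-predicate-object statements, merges them, and queries the result. It is an illustration only: the `http://example.org/poetry/` namespace, the `title` and `hasMetre` properties, and the placeholder poems are all assumptions made for this example, not POSTDATA's ontology or data.

```python
# Hypothetical namespace for illustration; not POSTDATA's actual vocabulary.
BASE = "http://example.org/poetry/"


def iri(local_name):
    """Wrap a local name in the example namespace as an N-Triples IRI term."""
    return f"<{BASE}{local_name}>"


def literal(text):
    """Format a plain string as an N-Triples literal, escaping special characters."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialise a set of (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(triples))


def subjects_with(triples, predicate, obj):
    """Return the sorted subjects that have the given predicate and object."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


# Two made-up collections that happen to share the same property names.
collection_a = {
    (iri("poem/1"), iri("title"), literal("Placeholder poem one")),
    (iri("poem/1"), iri("hasMetre"), iri("metre/hendecasyllable")),
}
collection_b = {
    (iri("poem/2"), iri("title"), literal("Placeholder poem two")),
    (iri("poem/2"), iri("hasMetre"), iri("metre/hendecasyllable")),
    (iri("poem/3"), iri("hasMetre"), iri("metre/octosyllable")),
}

# Because both collections use the same vocabulary, merging is a set union.
merged = collection_a | collection_b
shared_metre = subjects_with(merged, iri("hasMetre"), iri("metre/hendecasyllable"))

print(to_ntriples(merged))
print(shared_metre)
```

The point of the sketch is that once separate collections describe their poems with a shared vocabulary, a single query can span all of them, which is the kind of interoperability an ontology is meant to enable.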
Digitalising poetry offers a number of advantages
So why digitalise poetry? According to González-Blanco, making poetry available online with machine-readable linked data offers a number of advantages. “First and foremost, the academic community now has an accessible digital platform to work with poetic corpora and to contribute to its enrichment with their own texts,” she notes. “This same resource will also be available for use in, for example, education, cultural diffusion, or even entertainment.” González-Blanco goes on to say that, thanks to the use of standard technologies and open-source software, this method of encoding and standardising poetic information also guarantees its preservation. “A lot of poetry is only found in old books or even just transmitted orally,” she adds. “By digitising and storing the text as XML files, we help ensure its place in our cultural record.”
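As a simple illustration of storing a poem as XML, the following Python sketch builds a small XML document with the standard library and reads it back. The element names (`poem`, `stanza`, `line`) and the placeholder text are assumptions chosen for this example; the article does not describe the schema the project actually uses.

```python
import xml.etree.ElementTree as ET


def encode_poem(title, author, stanzas):
    """Build a <poem> element with numbered <stanza> and <line> children."""
    poem = ET.Element("poem")
    ET.SubElement(poem, "title").text = title
    ET.SubElement(poem, "author").text = author
    for s_idx, stanza in enumerate(stanzas, start=1):
        stanza_el = ET.SubElement(poem, "stanza", n=str(s_idx))
        for l_idx, line in enumerate(stanza, start=1):
            ET.SubElement(stanza_el, "line", n=str(l_idx)).text = line
    return poem


def count_lines(poem):
    """Count every <line> element nested inside a <stanza>."""
    return len(poem.findall("./stanza/line"))


# Placeholder content, not taken from any real corpus.
stanzas = [
    ["First line of a placeholder stanza", "Second line of a placeholder stanza"],
    ["A lone line in the second stanza"],
]
poem = encode_poem("Placeholder title", "Placeholder author", stanzas)

# Serialise to text (as it would be stored on disk) and parse it back.
xml_text = ET.tostring(poem, encoding="unicode")
restored = ET.fromstring(xml_text)

print(xml_text)
print(count_lines(restored))
```

Because the structure of the poem (stanzas, lines, their order) is recorded explicitly rather than implied by page layout, any standard XML tool can read the file back later.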
A big step towards a complete poetry repository
According to González-Blanco, the POSTDATA project represents a big step towards building a comprehensive, accessible, and interoperable poetry repository. “Not only have we built the ontology, we also created state-of-the-art tools that use artificial intelligence and natural language processing for automatic poetry analysis,” she concludes. “The results are simply amazing.” Still a work in progress, the project is now building a website to make its tools and findings publicly available. Researchers are also developing new computational paradigms to further analyse the poetic domain, including the analysis of song lyrics.
Keywords: POSTDATA, digital gap, poetry, technology, digital technologies, artificial intelligence, machine learning, natural language processing, data, computational research, ontological