Building on one of the largest and most significant digital collections of cultural heritage in Europe, NewsEye's core objective was to deliver innovative tools and services that significantly improve the way historical newspapers can be accessed, explored and analyzed, with a view to widespread use and large impact. The project created a valuable, inexpensive, and immediately useful NewsEye toolbox and demonstrator platform for assisting users of all types, available as open science through the project's GitHub repository (https://github.com/NewsEye/), while public datasets and models were made available through Zenodo. The developed workflow is composed of four main layers, each providing advanced techniques and tools for:
- Text Recognition and Article Separation, extracting the layout of newspapers (e.g. articles and graphical regions) from digitized newspapers and transforming the content to textual format, providing full articles through automatic layout analysis, text recognition and article separation.
- Semantic Text Enrichment, enhancing the utility of the newspaper collections by enriching the texts with higher-level semantic annotation using named-entity recognition. Extracted named entities were linked to external references (such as Wikipedia) across languages, with the goal of supporting multilingual analysis. This layer also provided event detection, as support for pattern discovery from textual contents.
- Dynamic Text Analysis, providing tools to exploit the enriched data for more elaborate analysis of user-selected newspaper content, supporting interactive queries to discover different viewpoints, sub-topics or trends concerning the selected topic, named entity, newspaper, timeframe or other category, so as to provide contextualized and comparative insights into the newspaper collection.
- Intelligent analysis and reporting (“Personalized Research Assistant”), providing an alternative, “intelligent” interface to the other tools and the data, carrying out iterative cycles of analysis and reporting to the user in natural language. The user could authorize the Personal Research Assistant to investigate a given topic (or time window, newspaper, etc.) on the user’s behalf, with the Assistant reporting back on findings that it assesses as potentially interesting for the user, in natural language and in a transparent manner so that the findings can be understood and verified by the user. Given the European context, we were able not only to analyze newspapers written in multiple languages but also to report on the findings in multiple languages; to this end, the Assistant used multilingual natural language generation (NLG) to produce textual descriptions of the results obtained by the Investigator.
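To make the last two layers above more concrete, the following is a minimal, purely illustrative sketch of how a trend query over selected articles could feed a template-based multilingual report. It is not the NewsEye implementation: the `Article` record, the `mentions_per_year` and `report_peak` functions, the `TEMPLATES` table and the sample articles are all hypothetical, and simple substring matching and fixed sentence templates stand in for the project's actual analysis and NLG components.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical record for one separated, text-recognized article."""
    year: int
    language: str
    text: str


def mentions_per_year(articles, term):
    """Count, per year, the articles whose text mentions `term` (case-insensitive)."""
    counts = Counter()
    needle = term.lower()
    for article in articles:
        if needle in article.text.lower():
            counts[article.year] += 1
    return dict(sorted(counts.items()))


# Hypothetical report templates, one per output language.
TEMPLATES = {
    "en": "'{term}' appears most often in {year} ({n} articles).",
    "fr": "« {term} » apparaît le plus souvent en {year} ({n} articles).",
}


def report_peak(counts, term, language="en"):
    """Render the peak year of a trend as one sentence in the chosen language."""
    if not counts:
        return None  # nothing to report
    year, n = max(counts.items(), key=lambda item: item[1])
    return TEMPLATES[language].format(term=term, year=year, n=n)


# Invented sample data, for illustration only.
articles = [
    Article(1914, "en", "War declared; the armistice seems far away."),
    Article(1918, "en", "Armistice signed in Compiègne."),
    Article(1918, "en", "Crowds celebrate the armistice."),
    Article(1919, "en", "Treaty negotiations continue."),
]
counts = mentions_per_year(articles, "armistice")
print(report_peak(counts, "armistice", "en"))
print(report_peak(counts, "armistice", "fr"))
```

Returning the per-year counts alongside the generated sentence keeps the report verifiable: a reader can check the stated peak against the numbers it was derived from, which mirrors the transparency goal described above.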
The NewsEye consortium further involved experts whose role was to ensure (i) additional technical expertise in the above-mentioned aspects, (ii) access to and enrichment of digitized newspapers, (iii) insight and experience in using historical newspapers as a rich cultural heritage resource for understanding developments in society, economy and politics, (iv) use cases that address important research desiderata in the humanities and yield experience and feedback to guide iterative development of the NewsEye demonstrator, and (v) strong dissemination and viable paths towards wider adoption and sustainability of the developed tools.
All the results and outputs of the project are available on the project website, notably with data sets, publications and source code inventoried under its "Open Science" tab.