Research Knowledge Documentation, Analysis and Exploration in Empirical and Descriptive Sciences

Periodic Reporting for period 1 - ReKnow (Research Knowledge Documentation, Analysis and Exploration in Empirical and Descriptive Sciences)

Reporting period: 2020-06-01 to 2022-05-31

Researchers in the descriptive and empirical sciences, such as History, usually encounter problems related to the integration of data coming from multiple and diverse sources, the reuse of the data they produce in other contexts, the usability of the systems and tools they use for data management and exploration, and the verification and long-term validity of their scientific results. The ReKnow project aims to help researchers handle these problems by providing models, methodologies and tools that allow for better documentation, analysis and exploration of their research data and the knowledge they produce.

This is important for the wider humanities community, and also for society, because it promotes open, transparent and reproducible research workflows in the age of Open Science.

Specifically, ReKnow has set three main objectives:
(a) Design and development of workflows and semantic data models for documenting, formally representing and interconnecting research data and processes, focusing on semantic interoperability and the production of sustainable data of high value and long-term validity;
(b) Development of methods, models and algorithms for supporting argumentation and reasoning processes, considering the provenance of data and the uncertainty of beliefs;
(c) Design and development of user-friendly and interactive tools for supporting different research processes, including data entry, documentation, exploration, and quantitative analysis.

To achieve its objectives, ReKnow has relied on two real use cases: (a) Maritime History, by exploiting data and domain knowledge from the SeaLiT project (ERC, No. 714437), and (b) History of Art, by exploiting data and domain knowledge from the RICONTRANS project (ERC, No. 818791). These collaborations have given the Fellow a unique chance to work with researchers in these fields and learn how they work, what their questions are, and how they argue about their findings.

The project has fully achieved its objectives by providing models, systems and tools that have been widely used in practice by a large number of humanities researchers (mainly historians) for supporting and enhancing their data management activities.
Work was conducted via four work packages (WPs).

WP1 (Management) delivered the data management plan and two deliverables related to the project’s compliance with the ethics requirements. It was also responsible for the preparation of the final technical report.

WP2 (Research and Development) focused on the project’s research objectives, delivering a set of models, datasets and software systems. In particular: (1) a workflow model for holistic data management and semantic interoperability in quantitative archival research, (2) the SeaLiT Ontology, an extension of CIDOC-CRM for the modelling and integration of maritime history information, (3) the SeaLiT Knowledge Graphs, an RDF dataset of maritime history data, (4) a revision and extension of the CIDOC-CRM compatible models CRMinf (Argumentation Model) and CRMtex (Model for the Study of Ancient Texts), (5) the FastCat system (for the collaborative transcription and curation of archival data), (6) the FastCat Catalogues system (for exploring transcripts of archival documents), (7) the Synthesis RICONTRANS system (for documenting data in Art History research), (8) a special configuration of SeaLiT ResearchSpace (for exploring integrated maritime history data), (9) A-QuB-2 (for the user-friendly exploration of semantic data), (10) the LDAQ-CostEstimators library (for estimating the execution cost of link-traversal-based semantic queries).

In the context of WP3 (Training), the Fellow performed a number of activities related to disciplinary/interdisciplinary training and training on transferable/soft skills, including: (1) two personalized research projects (one in conceptual modeling, one in factual argumentation), (2) two secondments performed remotely (one at MPIWG, one at Metaphacts), (3) strong collaboration with researchers of the two ‘use case’ projects (SeaLiT, RICONTRANS), (4) delivery of two seminar talks, two invited talks, and six conference talks, (5) participation and networking in seven conferences, (6) participation in nine CIDOC-CRM SIG meetings, (7) supervision of two bachelor students, (8) leading a group of R&D engineers at the host institution, (9) project management/coordination. These activities significantly enhanced the Fellow’s career development and prospects, providing him with important skills (interdisciplinary thinking, leadership, supervision, networking, project management) and knowledge in new fields (digital humanities, conceptual modeling, argumentation).

Finally, WP4 (Communication, Dissemination, Exploitation) was concerned with outreach activities. Specifically: (1) the project outcomes have been disseminated in the relevant communities through scientific publications (5 journal papers, 2 conference papers, 1 book chapter, 2 preprints) and presentations (6 conference talks, 2 web seminar talks, 3 invited/meeting talks), (2) the project adopted an open-source / open-access approach for all its outcomes (publications, source code, datasets), (3) the project and its results have been communicated through the project’s webpage and social media (Twitter, LinkedIn), (4) the Fellow has already been in contact with third parties that are interested in exploiting the project outcomes.

The ReKnow project has pushed the frontiers of digital humanities forward in several ways. Most importantly:

1) The activities involved in archival/historical research are usually unconnected in terms of data connection and flow, which makes their recursive revision and execution difficult and hinders the inspection of provenance information at the data-element level. The introduced workflow model tackles these problems by approaching the data management part of archival research in a holistic manner: it is provenance-aware and highly recursive, and it focuses on semantic interoperability, aiming at the production of high-quality and reusable data.

2) The information systems FastCat and Synthesis-RICONTRANS implement the proposed workflow model, supporting collaborative and controlled data entry, documentation and curation. They go beyond the current (problematic) practice, which mostly relies on spreadsheets or simple relational databases for data management.

3) The SeaLiT Ontology filled a gap that existed in the modelling and integration of maritime history data. Because it is compatible with the CIDOC-CRM standard, it facilitates data integration with relevant datasets that also make use of CIDOC-CRM. Likewise, the revisions of CRMtex and CRMinf, and their upcoming new versions, offer the relevant communities improved releases of these CIDOC-CRM compatible models.

An impact anticipated from the project is a shift towards data management solutions for the humanities that i) are more focused on the production of sustainable and interoperable data that can be reused beyond the objectives of a particular research activity or project, and ii) are provenance-aware at the micro (data-element) level, which is important for reproducible research in the age of Open Science.
Part of the SeaLiT Ontology showing how information about a ship voyage is modelled in the ontology.
Workflow model for holistic data management and semantic interoperability in archival research.
Online presentation by the Fellow at the 20th International Semantic Web Conference (ISWC 2021).
