CORDIS - EU research results

CApturing Paradata for documenTing data creation and Use for the REsearch of the future

Periodic Reporting for period 4 - CAPTURE (CApturing Paradata for documenTing data creation and Use for the REsearch of the future)

Reporting period: 2023-11-01 to 2024-12-31

Sharing and reusing research data is essential for advancing science and scholarship globally. However, a significant challenge lies in the lack of context about how data were generated, processed, and utilised. Without this context, leveraging data for scientific progress, academic research, and societal benefit becomes difficult. This issue is particularly critical in fields of research and practice where data can be very diverse in nature (e.g. qualitative, quantitative, naturalistic, or specifically created) and origin (e.g. historical or contemporary, from various contexts and locations). The core problem is that while there might be plenty of metadata (information about the data), there is often a shortage of paradata (details about the processes of data creation and use). Paradata helps us understand the story behind data: how it was created, processed, and used. Without it, making informed decisions can be tricky.

As the first comprehensive research project on the topic, CAPTURE has investigated the complex problem of understanding what information about the creation and use of research data is necessary, and how to capture enough of it to ensure that the data remain useful in the future. Focusing on archaeology, a highly cross-disciplinary field, CAPTURE has explored the diversity of paradata and its practices, extending its inquiry across research disciplines and practical contexts.

CAPTURE addressed how to capture and document the processes underpinning research data creation and use, so that the data remain reusable for future researchers and stakeholders. The project engaged with the practical need to document intellectual processes effectively and efficiently, ensuring that relevant paradata is captured and preserved.

The results of CAPTURE show that capturing information on research data creation and use is a process that needs to run parallel with the making and use of research data throughout their lifetime. Different user communities have varying needs for paradata, requiring different types of information for different contexts and purposes. Adequate documentation is an ongoing effort that involves planning, describing data creation and use, documenting key information, and preserving working documents and preliminary data versions. This means that paradata is not a particular type of data distinct from all others. Rather, it consists of any kind of information that someone can use to learn how research data were created and used, and thereby to make the data understandable and reusable.

CAPTURE's results contribute significantly to maximising the impact of research infrastructures by addressing the issue of "dark data" (data that is difficult to find and access). The project provides crucial insights for implementing open data policies in Europe and globally, promoting effective data sharing and reuse in disciplinary and cross-disciplinary knowledge ecosystems.

Without proper documentation of the processes involved in data creation, understanding, and interpretation, there is a risk of creating large collections of data that cannot support research. Worse, researchers might draw false conclusions from data created under incompatible premises. CAPTURE's work helps mitigate these risks, ensuring that research data remains useful and reliable for future use.

The CAPTURE project has developed a thorough empirical, theoretical, and conceptual understanding of paradata (details about the processes of data creation and use). This understanding has been achieved through interviews, surveys, participatory observation, and analysis of various research-related artefacts and documentation, including reports, literature, instruction manuals, and datasets. The project has identified and categorised different types of paradata, including methods, knowledge organisation, and provenance paradata. It has also developed methods for creating and eliciting this information, including prospective, in situ, and retrospective approaches, and explored the role of paradata in data literacy.

CAPTURE has produced nearly 60 scholarly publications, with more expected soon. The project has organised a series of events, including international workshops, a hybrid conference, and recorded webinars, with established and rising key experts in the field. It has engaged with a wide range of scholarly and practitioner communities through publications, talks, and workshops. Additionally, CAPTURE has produced two comprehensive books on paradata. The first, "Perspectives on Paradata: Research and Practice of Documenting Data Processes" (Springer, 2024), explores the paradata phenomenon across various fields, from computational biology to heritage and legal studies. The second, "Paradata: Documenting Data Creation, Curation and Use" (Cambridge University Press, 2025), offers the first comprehensive account of paradata in theory and practice.

Beyond its empirical contributions, CAPTURE has advanced the theoretical and conceptual understanding of paradata by developing a reference model that presents the first comprehensive theory of paradata. This model provides guidance on capturing paradata to support data-intensive research involving diverse and heterogeneous data.

A key outcome of the CAPTURE project has been the establishment of a foundation for interdisciplinary dialogue and raising awareness about the importance of documenting data creation, processing, and use in our increasingly data-driven world. Understanding the context, origins, processes, and epistemic practices related to data is crucial for informed reuse and avoiding research based on erroneous premises. CAPTURE has played a pivotal role in this dialogue through its interdisciplinary CAPTURE Talks webinar series, the "Perspectives on Paradata" book project, workshops, seminars, and direct engagement with a broad range of research and practitioner communities. The project's results have already attracted widespread interest, highlighting its significant impact on the field.

Many of the individual studies within CAPTURE advance significantly beyond the state of the art. They lay the foundations for understanding what different kinds of artefacts can reveal about processes and practices and how to identify and extract paradata in datasets. They also address the major gap in existing paradata documentation: how knowledge is organised and represented, rather than only how observations were made.

The results provide evidence of how and why a one-size-fits-all approach does not work for data documentation and demonstrate why it makes sense to diversify approaches to data and paradata documentation based on how data-centric a research field is. These insights have already attracted significant interest in the scholarly and scientific community and among research data management professionals alike.

The same applies to CAPTURE's theoretical work, which is groundbreaking in the context of paradata and the documentation of data practices and processes. The practical need to understand and document data creation and use is apparent across diverse fields of science and scholarship, but before CAPTURE there were no sufficiently robust, evidence-based concepts and theory to address it properly.

Documenting data creation and use is difficult, but the implications of doing so are significant.