Network epistemology in practice

Project Information

NEPI

Grant agreement ID: 101044932

DOI: 10.3030/101044932

EC signature date: 23 June 2022

Start date: 1 January 2023

End date: 31 December 2027

Funded under: European Research Council (ERC)

Total cost: € 1 992 181,25

EU contribution: € 1 992 181,00

Coordinated by: TECHNISCHE UNIVERSITAT BERLIN, Germany

Periodic Reporting for period 1 - NEPI (Network epistemology in practice)

Reporting period: 2023-01-01 to 2025-06-30

The present project will contribute to research in the history and philosophy of science through extensive use of state-of-the-art tools from the digital, or rather computational, humanities. Recent advances in artificial intelligence, resulting in the breathtaking development of Large Language Models, will help analyze not only structural features of scientific collaboration but also the content of scientific discourse. The project will focus on recent research practice in particle physics at the European Organization for Nuclear Research (CERN) in order to gain a better understanding of how knowledge is generated and validated in very large scientific collaborations. The main working hypothesis of the project is that collective research processes can be characterized, in epistemologically relevant terms, through a bird’s-eye-view analysis of the collaboration’s internal communication. The internal communication will be reconstructed from born-digital documents (e-mails, internal wiki pages, etc.) that accrue in the research practice of the collaboration. Abstracting from the case study, the project will also develop historiographic guidelines that can be transferred to future epistemological studies of modern scientific collaborations. These guidelines include recommendations for how to study a scientific collaboration during its daily research practice while respecting the privacy and interests of the collaboration members and of the collaboration as a whole. Last but not least, the project will contribute to the philosophy of collective knowledge generation, in particular to recent issues in “network epistemology”, by adapting the theoretical models to better fit important real-world cases.

Until recently, it was nearly impossible to capture large-scale and complex research processes, such as the ones at CERN, and make them accessible for epistemological analysis. Almost all studies of the research practice at CERN or in similar cases have so far been restricted to the analysis of published articles, selected interviews and participant observation. The application of digital tools and computational methods may finally help us attain a maximally comprehensive picture of recent research practice in particle physics and beyond, provided it is accompanied by historiographic guidelines and practical strategies for best practices in the history and philosophy of science based on born-digital sources, both of which are lacking at the moment.

At the beginning of the project, we finalized and published a pilot study on the ATLAS collaboration at CERN. The study features, to the best of our knowledge, the first network analysis of the internal communication of a large and leading scientific collaboration. It finds that the communication structures in place fit the division of labor inside the collaboration well and that the network also features central persons who are likely responsible for pulling partial results together. Specific network measures also indicate that the communication network in ATLAS may be special in the sense that it is unusually non-hierarchical for a clustered (social) network.
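To make the kind of measure mentioned here concrete, the following is a minimal sketch, not the pilot study's actual method. The summary does not name the network measures that were used, so the choice of degree and local clustering coefficient is an assumption, and the node names and edges are hypothetical toy data. One common diagnostic compares the two: in hierarchically organized networks, clustering tends to fall as degree rises.

```python
from itertools import combinations

# Hypothetical toy data (not from ATLAS): undirected "exchanged messages" links
# between members of two small working groups and one coordinator.
edges = [
    ("coordinator", "a1"), ("coordinator", "a2"),
    ("coordinator", "b1"), ("coordinator", "b2"),
    ("a1", "a2"), ("a1", "a3"), ("a2", "a3"),
    ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),
]


def build_adjacency(edge_list):
    """Map each node to the set of its neighbours."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def local_clustering(adjacency, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return linked / (k * (k - 1) / 2)


adjacency = build_adjacency(edges)
degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
clustering = {node: local_clustering(adjacency, node) for node in adjacency}

# List nodes from most to least connected, with their clustering coefficient.
for node in sorted(adjacency, key=degree.get, reverse=True):
    print(f"{node:12s} degree={degree[node]} clustering={clustering[node]:.2f}")
```

In this toy graph the coordinator has the highest degree but the lowest clustering, because it bridges two groups whose members do not otherwise talk to each other. A real analysis would run on pseudonymized communication data and would likely use an established network library rather than hand-written functions.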
The initial phase of the project was also essential for developing the details of the data protection plan and the associated ethical considerations. Lacking precedents, we had to devise a novel procedure that would fit the case at hand, in particular taking into account the large and fluctuating membership of the ATLAS collaboration and respecting the European Union's General Data Protection Regulation.
In addition, a research team and the necessary infrastructure had to be established, as the project is not hosted by an already existing department. The team currently has three scientific members (including the PI) with a rare but fitting combination of expertise in Digital and Computational Humanities as well as in the History, Philosophy and Sociology of Science and in Physics. It is based in the newly set-up temporary department for History and Philosophy of Modern Science at Technische Universität Berlin.
Given the staggering advances in the development of Large Language Models (LLMs) since around the time the project started, our work focused mostly on developing digital tools for analyzing the content, or semantics, of the scientific discourse. "Semantic modelling" had been identified in the project proposal as a pressing issue with the potential to lead to the most innovative results. Taking into account the peculiarities of scientific language, in particular in high energy physics, we domain-adapted large language models using corpora from arxiv.org. In this work we could also draw on the experience some of us gained in a DFG-funded project in which we adapted an LLM to a corpus of journals from the Physical Review family, thanks to an agreement with the American Physical Society.
As we recognized the general potential of LLMs for the History, Philosophy and Sociology of Science (HPSS), we organized a workshop on this topic, which took place in April 2025. A collected volume building on the contributions to the workshop is currently in preparation and planned for publication in early 2026. An AI-generated report of the conference contributions is also in the making. In relation to the use of LLMs for HPSS, we also published or submitted several pioneering papers: three on how to use LLMs for digital conceptual history of science and one offering a more general survey and assessment of LLMs' potential for HPSS. For some of the papers, we used the case of the virtual particle, familiar to us from the previous DFG-funded project, as well as the word "Planck" as intriguing and suitable test cases to further develop and refine our methods. One of the papers was nominated for the best paper award at the Computational Humanities Research Conference 2024.
We have also been invited to present our pilot study and our work in progress at international workshops and research seminars, including at the Center for Philosophy of Science of the University of Pittsburgh as a Featured Former Fellow.

To the best of our knowledge, our study is the first to analyze the internal scientific communication of a major large-scale collaboration. The internal network structure of what has aptly been called a "collective author" had never been identified before our pilot study. We will be able to pursue this analysis further and hope also to uncover how this network evolves in time and how relevant scientific information flows through it. To do this, we have developed a detailed and novel kind of data protection plan including privacy-preserving measures, which, as far as we can tell, also has no precedent. Our uses of Large Language Models (LLMs) are likewise at the forefront of research on how to use these novel techniques for historical, philosophical and sociological investigations of how modern science works. We are strongly associated with two of the most prominent large science collaborations, viz. the ATLAS collaboration at CERN and the Next Generation Event Horizon Telescope Collaboration (ngEHT). This close relationship promises to help us pursue our study of the inner workings of such collaborations in a unique way.

The internal communication network of a major scientific collaboration
