Periodic Reporting for period 1 - NEPI (Network epistemology in practice)
Reporting period: 2023-01-01 to 2025-06-30
Until recently, it was nearly impossible to capture large-scale and complex research processes, such as those at CERN, and make them accessible for epistemological analysis. Almost all studies of research practice at CERN or in similar cases have so far been restricted to the analysis of published articles, selected interviews and participant observation. The application of digital tools and computational methods may finally help us attain a maximally comprehensive picture of recent research practice in particle physics and beyond. To do so, it must be accompanied by historiographic guidelines and practical strategies for best practice in the history and philosophy of science based on born-digital sources, both of which are currently lacking.
The initial phase of the project was also essential for developing the details of the data protection plan and the associated ethical considerations. Lacking precedents, we had to devise a novel procedure that would fit the case at hand, in particular taking into account the large and fluctuating membership of the ATLAS collaboration and respecting the European Union's General Data Protection Regulation.
In addition, a research team and the necessary infrastructure had to be established, because the project is not hosted by an already existing department. The team currently comprises three scientific members (including the PI) with a rare but fitting combination of expertise in Digital and Computational Humanities, in the History, Philosophy and Sociology of Science, and in Physics. It is based in the newly set-up temporary department for History and Philosophy of Modern Science at Technische Universität Berlin.
Given the staggering advances in the development of Large Language Models (LLMs) since around the time the project started, our work focused mostly on developing digital tools for analysing the content, or semantics, of scientific discourse. "Semantic modelling" had been identified in the project proposal as a pressing issue with the potential to lead to the most innovative results. Taking into account the peculiarities of scientific language, in particular in high energy physics, we domain-adapted large language models using corpora from arxiv.org. In this work we also benefited from the experience some of us gained in a DFG-funded project, in which we adapted an LLM to a corpus of journals from the Physical Review family thanks to an agreement with the American Physical Society.
Recognizing the general potential of LLMs for the History, Philosophy and Sociology of Science (HPSS), we organized a workshop on this topic, which took place in April 2025. A collected volume building on the contributions to the workshop is currently in preparation and planned for publication in early 2026. An AI-generated report of the conference contributions is also in the making. In relation to the use of LLMs for HPSS, we also published or submitted several pioneering papers: three on how to use LLMs for digital conceptual history of science, and one offering a more general survey and assessment of the potential of LLMs for HPSS. For some of the papers, we used the case of the virtual particle, familiar to us from the previous DFG-funded project, as well as the word "Planck", as intriguing and suitable test cases to further develop and refine our methods. One of the papers was nominated for the best paper award at the Computational Humanities Research Conference 2024.
We have also been invited to present our pilot study and our work in progress at international workshops and research seminars, including at the Center for Philosophy of Science of the University of Pittsburgh as a Featured Former Fellow.