At the beginning of the project, we finalized and published a pilot study on the ATLAS collaboration at CERN. The study features, to the best of our knowledge, the first network analysis of the internal communication of a large and leading scientific collaboration. It finds that the communication structures in place fit the division of labor inside the collaboration well, and that the network also features central persons who are likely responsible for pulling partial results together. Specific network measures also indicate that the communication network in ATLAS may be special in that it is unusually non-hierarchical for a clustered (social) network.
The initial phase of the project was also essential for developing the details of the data protection plan and the associated ethical considerations. Lacking precedents, we had to devise a novel procedure that would fit the case at hand, in particular taking into account the large and fluctuating membership of the ATLAS collaboration and respecting the European Union's General Data Protection Regulation.
In addition, a research team and the necessary infrastructure had to be established to conduct the project, as it would not be hosted by an already existing department. The team currently consists of three scientific members (including the PI) with a rare but fitting combination of expertise in Digital and Computational Humanities as well as History, Philosophy and Sociology of Science and Physics. It is based in the newly set-up temporary department for History and Philosophy of Modern Science at Technische Universität Berlin.
Given the staggering advances in the development of Large Language Models (LLMs) since around the time when the project started, our work focused mostly on the development of digital tools that would allow the analysis of the content, or semantics, of the scientific discourse. "Semantic modelling" had been identified in the project proposal as a pressing issue with the potential to lead to the most innovative results. Taking into account the particularities of scientific language, especially in high energy physics, we domain-adapted large language models using corpora from arxiv.org. In this work we also benefited from the experience of some of us in a DFG-funded project in which we adapted an LLM to a corpus of journals from the Physical Review family, thanks to an agreement with the American Physical Society.
As we recognized the general potential of LLMs for the History, Philosophy and Sociology of Science (HPSS), we organized a workshop on this topic, which took place in April 2025. A collected volume building on the contributions to the workshop is currently in preparation and planned for publication in early 2026. An AI-generated report on the workshop contributions is also in the making. In relation to the use of LLMs for HPSS, we also published or submitted several pioneering papers: three on how to use LLMs for digital conceptual history of science and one offering a more general survey and assessment of LLMs' potential for HPSS. For some of the papers, we used the case of the virtual particle, familiar to us from the previous DFG-funded project, as well as the word "Planck", as intriguing and suitable test cases to further develop and refine our methods. One of the papers was nominated for the best paper award at the Computational Humanities Research Conference 2024.
We have also been invited to present our pilot study and our work in progress at international workshops and research seminars, including at the Center for Philosophy of Science of the University of Pittsburgh as a Featured Former Fellow.