MDDB made substantial progress during the first year of the project. In data management, we focused on identifying trajectory storage formats and metadata topology requirements. This work was informed by input gathered through workshops and meetings with scientists and developers representing various institutions and tools within the molecular dynamics (MD) simulation community. These discussions led us to define requirements and desirable properties for different categories of data: raw trajectory data compression, metadata ontologies, provenance records, and strategies for data multiplexing and retrieval. We have taken the lead in establishing community-wide standards for biomolecular data compression, aiming for greater efficiency than existing methods. Our ontology development focused on unifying simulation metadata, parameters, and molecule/force field specifications into a single format in which each component can be included optionally, as needed. In parallel, we have started defining key-value pairs for simulation parameters. These fields are organized hierarchically according to the algorithms they belong to, and their nomenclature is aligned with community consensus.
On the infrastructure side, we have outlined the initial technical framework that will support MDDB operations, drawing on insights from existing MD database projects (pilot use cases). We investigated the technical requirements of the particular systems and methods included in these datasets and identified potential issues. This analysis informed decisions on the architecture, design and development of the infrastructure prototype. We are now using new datasets (pilot cases) to test the available software stack in a federated layout. Two MDDB nodes have been set up, one at IRB and one at BSC. Although the federation is still small, this demonstrates that the federated database concept is achievable.
These technical feasibility assessments have been complemented by a dissemination plan targeting end users and other key stakeholders. Their input and support will be crucial to the development and long-term sustainability of the MDDB infrastructure. Events and other channels for engaging with MDDB will be intensified during the second year of the project. All information will be published on the project website (www.mddbr.eu).