Periodic Reporting for period 1 - FlexiMOFs-2 (A Design Principle for Predicting Flexible Metal-Organic Frameworks)
Reporting period: 2023-09-01 to 2025-08-31
The FlexiMOFs-2 project was conceived to address this challenge by developing predictive, data-driven models that can link the structure and flexibility of MOFs to their synthetic feasibility. The overarching goal was to establish a framework that could accelerate the discovery of new MOFs and provide viable synthesis recommendations based on structure-property-synthesis correlations. Within the broader EU policy framework for digital and sustainable materials innovation, the project aligns closely with the European Green Deal, FAIR data principles, and Horizon Europe’s digital transformation agenda, which promotes transparent, reproducible, and interoperable data infrastructures for chemistry and materials science.
During the implementation phase, the project evolved into building a foundational predictive platform, FAIR-MOFs, which systematically bridges the gap between experimental synthesis knowledge and computational MOF data. The resulting database integrates over 45,700 curated crystal structures, 33,361 geometry-optimised structures, and 4,161 synthesis-linked entries, establishing one of the largest FAIR datasets for MOF synthesis. Using this resource, a graph-based neural network model was developed to predict the most probable metal salts, ligands, and solvents directly from 3D MOF structures, thus providing the first structure-informed recommender for synthesis of MOFs.
By demonstrating the experimental synthesis of a predicted MOF using model-derived conditions, the project has laid the groundwork for a new paradigm of AI-guided reticular synthesis. Although the initial scope included modelling MOF flexibility, the research emphasis converged on predictive synthesis modelling, which is a critical precursor to understanding flexibility and adaptive behaviour.
The project’s pathway to impact rests on enabling a data-driven synthesis ecosystem that can (i) accelerate the realisation of hypothetical MOFs into tangible materials, (ii) enhance the reproducibility and efficiency of MOF synthesis across laboratories, and (iii) support the EU’s vision of open, FAIR, and sustainable materials research. By providing both the database and the predictive infrastructure openly to the research community, the outcomes of FlexiMOFs-2 contribute directly to reducing experimental waste, advancing digital chemistry, and fostering interdisciplinary collaboration between computational scientists, synthetic chemists, and data engineers.
1. Development of FAIR-MOFs Database and Predictive Modelling Framework
The project began with large-scale data curation and FAIR standardisation of 120,000 MOFs extracted from the Cambridge Structural Database (CSD), which led to a dataset of more than 45,000 curated MOF crystal structures. Geometry optimisation was successfully performed on approximately 33,000 of these MOFs to obtain relaxed structures for further computational studies. In parallel, advanced text-mining pipelines were applied to extract experimental synthesis parameters (metal salts, organic ligands, solvents, concentration, reaction time, and reaction temperature) and link them to the corresponding structures, generating more than 4,000 structure–synthesis pairs.
These data were compiled into a FAIR and interoperable digital infrastructure, which served as the foundation for predictive synthesis modelling. Using this resource, a graph-based neural network (GNN) was trained to predict synthetic conditions directly from 3D atomic structures.
The generalisability of the trained model was assessed by applying it to randomly selected hypothetical MOFs from multiple publicly available databases. The predicted synthesis parameters were subsequently validated experimentally, resulting in the successful laboratory synthesis of several frameworks, including new structures not previously reported. This demonstrated the model’s capacity to bridge computational design and experimental realisation, marking a significant advance toward autonomous AI-guided synthesis of MOFs.
2. Algorithmic and Methodological Advances
The project led to the development of open-source Python libraries, mofstructure and fairsyncondition, to support data extraction, curation, and predictive modelling.
2-1. mofstructure
This library provides a robust and scalable framework for the deconstruction and structural analysis of MOFs and related porous materials. Its key functionalities include:
a. Guest molecule removal using graph-based periodicity detection, enabling accurate treatment of porous systems.
b. Deconstruction of MOFs into unique secondary building units (SBUs), organic linkers, and metal clusters, with full atom mappings and cheminformatic identifiers (SMILES, InChI, InChIKey).
c. Topological classification and pore-size analysis, integrating a Python wrapper of Zeo++ to compute geometric descriptors such as pore-limiting diameter, accessible surface area, and void fraction.
d. Descriptor extraction, including coordination number, number of open metal sites, and topology-dependent features.
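The guest-removal step in item (a) can be illustrated with a minimal sketch. It is not the mofstructure API: the function names are hypothetical, and it uses plain connected-component filtering on a bond list, keeping the largest bonded fragment as the framework. The periodicity check described above (testing whether a fragment is bonded to its own periodic images) is deliberately left out, so this simplification would misclassify a guest that happens to be larger than the framework fragment.

```python
from collections import defaultdict


def connected_components(n_atoms, bonds):
    """Return bonded fragments as sorted lists of atom indices."""
    adjacency = defaultdict(set)
    for i, j in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)

    seen = set()
    components = []
    for start in range(n_atoms):
        if start in seen:
            continue
        # Depth-first traversal from an atom not yet assigned to a fragment.
        stack = [start]
        seen.add(start)
        fragment = []
        while stack:
            atom = stack.pop()
            fragment.append(atom)
            for neighbour in adjacency[atom]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(fragment))
    return components


def strip_guests(n_atoms, bonds):
    """Keep the largest fragment as framework; treat all others as guests.

    Assumption: size alone separates framework from guests. A real
    implementation would also test each fragment for periodic connectivity.
    """
    components = connected_components(n_atoms, bonds)
    framework = max(components, key=len)
    guests = [c for c in components if c is not framework]
    return framework, guests


# Toy example: atoms 0-4 form the framework, atoms 5-6 a small guest molecule.
framework, guests = strip_guests(7, [(0, 1), (1, 2), (2, 3), (3, 4), (5, 6)])
```

In this toy case the five bonded atoms are retained and the two-atom fragment is reported as a guest.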
2-2. fairsyncondition: Graph Neural Network for Predictive Synthesis Modelling
The fairsyncondition library constitutes the machine learning core of the project. It implements the graph neural network (GNN) designed to predict synthesis conditions directly from 3D atomic structures, providing the first end-to-end framework for translating structural information into experimental synthesis recommendations. Each MOF structure was represented as an undirected periodic graph in which nodes correspond to atoms and edges represent coordination bonds. Node features included the atomic number, while edge features captured interatomic distances. Global features such as space group, crystal system, and topological density were appended to preserve lattice-level context.
Training data were derived from the curated structure–synthesis pairs obtained through the text-mining workflow. After normalisation, 4,161 MOFs were used for supervised learning. The model was implemented in PyTorch Geometric and trained with a categorical cross-entropy loss using the Adam optimiser (learning rate: 1×10⁻³). To mitigate data imbalance among reagents, weighted sampling and label-smoothing regularisation were applied. Five-fold cross-validation was used to assess generalisability, and early stopping was employed to prevent overfitting.
The trained model achieved Top-3 and Top-5 accuracies of 68.5% and 78%, respectively, in predicting the correct combination of metal precursors, ligands, and solvents. These metrics substantially outperformed random and composition-based baseline models, confirming that structural features alone can encode synthesis-relevant chemical information.
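Two quantities named above, the label-smoothed cross-entropy loss and the Top-k accuracy, can be made concrete with a short sketch. It is written in plain Python rather than PyTorch Geometric and is not taken from fairsyncondition. The function names, the smoothing value of 0.1, and the toy scores are illustrative assumptions. The smoothing convention used here spreads the smoothing mass uniformly over all classes.

```python
import math


def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy of one sample against a label-smoothed one-hot target."""
    n_classes = len(logits)
    # Numerically stable log-softmax.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    log_probs = [x - log_norm for x in logits]
    # Smoothed target: (1 - smoothing) on the true class plus a uniform share.
    off_value = smoothing / n_classes
    on_value = 1.0 - smoothing + off_value
    return -sum(
        (on_value if k == target else off_value) * lp
        for k, lp in enumerate(log_probs)
    )


def top_k_accuracy(score_rows, targets, k):
    """Fraction of samples whose true class is among the k highest scores."""
    hits = 0
    for scores, target in zip(score_rows, targets):
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        hits += target in ranked[:k]
    return hits / len(targets)


# Toy scores for three samples over four hypothetical solvent classes.
scores = [
    [2.0, 1.0, 0.1, -1.0],
    [0.2, 0.1, 3.0, 1.5],
    [0.5, 2.5, 0.3, 1.0],
]
targets = [1, 3, 2]
top1 = top_k_accuracy(scores, targets, k=1)
top2 = top_k_accuracy(scores, targets, k=2)
loss = smoothed_cross_entropy(scores[0], targets[0])
```

Top-k accuracy is a natural metric for a recommender of this kind, since a chemist can reasonably try the few highest-ranked reagents rather than only the single best guess.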
Beyond prediction, the fairsyncondition library includes utility modules for:
a. Interconversion between 3D structures and chemical identifiers, generating IUPAC names, SMILES, and InChIKeys for predicted ligands and solvents;
b. Data serialisation into FAIR-compliant JSON formats, ensuring reproducibility and compatibility with the FAIR-MOFs database.
2-3. Black Hole Strategy
To further optimise model training and computational efficiency, a novel gravity-based representative sampling approach, termed the Black Hole Strategy, was introduced. It is a data sparsification method that identifies the most representative subset of a larger dataset. The results show that models can be trained accurately on a small subset of the data while reducing computational cost by up to 70%.
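The details of the Black Hole Strategy are not given in this summary, so the sketch below does not implement it. As a stand-in, it shows a different and well-known representative-sampling technique, greedy farthest-point (k-center) selection, to illustrate the general idea of training on a small subset that still covers the feature space. The function name and the 2D toy descriptors are assumptions.

```python
import math


def farthest_point_sample(points, n_select, start=0):
    """Greedy k-center selection: repeatedly add the point that lies
    farthest from everything selected so far."""
    selected = [start]
    # Distance from every point to its nearest selected point.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(selected) < min(n_select, len(points)):
        candidate = max(range(len(points)), key=lambda i: nearest[i])
        selected.append(candidate)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[candidate]))
    return selected


# Toy descriptors: two tight pairs and one isolated point.
descriptors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
subset = farthest_point_sample(descriptors, n_select=3)
```

With three selections, the sketch picks one point from each region and skips the near-duplicates, which is the behaviour a sparsification method aims for.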
3. Text Mining Strategies for Linking Synthesis Conditions to Structures
A major challenge in the project was the extraction and precise mapping of synthesis conditions from the scientific literature to their corresponding crystal structures in the Cambridge Structural Database (CSD). To overcome this, a multi-stage natural language processing (NLP) and machine learning pipeline was developed to enable automated retrieval, interpretation, and normalisation of synthesis data.
The workflow began by identifying 47,521 unique Digital Object Identifiers (DOIs) associated with deposited MOFs in the CSD. Each publication was automatically retrieved in HTML format and segmented into individual paragraphs. A binary text classifier, trained on manually annotated synthesis descriptions, was then applied to identify paragraphs likely to contain experimental procedures.
Within these paragraphs, a combination of regular expressions, sentence-transformer embeddings, and named entity recognition (NER) models was employed to extract metal salts, organic ligands, solvents, concentration, reaction time, temperature, and synthetic method. To ensure chemical consistency, all extracted reagents were converted to canonical chemical identifiers (InChIKeys for ligands and solvents, and formula-based normalisation for metal salts).
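The regular-expression part of this step can be sketched as follows. The patterns, function name, and example sentence are illustrative assumptions, not the project's actual pipeline, and they cover only temperatures in °C and simple durations. Reagent extraction would rely on the NER and embedding models described above.

```python
import re

# Hypothetical patterns: a number followed by a degree-Celsius unit,
# and a number followed by a time unit.
TEMPERATURE = re.compile(r"(\d+(?:\.\d+)?)\s*°\s*C")
DURATION = re.compile(
    r"(\d+(?:\.\d+)?)\s*(h|hours?|days?|min|minutes?)\b", re.IGNORECASE
)


def extract_conditions(paragraph):
    """Pull reaction temperatures and durations out of a synthesis paragraph."""
    temperatures = [float(m.group(1)) for m in TEMPERATURE.finditer(paragraph)]
    durations = [
        (float(m.group(1)), m.group(2).lower())
        for m in DURATION.finditer(paragraph)
    ]
    return {"temperature_celsius": temperatures, "durations": durations}


# Invented example sentence in the style of an experimental section.
text = (
    "A mixture of Zn(NO3)2·6H2O (0.30 g) and H2BDC (0.10 g) in DMF (10 mL) "
    "was heated at 120 °C for 72 h."
)
conditions = extract_conditions(text)
```

The trailing word boundary in the duration pattern keeps formula fragments such as the "6H" in a hydrate from being read as six hours.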
The extracted entities were then cross-referenced with CSD structures using metadata such as refcodes, publication DOIs, and crystallographic composition. This step ensured that each synthesis record could be uniquely mapped to a specific 3D structure, closing the long-standing gap between textual synthesis information and structural data.
To improve data quality and interoperability, multiple post-processing and standardisation layers were implemented:
a. Duplicate entries and inconsistent reagent spellings were harmonised through chemical fingerprint similarity.
b. Reaction conditions were stored as structured JSON objects, compatible with machine learning workflows and FAIR data principles.
c. Statistical validation confirmed that the resulting dataset included 1,743 unique ligands, 793 unique metal salts, and 78 unique solvents, mapped across 4,161 MOFs.
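A structured record of the kind described in this list might look like the sketch below. The field names, the helper function, and all values are hypothetical placeholders. The actual FAIR-MOFs schema is not specified in this summary.

```python
import json


def build_synthesis_record(refcode, doi, metal_salts, ligands, solvents,
                           temperature_celsius=None, time_hours=None):
    """Assemble one structure-synthesis record as a JSON-serialisable dict.

    Reagent lists are de-duplicated and sorted so that repeated mentions
    in the source text collapse into a single, stable entry.
    """
    return {
        "refcode": refcode,
        "doi": doi,
        "reagents": {
            "metal_salts": sorted(set(metal_salts)),
            "ligands": sorted(set(ligands)),
            "solvents": sorted(set(solvents)),
        },
        "conditions": {
            "temperature_celsius": temperature_celsius,
            "time_hours": time_hours,
        },
    }


# Placeholder values only; none of these identifiers refer to real entries.
record = build_synthesis_record(
    refcode="EXAMPLE01",
    doi="10.0000/placeholder",
    metal_salts=["Zn(NO3)2", "Zn(NO3)2"],
    ligands=["LIGAND-INCHIKEY-PLACEHOLDER"],
    solvents=["SOLVENT-INCHIKEY-PLACEHOLDER"],
    temperature_celsius=120.0,
    time_hours=72.0,
)
serialised = json.dumps(record, indent=2, sort_keys=True)
```

Keying each record by both structure identifier and publication DOI mirrors the cross-referencing step described above and lets the same file serve database ingestion and model training.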
This comprehensive structure-synthesis mapping represents the first large-scale effort to directly connect 3D MOF structures with their experimental preparation routes. It provided the foundational data upon which fairsyncondition was trained to enable predictive modelling of synthesis conditions.
4. Digital Tools, Open Access Resources, and Integration
Two interoperable digital tools were further implemented to facilitate public access and reuse:
a. FAIR-MOFs database (https://nomad-lab.eu/prod/v1/gui/search/mofs)
b. cheminteraction (https://www.cheminteraction.com)
5. Main Scientific Achievements
a. Establishment of a FAIR-compliant database linking MOF structures and synthesis data.
b. Development of the mofstructure toolkit for automated feature extraction.
c. Implementation of a graph neural network that predicts synthesis conditions directly from crystal structures.
d. Experimental validation of predicted synthesis conditions through successful MOF synthesis.
e. Introduction of the Black Hole Strategy for scalable and representative sampling in graph-based materials learning.
f. Public dissemination of open data and software to ensure reproducibility and community adoption.
Summary
Overall, the FlexiMOFs-2 project delivered substantial technical and scientific outputs that extend well beyond its initial objectives. The integration of data science, AI, and experimental synthesis has produced both foundational tools and validated models for MOF discovery, paving the way for predictive AI-guided materials design. The complementary studies on network-based learning further broaden the impact and position the fellow at the forefront of data-driven materials innovation.
a. FAIR-MOFs database: The first FAIR, open-access dataset that maps the 3D structures of MOFs to their experimental synthesis conditions.
b. mofstructure Python library: An automated MOF deconstruction and feature extraction module that integrates crystallographic and cheminformatics data.
c. fairsyncondition: A graph neural network that predicts synthesis reagents from structure, experimentally validated for new MOF synthesis.
d. cheminteraction: An interactive web application for visualising reagent co-usage patterns and querying the FAIR-MOFs database by keyword, which helps researchers discover new links between material structure and synthesis conditions.
Impact and Broader Relevance
The methodologies and infrastructure developed within FlexiMOFs-2 establish a new paradigm for data-centric discovery in reticular chemistry, one that accelerates laboratory synthesis, reduces trial-and-error experimentation, and promotes sustainable material discovery workflows. Beyond MOFs, the integrated text-mining, graph-learning, and structure-synthesis mapping workflows are inherently transferable to other material families such as covalent organic frameworks (COFs), zeolites, perovskites, and battery electrode materials. Together, the open-source tools (mofstructure, fairsyncondition, cheminteraction, and the FAIR-MOFs database) constitute a comprehensive digital ecosystem for transparent, reproducible, and data-driven design of porous materials. Furthermore, the integration of the FAIR principles aligns with the European Open Science Cloud (EOSC) and the EU’s commitment to open and responsible data stewardship. Overall, FlexiMOFs-2 advances the frontiers of digital chemistry, FAIR data, and sustainable computation, directly supporting the EU’s strategic priorities under the European Green Deal, Digital Europe, and Horizon Europe’s Open Science agenda. The project not only accelerates materials innovation through automation and artificial intelligence, but also establishes a foundation for interoperable cross-domain discovery that transforms how chemists, data scientists, and AI engineers collaborate to design the next generation of functional materials.
Pathway to Uptake and Future Potential
To ensure effective uptake and lasting impact, the following strategic pathways are envisaged:
a. Implementation of an agentic knowledge graph that interlinks properties, applications, and text-mined synthesis data to enable adaptive reasoning and self-improving predictions across material classes.
b. Integration of retrosynthetic workflows, connecting the FAIR-MOFs database with automated synthesis planners for forward and reverse route generation.
c. Demonstration and industrial collaboration, embedding the GNN-based synthesis recommender into robotic laboratory systems for real-time decision-making.
d. Sustainable infrastructure, supported through financial and institutional partnerships to scale cloud deployment and long-term hosting of the FAIR-MOFs ecosystem.
e. Technology transfer and IP strategy, promoting the commercial adaptation of predictive and deconstruction tools within digital chemistry and AI platforms.
f. Standardisation and interoperability, through open APIs and alignment with European FAIR data and Open Science standards.
g. Global engagement, expanding collaborations with European research infrastructures and international partners to foster cross-domain adoption.