Machine Learning-aided Multiscale Modelling Framework for Polymer Membranes

Periodic Reporting for period 1 - ML-MULTIMEM (Machine Learning-aided Multiscale Modelling Framework for Polymer Membranes)

Reporting period: 2021-11-15 to 2023-11-14

Materials science explores the properties of materials in a variety of ways, from theoretical and computational models to real-world experiments. These analyses enable the development of new materials that suit specific needs. The ML-MULTIMEM project focuses on empowering the molecular simulation of polymers, a ubiquitous family of materials in manufacturing, healthcare, energy, and environmental technologies, with artificial intelligence and especially machine learning methods. This synergy makes it possible to model complex materials at different scales, an approach termed "multiscale modelling", in an innovative way. It offers the opportunity to address a critical challenge in materials science: the bottom-up rational design of complex polymers for diverse applications using molecular simulation methods. Polymer materials require multiscale strategies to be studied, typically including coarse-grained representations. To overcome the limitations of traditional coarse-graining strategies, the project integrated machine learning (ML) into molecular simulation methods, using Graph Convolutional Neural Networks to obtain coarse-grained force fields for molecular simulations. The project achieved three significant goals:
1. We developed an ML-based multiscale simulation strategy, bridging atomic and coarse-grained scales, for the study of macromolecular and organic systems under bulk conditions.
2. We incorporated the developed ML method into open-source packages and widely used simulation tools.
3. We utilized the developed strategy to simulate organic liquids and polymers of industrial interest (polyethylene, PIM-1), to showcase its application to real-world test cases.
This work holds societal significance due to the pervasive use of polymers in manufacturing, healthcare, energy, and environmental technologies. Improving our capacity to design polymers efficiently has the potential to catalyse breakthroughs in diverse sectors. The proposed ML-based approach offers a pathway to potentially increase efficiency and versatility of molecular modelling more generally, with broad implications for advancements in a multitude of industries and technologies.
The focus of the ML-MULTIMEM project has been advancing the application of machine learning (ML) methods in coarse-grained (CG) molecular simulations. The project progressively addressed systems of increasing chemical complexity and investigated the challenges emerging in this nascent field. We conducted an extensive investigation of the effect of model hyperparameters and loss function definition on the development of ML CG force fields for five different systems: liquid benzene mapped with 1 CG bead per molecule, liquid benzene mapped with 3 CG beads per molecule, polyethylene mapped with 1 CG bead per monomer, polyethylene mapped with 2 CG beads per monomer, and PIM-1 mapped with 3 CG beads per molecule. Despite significant challenges associated with hyperparameter optimization, a combination of settings that yielded satisfactory results was identified for each system. For acceptable models, temperature and size transferability tests were also performed, and consistent behaviour was observed.
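To make the notion of a CG mapping concrete, the following is a minimal sketch of how atoms can be grouped into beads. It assumes a centre-of-mass mapping, which is a common choice but is not stated in this report, and it uses an idealized planar benzene geometry with approximate bond lengths. The function name `map_to_beads` and all variable names are hypothetical and do not come from the project code.

```python
import math

def map_to_beads(positions, masses, bead_of_atom):
    """Return one centre-of-mass position per bead.

    positions    : list of (x, y, z) atom coordinates
    masses       : list of atomic masses, same order as positions
    bead_of_atom : list giving the bead index (0, 1, ...) of each atom
    """
    n_beads = max(bead_of_atom) + 1
    total_mass = [0.0] * n_beads
    weighted = [[0.0, 0.0, 0.0] for _ in range(n_beads)]
    for pos, mass, bead in zip(positions, masses, bead_of_atom):
        total_mass[bead] += mass
        for k in range(3):
            weighted[bead][k] += mass * pos[k]
    return [tuple(w / total_mass[b] for w in weighted[b]) for b in range(n_beads)]

def ring(radius):
    """Six points evenly spaced on a circle in the xy plane."""
    return [(radius * math.cos(k * math.pi / 3),
             radius * math.sin(k * math.pi / 3), 0.0) for k in range(6)]

# Idealized benzene (assumed geometry): 6 C at 1.39 A, 6 H at 2.48 A from the centre.
positions = ring(1.39) + ring(2.48)
masses = [12.011] * 6 + [1.008] * 6

# 1 bead per molecule: every atom belongs to bead 0.
one_bead = map_to_beads(positions, masses, [0] * 12)

# 3 beads per molecule: each bead holds two adjacent C atoms and their H atoms.
three_beads = map_to_beads(positions, masses, [k // 2 for k in range(6)] * 2)
```

With this geometry the single bead sits at the ring centre, while the three beads sit at equal distances from it. The same function applies to a polymer chain if `bead_of_atom` assigns the atoms of each monomer to one or two beads.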
Multiple criteria were found to be necessary for identifying a suitable model. These criteria relate not only to training metrics but also to simulations performed with the trained models. The definition of the components of the loss function and their relative weights during training is particularly important. In this regard, a dedicated study of self-adaptive methods for determining the loss function coefficients was conducted, and a statistically grounded scoring procedure was proposed to evaluate different methods and identify the best one.
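The idea of self-adaptive loss coefficients can be illustrated with a small sketch. The report does not specify which self-adaptive methods were studied, so the scheme below (weights inversely proportional to a running average of each component) is only one simple possibility, chosen for illustration. The class name `AdaptiveLossWeights` and the component names `force` and `energy` are hypothetical.

```python
class AdaptiveLossWeights:
    """Rebalance loss components so each contributes a comparable magnitude.

    Illustrative scheme only: weight_i is proportional to 1 / running_average(loss_i),
    and the weights are normalized to sum to the number of components.
    """

    def __init__(self, names, momentum=0.9, eps=1e-12):
        self.names = list(names)
        self.momentum = momentum
        self.eps = eps
        self.running = {}  # running average of each loss component

    def update(self, losses):
        """Update running averages with the latest values and return new weights."""
        for name in self.names:
            value = float(losses[name])
            if name not in self.running:
                self.running[name] = value
            else:
                self.running[name] = (self.momentum * self.running[name]
                                      + (1.0 - self.momentum) * value)
        inverse = {n: 1.0 / (self.running[n] + self.eps) for n in self.names}
        scale = len(self.names) / sum(inverse.values())
        return {n: inverse[n] * scale for n in self.names}


def total_loss(losses, weights):
    """Weighted sum of the loss components."""
    return sum(weights[name] * losses[name] for name in weights)


# Example with two hypothetical components of very different magnitude.
balancer = AdaptiveLossWeights(["force", "energy"])
step_losses = {"force": 100.0, "energy": 1.0}
weights = balancer.update(step_losses)
loss = total_loss(step_losses, weights)
```

In a training loop, `update` would be called once per step or per epoch with the current component values, and the returned weights would be used to form the scalar loss that is backpropagated. Evaluating such a scheme still requires the simulation-based checks described above, since balanced training losses alone do not guarantee a usable force field.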
Open-source codes (SchNetPack) were extended to enable the study of macromolecular systems by incorporating connectivity, particle typing, and molecule membership information, making it possible to distinguish between inter- and intramolecular particle neighbours. Moreover, the developed models were interfaced with popular open-source molecular dynamics codes (LAMMPS). These contributions to open-source projects support broad dissemination and exploitation of the project results.
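As a rough illustration of why molecule membership information matters, the sketch below splits the pairs within a cutoff into intramolecular and intermolecular neighbours. It is a plain-Python, brute-force example that ignores periodic boundary conditions. It does not use or reproduce the SchNetPack or LAMMPS interfaces, and the function name `split_neighbours` is hypothetical.

```python
import math
from itertools import combinations

def split_neighbours(positions, molecule_id, cutoff):
    """Classify particle pairs within `cutoff` by molecule membership.

    positions   : list of (x, y, z) particle coordinates
    molecule_id : list giving the molecule index of each particle
    cutoff      : maximum pair distance to count as neighbours

    Returns (intra_pairs, inter_pairs) as lists of (i, j) with i < j.
    Brute-force O(N^2) search without periodic boundaries (simplifying assumption).
    """
    intra_pairs, inter_pairs = [], []
    for i, j in combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) <= cutoff:
            if molecule_id[i] == molecule_id[j]:
                intra_pairs.append((i, j))
            else:
                inter_pairs.append((i, j))
    return intra_pairs, inter_pairs


# Four CG beads belonging to two molecules (made-up coordinates).
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0), (5.0, 5.0, 5.0)]
molecule_id = [0, 0, 1, 1]
intra, inter = split_neighbours(positions, molecule_id, cutoff=2.0)
```

A model can then treat the two pair lists differently, for example by learning separate interaction terms for bonded-like and non-bonded-like contributions. A production implementation would use a cell list or the neighbour lists of the simulation engine instead of the quadratic loop.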
The project results were widely disseminated to the research community through the publication of 2 conference papers and participation in diverse scientific congresses focused on artificial intelligence, materials science, and engineering, promoting interdisciplinary knowledge transfer. The work was also presented in several seminars and invited talks for national and international audiences. Moreover, the project team contributed to the organization of the scientific workshop “AI in Natural Sciences and Technology (AINST)”. Outreach actions allowed interaction with the general public, especially pupils and high-school students.
The ML-MULTIMEM project has made significant progress beyond the state of the art in the development of machine-learned force fields for coarse-grained molecular simulations. Increased insight into the challenges of hyperparameter optimization in this setting has been gained and documented, with guidelines and recommendations put forward where possible. The project contributed to the growing awareness within the research community of the need for a more comprehensive evaluation of ML CG force fields, one that considers multiple metrics extracted not only from the training process but also from subsequent simulations.
The project has tackled the study of organic and macromolecular systems under bulk conditions, whereas previous literature reports on the development of ML CG force fields mostly focused on isolated macromolecules. This required extending the methods to also consider connectivity, particle typing, and molecule membership information. The codes developed during the project, which implement these extensions, are shared as open-source projects to maximize the impact and exploitation of the results.
The problem of learning with a multicomponent loss function, which constituted a clear need in the ML-MULTIMEM setting, was addressed by investigating self-adaptive methods for determining the loss function coefficients. Moreover, a ranking procedure for the comparative analysis of different methods was developed. Although it was tested here in the context of a natural science application, the problem of multicomponent loss learning is a general one, and the proposed method can therefore be applied in a broad variety of settings.