CORDIS - EU research results

Machine Learning for Tailoring Organic Semiconductors

Periodic Reporting for period 1 - MALTOSE (Machine Learning for Tailoring Organic Semiconductors)

Reporting period: 2021-03-01 to 2023-02-28

MALTOSE, the project's acronym, stands for "MAchine Learning for Tailoring Organic SEmiconductors". Let's go through it from back to front.

Organic semiconductors are carbon-based materials, either molecules or polymers, that have extended π-electron systems and a certain electrical conductivity. They can absorb and emit light in the visible spectrum and can be used in organic light-emitting diodes, organic photovoltaic cells, and even organic circuits. Organic semiconductors hold much potential because they are potentially low-cost and mechanically flexible, and the chemical compound space of conceivable molecules is huge.

In this project, we focus on organic semiconductors for photovoltaic cells, where the main figure of merit is the power-conversion efficiency. Among other components, an organic photovoltaic cell consists of a donor material and an acceptor material. For a good power-conversion efficiency, these need to have the right optical and electronic properties, matching each other and the spectrum of sunlight. Finding materials with just the right properties is what the term "tailoring" in the title refers to.

The challenge, however, is that the search space is vast: the chemical compound space for a single material is already huge, and here two materials have to be found that work together. It is simply not feasible to determine the properties of all possible materials with full-fledged simulations (e.g. density-functional theory computations). Instead, a machine-learning model is used to guide the search, so that only the promising candidate materials have to be investigated in detail.
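The idea of letting a cheap model guide an expensive search can be sketched as a two-stage screening funnel. This is a minimal illustration under stated assumptions, not the project's code: `surrogate` stands for any fast machine-learning estimator, `expensive_evaluation` for a full simulation such as a DFT computation, and `budget` for the number of such simulations one can afford; all of these names and the toy scoring functions are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Candidate = TypeVar("Candidate")


def screen_candidates(
    candidates: Sequence[Candidate],
    surrogate: Callable[[Candidate], float],
    expensive_evaluation: Callable[[Candidate], float],
    budget: int,
) -> List[Tuple[Candidate, float]]:
    """Two-stage screening: rank everything with a cheap surrogate model,
    then run the expensive evaluation only on the `budget` best candidates."""
    # Stage 1: score every candidate with the fast surrogate (higher = better).
    ranked = sorted(candidates, key=surrogate, reverse=True)
    # Stage 2: spend the expensive evaluation (e.g. a DFT run) on the shortlist only.
    shortlist = ranked[:budget]
    evaluated = [(c, expensive_evaluation(c)) for c in shortlist]
    # Report the shortlist ordered by the accurate, expensive score.
    return sorted(evaluated, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    calls = []

    def toy_surrogate(x: int) -> float:
        return -abs(x - 42)  # cheap but slightly biased guess: optimum near 42

    def toy_expensive(x: int) -> float:
        calls.append(x)  # record how often the costly step runs
        return -abs(x - 40)  # "true" optimum at 40

    best = screen_candidates(range(1000), toy_surrogate, toy_expensive, budget=5)
    print(best[0], len(calls))  # prints (40, 0) 5
```

Out of 1000 candidates only five reach the expensive step, yet the true optimum is still found because the surrogate places it on the shortlist. In practice the budget trades computing cost against the risk that a good candidate is ranked too low by the surrogate.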

The broader context is the research field of materials discovery, and the concrete application is the search for better organic semiconductors for applications such as organic electronics, organic light-emitting diodes (OLEDs), and organic photovoltaic cells. Thus, the main objective is to develop a machine-learning method for finding materials with desired properties. This method has to be able to handle large organic compounds, but should also transfer to other kinds of materials.

The first half of the project was concerned with the development of a machine-learning model that is able to estimate relevant properties of a given molecular structure. A dataset with geometries and properties of molecules relevant for organic photovoltaics was designed. Further, the SchNet model of Schütt et al. [Schütt et al., JCP 2018, DOI:10.1063/1.5019779] was enhanced with a Set2Set readout unit [Vinyals et al., ICLR 2015, arxiv:1511.06391] and trained with the aforementioned dataset. This work has been presented at the Fall Meeting of the European Materials Research Society in September 2022, and a manuscript titled "Machine Learning for Orbital Energies of Organic Molecules Upwards of 100 Atoms" has been published in Physica Status Solidi B (DOI:10.1002/pssb.202200553).
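For readers unfamiliar with Set2Set: it turns a variable-size set of per-atom feature vectors into one fixed-size molecular vector through repeated attention, and the result does not depend on the order in which the atoms are listed. The NumPy sketch below follows the recurrence of Vinyals et al. (an LSTM produces a query q_t, attention weights a_i are a softmax over m_i · q_t, the readout is r_t = Σ a_i m_i, and q*_t = [q_t, r_t] is fed back). It uses random, untrained weights and is only an illustration of the readout, not the project's extended SchNet; the class and parameter names (`Set2Set`, `dim`, `steps`) are our own.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class Set2Set:
    """Minimal Set2Set readout (Vinyals et al., 2015) with random, untrained weights.

    Turns an (n_atoms, dim) array of atom features into a (2 * dim,) vector
    that does not depend on the order in which the atoms are listed.
    """

    def __init__(self, dim: int, steps: int = 3, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.dim, self.steps = dim, steps
        scale = 1.0 / np.sqrt(dim)
        # One LSTM cell: input q*_{t-1} has size 2*dim, hidden state has size dim.
        # Rows hold the stacked input, forget, candidate and output gates.
        self.W = rng.normal(0.0, scale, size=(4 * dim, 2 * dim))
        self.U = rng.normal(0.0, scale, size=(4 * dim, dim))
        self.b = np.zeros(4 * dim)

    def _lstm_step(self, x, h, c):
        i, f, g, o = np.split(self.W @ x + self.U @ h + self.b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

    def __call__(self, atom_features: np.ndarray) -> np.ndarray:
        h = np.zeros(self.dim)  # LSTM hidden state, used as the query q_t
        c = np.zeros(self.dim)  # LSTM cell state
        q_star = np.zeros(2 * self.dim)
        for _ in range(self.steps):
            h, c = self._lstm_step(q_star, h, c)
            scores = atom_features @ h  # e_i = m_i . q_t, one score per atom
            attn = np.exp(scores - scores.max())
            attn /= attn.sum()  # softmax over the atoms
            readout = attn @ atom_features  # r_t = sum_i a_i m_i
            q_star = np.concatenate([h, readout])  # q*_t = [q_t, r_t]
        return q_star


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    atoms = rng.normal(size=(120, 16))  # e.g. 120 atoms with 16 features each
    set2set = Set2Set(dim=16)
    shuffled = atoms[rng.permutation(len(atoms))]
    print(set2set(atoms).shape, np.allclose(set2set(atoms), set2set(shuffled)))  # prints (32,) True
```

Compared with a plain sum or mean over atoms, the attention lets the readout weight the atoms that matter most for a given property, which is one plausible reason such a unit helps for large molecules.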

Later, it became clear that for practical application of the machine-learning model in a materials-search setup [Figure: "Big picture" of Machine Learning in Materials Discovery], the model should not depend on the exact molecular geometry. Rather, it should be able to handle molecular input data in the form of a molecular graph (i.e. specifying just the atoms and the bonds between them, but not the exact bond lengths or angles). Such models have recently become available, and the "Uni-Mol+" model [Lu et al., Nature Comm. 2024, DOI:10.1038/s41467-024-51321-w] was extended for the needs of the MALTOSE project (multitask learning, an extended training set, transfer-learning and fine-tuning approaches). Finally, we used the OPEP2 dataset [Greenstein & Hutchison, JPCC 2023, DOI:10.1021/acs.jpcc.3c00267] to design a two-stage machine-learning model that first relates the molecular graphs of donor and acceptor to molecular properties, and then relates the molecular properties to organic photovoltaic performance. This work was presented at the Spring Meeting of the European Materials Research Society in May 2024 in Strasbourg in a contribution titled "Machine-Learning Driven Materials Search for Organic Photovoltaics".
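To make the distinction between geometry and graph input concrete, the sketch below stores a molecule only as atoms and bonds, and wires two models together in the two-stage fashion described above. It is a hypothetical illustration: `MolecularGraph`, `two_stage_predict` and both placeholder models are our own names, and the placeholders merely count atoms, whereas the real stages are trained networks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Properties = Dict[str, float]


@dataclass
class MolecularGraph:
    """A molecule as drawn on paper: atoms and bonds, no coordinates."""
    atoms: List[str]  # element symbol of each atom
    bonds: List[Tuple[int, int, int]] = field(default_factory=list)  # (atom i, atom j, bond order)

    def neighbors(self, i: int) -> List[int]:
        """Indices of the atoms bonded to atom i."""
        return [b if a == i else a for a, b, _ in self.bonds if i in (a, b)]


def two_stage_predict(
    donor: MolecularGraph,
    acceptor: MolecularGraph,
    property_model: Callable[[MolecularGraph], Properties],
    performance_model: Callable[[Properties, Properties], float],
) -> float:
    """Stage 1: graph -> molecular properties (donor and acceptor separately).
    Stage 2: (donor properties, acceptor properties) -> device performance."""
    return performance_model(property_model(donor), property_model(acceptor))


if __name__ == "__main__":
    # Formaldehyde, H2C=O: only connectivity and bond orders, no geometry.
    formaldehyde = MolecularGraph(["C", "O", "H", "H"], [(0, 1, 2), (0, 2, 1), (0, 3, 1)])
    print(formaldehyde.neighbors(0))  # prints [1, 2, 3]

    # Placeholder models (hypothetical); real ones would be trained networks.
    def count_heavy_atoms(g: MolecularGraph) -> Properties:
        return {"n_heavy": float(sum(a != "H" for a in g.atoms))}

    def toy_performance(d: Properties, a: Properties) -> float:
        return d["n_heavy"] + a["n_heavy"]

    print(two_stage_predict(formaldehyde, formaldehyde, count_heavy_atoms, toy_performance))  # prints 4.0
```

Splitting the pipeline this way means the first stage can be trained on large property datasets, while the second stage only has to learn the mapping from a handful of properties to device performance.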

The MALTOSE project participated in the "Science is Wonderful!" competition [https://event.scienceiswonderful.eu] organized by the European Commission in 2022 and 2023. The proposal "Assembling molecules to make up our world" was developed with Miguel Ángel Abril from CEIP "La Santa Cruz" in Caravaca de la Cruz, Spain, and is designed for pupils from 5th grade onwards (approx. 10 years old). The didactic unit addresses some of the key concepts underlying the MALTOSE project, namely (a) the basic rules of chemistry, i.e. how to combine atoms into molecules using a ball-and-stick model, (b) relating the molecular model to actual properties of materials, and (c) getting an idea of the vast number of different compounds that can be built from just a handful of atoms. Even though the contribution was not selected for the Science Fair, we took the chance to deliver the class to the fifth-grade students of CEIP "La Santa Cruz" in Caravaca de la Cruz, Spain.

Machine-learning models have been developed that can accurately estimate the orbital energies (HOMO and LUMO, the highest occupied and lowest unoccupied molecular orbitals) and other molecular properties, even for large organic molecules. The orbital energies are useful for estimating the suitability of a given material for photovoltaic applications because the HOMO and LUMO energies are proxies for the ionization energy and the electron affinity, respectively.

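As a worked illustration of how predicted orbital energies translate into quantities relevant for a donor/acceptor pair, the sketch below uses the Koopmans-type approximations IE ≈ −E_HOMO and EA ≈ −E_LUMO, plus one widely used empirical rule, the Scharber estimate of the open-circuit voltage, V_oc ≈ (|E_HOMO(donor)| − |E_LUMO(acceptor)|)/e − 0.3 V. These relations are standard approximations chosen for illustration, not necessarily what MALTOSE uses; the function names and the energy values are made up.

```python
def ionization_energy_proxy(e_homo_ev: float) -> float:
    """IE ~ -E_HOMO (Koopmans-type approximation), in eV."""
    return -e_homo_ev


def electron_affinity_proxy(e_lumo_ev: float) -> float:
    """EA ~ -E_LUMO (a cruder Koopmans-type approximation), in eV."""
    return -e_lumo_ev


def scharber_voc(donor_homo_ev: float, acceptor_lumo_ev: float, loss_v: float = 0.3) -> float:
    """Empirical open-circuit voltage estimate in volts (Scharber et al., 2006).

    With energies in eV, dividing by the elementary charge e gives volts,
    so the numerical value of the energy difference is the voltage.
    """
    return abs(donor_homo_ev) - abs(acceptor_lumo_ev) - loss_v


def lumo_offset(donor_lumo_ev: float, acceptor_lumo_ev: float) -> float:
    """Energy offset (eV) that drives electron transfer from donor to acceptor."""
    return donor_lumo_ev - acceptor_lumo_ev


if __name__ == "__main__":
    # Made-up orbital energies in eV, as a model might predict them.
    donor_homo, donor_lumo, acceptor_lumo = -5.2, -3.5, -4.0
    print(ionization_energy_proxy(donor_homo))  # prints 5.2
    print(round(scharber_voc(donor_homo, acceptor_lumo), 2))  # prints 0.9
    print(lumo_offset(donor_lumo, acceptor_lumo))  # prints 0.5
```

The example shows the trade-off that makes pairing hard: lowering the acceptor LUMO increases the driving force for charge transfer but reduces the attainable open-circuit voltage.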
The emphasis of this work is on large molecules. Previous state-of-the-art models have mostly been trained and evaluated on the "QM9" dataset, which contains only molecules with up to 9 non-hydrogen atoms [Gilmer et al., "Message Passing Neural Networks", in "Machine Learning Meets Quantum Physics", Springer (2020)]. By (a) extending the data corpus and (b) enhancing the architecture of the model, we have achieved good accuracy for molecules with more than a hundred atoms, as explained in detail in the article "Machine Learning for Orbital Energies of Organic Molecules Upwards of 100 Atoms" published in Physica Status Solidi (B), DOI:10.1002/pssb.202200553. The code, data, and trained models are published at DOI:10.5281/zenodo.7328587.
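Whether a model really keeps its accuracy for large molecules can be checked by splitting the test error by molecule size. A generic sketch of such an evaluation follows; the function name `mae_by_size`, the bin edges and the numbers are illustrative assumptions, not the evaluation protocol of the published article.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def mae_by_size(
    n_atoms: Sequence[int],
    y_true: Sequence[float],
    y_pred: Sequence[float],
    bin_edges: Sequence[int],
) -> Dict[Tuple[int, int], float]:
    """Mean absolute error of predictions, grouped into half-open
    molecule-size bins [lo, hi) given by consecutive bin edges."""
    errors = defaultdict(list)
    for n, t, p in zip(n_atoms, y_true, y_pred):
        for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
            if lo <= n < hi:
                errors[(lo, hi)].append(abs(t - p))
                break  # molecules outside all bins are ignored
    return {b: sum(e) / len(e) for b, e in errors.items()}


if __name__ == "__main__":
    # Made-up HOMO energies (eV) for four molecules of different size.
    sizes = [18, 25, 110, 140]
    reference = [-6.1, -5.8, -5.3, -5.1]
    predicted = [-6.0, -5.9, -5.5, -5.0]
    print(mae_by_size(sizes, reference, predicted, bin_edges=[0, 50, 100, 200]))
```

A single overall MAE can hide a poor fit for the few large molecules in a dataset; reporting the error per size bin makes such a degradation visible.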

Further progress beyond the state of the art has been achieved with the "Multitask Uni-Mol+" model, an estimator for an extended set of molecular properties that does not require input coordinates, but only the molecular graph as one would draw it with a pencil on paper. In summary, the contributions of this project advance the state of the art towards a data-driven materials-search scheme for organic photovoltaics. The methods, however, are not bound to this particular application and will hopefully be helpful for other materials-search problems.

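Multitask learning lets one model predict several properties at once. A practical detail is that datasets rarely contain every property for every molecule. The sketch below shows one common way to handle this, a masked loss that skips missing labels (marked as NaN); it is an assumption for illustration, not a description of how Multitask Uni-Mol+ is trained, and the function name, the NaN convention and the data are our own.

```python
from typing import Optional

import numpy as np


def masked_multitask_mae(
    y_true: np.ndarray,
    y_pred: np.ndarray,
    task_weights: Optional[np.ndarray] = None,
) -> float:
    """Weighted mean of per-task MAEs for arrays of shape (n_molecules, n_tasks).

    NaN entries in y_true mark missing labels; they contribute neither error
    nor count, so the model is not penalised for predictions it cannot check.
    """
    mask = ~np.isnan(y_true)
    abs_err = np.where(mask, np.abs(np.nan_to_num(y_true) - y_pred), 0.0)
    counts = mask.sum(axis=0)
    per_task = abs_err.sum(axis=0) / np.maximum(counts, 1)
    weights = np.ones(y_true.shape[1]) if task_weights is None else np.asarray(task_weights, dtype=float)
    # Tasks with no label at all in this batch are left out of the average.
    weights = np.where(counts > 0, weights, 0.0)
    return float(weights @ per_task / weights.sum())


if __name__ == "__main__":
    nan = np.nan
    # Columns: e.g. HOMO, LUMO and a third property; made-up values, NaN = not computed.
    y_true = np.array([[-5.4, -3.1, nan], [-5.9, nan, 2.7]])
    y_pred = np.array([[-5.3, -3.0, 2.0], [-6.0, -2.5, 2.9]])
    print(masked_multitask_mae(y_true, y_pred))
```

Averaging per task rather than over all labelled entries keeps a property with many labels from dominating one with few; `task_weights` can shift that balance deliberately.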
[Figure: Sketch of the materials search space for organic photovoltaics]
[Figure: "Big picture" of Machine Learning in Materials Discovery]