CORDIS - EU research results

MachinE Learning Ledger Orchestration for Drug DiscoverY

Periodic Reporting for period 2 - MELLODDY (MachinE Learning Ledger Orchestration for Drug DiscoverY)

Reporting period: 2020-12-01 to 2022-05-31

Drug discovery (DD) and development is currently a time-consuming and expensive process, with many small molecules in development failing in the later stages due to lack of efficacy or to toxicity. More complex mechanisms of action and stricter regulation have resulted in a drastic drop in pharmaceutical R&D productivity over the past 60 years. Only 11% of drug candidates that enter clinical phases make it to approval. As a result, it costs on average €1.9 billion to bring one successful drug to the market. This trend is unsustainable and requires innovative solutions.

The pharmaceutical industry is embracing the virtualization of parts of DD processes through Machine Learning (ML) as a promising approach to improve time and cost efficiency. The expected efficiency gains in DD critically depend on the predictive performance and chemical applicability domain of models that predict the biological activities of small molecules.

MELLODDY demonstrates how the pharmaceutical industry can better leverage its data assets to virtualize the DD process with ML technologies, in answer to the challenges and stricter regulatory requirements it faces. Such a demonstration was previously held back, to the detriment of patients in the EU and beyond, by the lack of a tested, secure and privacy-preserving platform for federated ML. Such a platform enables pharmaceutical partners to extract DD-relevant information from all types of data, not only their own but also each other's competitive data, without mutual disclosure of the chemistry and biology each partner has worked on.

At the end of our three-year collaboration, MELLODDY has reached its overall project objectives:
● building a flexible, scalable, and secure platform for federated learning (FL)
● performing an audit, a stress test, and an evaluation of the platform
● demonstrating sufficient privacy preservation to allow platform usage with sensitive/competitive data
● demonstrating predictive performance improvement of models trained with FL
● demonstrating chemical applicability domain expansion of models trained with FL

The first period of the project was devoted to reaching the intermediate project objectives, namely building, testing and auditing the platform. The second period focused on demonstrating superior performance. By the end of year 3, we had demonstrated the superiority of the federated models, and we further enhanced predictive performance in the final run by revising the data inclusion criteria and by optimizing the platform and hyperparameters.

The successful demonstration of the predictive benefits from unlocking the joint data volume of 10 pharmaceutical partners, while strictly preserving the privacy of all underlying data and the resulting predictive models, will shape best practices and translate into substantial efficiency gains in the DD process and, in the future, in drug development. Finally, MELLODDY will prepare and exploit a service-for-fee vehicle to ensure the MELLODDY technologies are available to the rest of the pharmaceutical sector.

As the single most important impact metric, we list the fact that all ten pharmaceutical companies contributed the better part of their accumulated discovery data warehouses to a fully operational cross-partner run. This feat required
● the formulation and implementation of robust procedures and software scripts to format the chemical and conventional bioactivity data from different sources and owners to the extent that they can be consumed seamlessly by the modelling software. The software scripts have since been open sourced as MELLODDY-TUNER with documentation and small public data files (derived from ChEMBL25) for unit testing.
● the early availability to the partners of SparseChem, the core multi-task machine learning approach (without privacy-preservation features), to enable them to explore functionality options in single-party studies. In the federated context, a SparseChem version plugs into the application layer to execute the core machine learning operations in a privacy-preserving setting. Standalone SparseChem has been open sourced.
● extensive theoretical and empirical testing of the robustness of the implemented privacy-preservation measures against various lines of attack.
● an audit by an independent party confirming that the proposed solution elements met the relevant industry standards for IT-technical and data security defined in the joint user requirements.
● a demonstration that the FL platform, run at scale, yields improvements across all pharmaceutical partners in the predictive performance of collaboratively trained models over single-partner models.
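
The idea of collaboratively training a multi-task model without exchanging data can be illustrated with a minimal sketch. It is not the MELLODDY platform or SparseChem code: the linear model, the function names (`local_step`, `federated_round`, `mean_loss`), and the plain averaging of the shared weights are all assumptions made for illustration, and the privacy-preservation measures described above are not modelled. Each partner keeps its data and its task-specific output weights local; only the updated shared weights are combined.

```python
import numpy as np

def local_step(trunk, head, X, Y, lr=0.01):
    """One gradient step of a linear multi-task model on one partner's private data.

    trunk: shared weights (n_features, n_hidden), hypothetical shared part.
    head:  partner-specific weights (n_hidden, n_tasks), never leaves the partner.
    """
    H = X @ trunk                      # shared representation of the compounds
    err = H @ head - Y                 # residuals on this partner's own tasks
    grad_head = H.T @ err / len(X)
    grad_trunk = X.T @ (err @ head.T) / len(X)
    return trunk - lr * grad_trunk, head - lr * grad_head

def federated_round(trunk, heads, datasets, lr=0.01):
    """Each partner updates locally; only the trunk updates are averaged."""
    updated_trunks = []
    for i, (X, Y) in enumerate(datasets):
        new_trunk, heads[i] = local_step(trunk, heads[i], X, Y, lr)
        updated_trunks.append(new_trunk)
    # Assumption: plain averaging stands in for a secure aggregation step.
    return np.mean(updated_trunks, axis=0), heads

def mean_loss(trunk, heads, datasets):
    """Mean squared error averaged over partners."""
    losses = [np.mean((X @ trunk @ h - Y) ** 2) for (X, Y), h in zip(datasets, heads)]
    return float(np.mean(losses))
```

In this sketch the raw features `X` and labels `Y` never leave `local_step`, which mirrors, in a very simplified way, the requirement that partners do not disclose their chemistry or biology to each other.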

Noteworthy dissemination activities include:
● Open session at May 2022 General Assembly Meeting
● End of project press release announcing final results: https://www.melloddy.eu/y3announcement
● Overarching technical paper: https://arxiv.org/abs/2210.08871
● Overarching scientific paper: https://chemrxiv.org/engage/chemrxiv/article-details/6345c0f91f323d61d7567624

Exploitation discussions are underway to continue and expand upon the successes of the MELLODDY consortium. These activities are expected to launch in 2023.

Open source code bases are available on the MELLODDY website: https://www.melloddy.eu/open-source-code-bases

MELLODDY is positioned at the interface of two main long-term European dimensions of interest.

The first is the long-term societal aim of continuously improving social wellbeing and health, particularly in the context of an increasing incidence and prevalence of chronic diseases related to unfavourable aging and to suboptimal dietary and physical activity habits. This aim aligns well with the continuous ambition of the pharmaceutical sector to bring effective and safe medicines more efficiently to patients with unmet medical needs, in an effort to ensure the sector's long-term economic health. With regard to this dimension, MELLODDY's impact is broadly measured in terms of better performance and extended applicability of models that predict a broad spectrum of properties relevant to the DD process. These measures are the most realistic short-term proxy for the extent to which DD processes can be partly virtualized and made more efficient. A confirmation of the working hypothesis that federated and privacy-preserving ML, implemented as multi-task ML across partners, can indeed boost the performance and applicability domain of ML models would no doubt inspire the use of MELLODDY and related technologies in pharmaceutical applications beyond discovery and in other sectors.

The second is the European ambition to establish itself as a worldwide leader in developing the technological and economic infrastructure, know-how, value chains and growth potential that accompany the accelerated embedding of data science and artificial/augmented intelligence in economic activities, as reflected in phenomena like the Fourth Industrial Revolution. Here, impact is measured in the mutual acceptance and use of guidelines and solution elements offered by MELLODDY partners, in the first instance within the consortium, but ultimately also beyond it (cf. sustainability track). This process helps to shape the delineation of players and elements in future economic value chains. The demonstration of trust and operational feasibility in the first federated run helps pave the way to virtualization.