MachinE Learning Ledger Orchestration for Drug DiscoverY

Periodic Reporting for period 1 - MELLODDY (MachinE Learning Ledger Orchestration for Drug DiscoverY)

Reporting period: 2019-06-01 to 2020-11-30

Drug discovery (DD) and development is currently a time-consuming and expensive process, with many small molecules in development failing in the later stages due to a lack of efficacy or to toxicity. More complex mechanisms of action and stricter regulation have resulted in a drastic drop in pharmaceutical R&D productivity over the past 60 years, during which there has been roughly a doubling of investment for a halving of output (approval of drugs). Only 11% of the drug candidates that enter clinical phases make it to approval. As a result, it costs on average €1.9 billion to bring one successful drug to the market. This trend is clearly unsustainable and needs to be addressed by the development of innovative solutions.

The pharmaceutical industry is embracing the virtualization of parts of DD processes through Machine Learning (ML) as a promising approach to improve time and cost efficiency. The expected ML-driven efficiency gains in DD critically depend on the predictive performance and chemical applicability domain of models that predict the biological activities of small molecules. These model characteristics are in turn determined by the available volume of direct and indirect training data and the ability of advanced ML approaches to extract DD-relevant information. Consequently, the industry is taking the first steps towards ML approaches that leverage more data than that generated within a given portfolio project’s scope. However, due to concerns about privacy and IP, the scope of these efforts has been strictly confined within the boundaries of individual companies; it also often leaves unconventional but information-rich data untapped.

MELLODDY is on its way to demonstrating how the pharmaceutical industry can better leverage its data assets to virtualize the DD process with world-leading ML technologies, in answer to the ever-increasing challenges and stricter regulatory requirements it is facing. Such a demonstration has previously been held back by the lack of a tested, secure and privacy-preserving platform for federated ML, one that enables pharmaceutical partners to extract DD-relevant information from all types of data, not only their own but also each other’s competitive data, without mutual disclosure of the chemistry and biology each partner has worked on. This gap has been to the detriment of patients in the EU and beyond.

MELLODDY has now reached its intermediate project objectives:
● building a flexible, scalable, and secure platform for federated learning (FL)
● auditing, stress-testing, and evaluating the platform
● demonstrating sufficient privacy preservation to allow platform usage with sensitive/competitive data

The first period of the project has been devoted to reaching the intermediate project objectives, namely building, testing and auditing the platform. We have now successfully trained a multi-partner, multi-task model in a distributed and privacy-preserving way, on data at scale, with MELLODDY’s 10 pharmaceutical partners. This demonstration already involved an unprecedented volume of highly private and competitive DD-relevant data points. The second part of the project will primarily focus on maximizing the performance improvement, with the aim of achieving MELLODDY’s ultimate project objectives:
● demonstrate predictive performance improvement of models trained with FL
● demonstrate chemical applicability domain expansion of models trained with FL

The successful demonstration of the predictive benefits from unlocking the joint data volume of 10 pharmaceutical partners, while strictly preserving the privacy of all underlying data and the resulting predictive models, will shape best practices and translate into substantial efficiency gains in the DD process and, in the future, in drug development. Finally, MELLODDY will prepare and exploit a service-for-fee vehicle to ensure the MELLODDY technologies are available to the rest of the pharmaceutical sector.

As the single most important impact metric, we list the involvement by all ten pharmaceutical companies of the better part of their accumulated discovery data warehouse in a fully operational cross-partner run. This feat required:
● the formulation and implementation of robust procedures and software scripts to format the chemical and conventional bioactivity data from different sources and owners so that they can be consumed seamlessly by the modelling software. The software scripts have since been open-sourced as MELLODDY-TUNER, with documentation and small public data files (derived from ChEMBL25) for unit testing.
● the early availability to the partners of SparseChem, the core multi-task machine learning approach (without privacy-preservation features), to enable them to explore functionality options in single-party studies. In the federated context, a SparseChem version plugs into the application layer to execute the core machine learning operations in a privacy-preserving setting. Standalone SparseChem has been open-sourced.
● extensive theoretical and empirical testing of the robustness of the implemented privacy-preservation measures against various lines of attack.
● an audit by an independent party confirming that the proposed solution elements met the relevant industry standards for IT-technical and data security defined in joint user requirements.

MELLODDY is positioned at the interface of two main long-term European dimensions of interest.

On the one hand, there is the long-term societal aim of continuous improvement of social wellbeing and health, particularly in the context of an increasing incidence and prevalence of chronic diseases related to unfavourable aging and suboptimal dietary and physical activity tendencies. This aim aligns well with the continuous ambition of the pharmaceutical sector to bring effective and safe medicines to patients with unmet medical needs more efficiently, in an effort to ensure the sector’s long-term economic health. With regard to this dimension, MELLODDY’s impact is broadly measured in terms of better performance and extended applicability of models that predict a broad spectrum of properties of relevance to the DD process. These measures are the most realistic short-term proxy for the extent to which DD processes can be in part virtualized and made more efficient. A confirmation of the MELLODDY working hypothesis that federated and privacy-preserving ML, implemented as multi-task ML across partners, can indeed boost the performance and applicability domain of ML models would no doubt inspire the use of MELLODDY and related technologies in pharmaceutical applications beyond discovery and in other sectors.

On the other hand, there is the European ambition to establish itself as a worldwide leader in developing the technological and economic infrastructure, know-how, value chains and growth potential that accompany the accelerated embedding of data science and artificial/augmented intelligence in economic activities, as reflected in phenomena like the Fourth Industrial Revolution. Here, impact is measured in the mutual acceptance and use of guidelines and solution elements offered by MELLODDY partners, in the first instance within the consortium, but ultimately also beyond it (cf. sustainability track). This process helps to shape the delineation of players and elements in future economic value chains. The demonstration of trust and operational feasibility in the first federated run helps pave the way to virtualization.