CORDIS - EU research results

Robust Explainable Controllable Standard for drug Screening

Periodic Reporting for period 1 - RECeSS (Robust Explainable Controllable Standard for drug Screening)

Reporting period: 2023-05-01 to 2025-04-30

Drug development pipelines follow a stringent, costly, and time-consuming process to ensure patient safety. In 2024, those pipelines lasted 10 to 15 years, and the cost of their different steps, from drug (re)discovery to clinical trials, amounted to 500 million to 2 billion euros. Despite the significant resources spent during development, almost 24% of late-stage drug candidates failed to obtain marketing authorization from the European Medicines Agency between 2013 and 2021. Most of the reported reasons for these failures relate to the late detection of toxic side effects or to weak evidence of efficacy compared with the standard-of-care treatment. Combined with the increasing cost of pipelines, this high failure rate puts considerable pressure on patients, healthcare services, and pharmaceutical companies, especially for rare or neglected tropical diseases, for which funding or data might be insufficient. Drug repurposing, or repositioning, is a strategy that overcomes some of the difficulties of de novo pipelines by focusing drug discovery efforts on chemical compounds that are already available (for instance, therapeutic drugs or tool molecules). However, drug repurposing remains challenging because information about drug-disease indications is scarce and biological data are noisy, high-dimensional, and incomplete.

The RECeSS project is a transnational project between Germany and France that brings together machine learning and systems biology experts to apply drug repurposing to real-life resources. We aim to provide an end-to-end approach to drug repurposing, from data collection and analysis across multiple diseases and candidate treatments to the risk-aware prediction of promising drug candidates.
In WP1, we published two open-source Python packages, stanscofi and benchscofi, which make drug repurposing data sets easier to access and support the development of machine learning approaches for drug repurposing. stanscofi automates data processing, visualization, training, and validation of methods, and it standardizes the implementation of drug repurposing algorithms (see Figure 1). benchscofi implements 21 state-of-the-art drug repurposing algorithms to enable a quick and robust assessment of their performance.
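The benefit of a standardized interface can be illustrated with a minimal sketch. It does not use the stanscofi or benchscofi API: the names `RepurposingModel`, `DrugPopularityBaseline`, `roc_auc`, and `validate` are hypothetical, and the sketch assumes that annotations are stored in a drug-by-disease matrix with +1 (positive), -1 (negative), and 0 (unknown) entries. Because every algorithm exposes the same `fit`/`predict_scores` methods, a single validation routine can evaluate any of them.

```python
from abc import ABC, abstractmethod

import numpy as np

# Hypothetical convention (not the stanscofi API): ratings[i, j] annotates
# drug i for disease j with +1 (positive), -1 (negative) or 0 (unknown).


class RepurposingModel(ABC):
    """Hypothetical common interface that every algorithm implements."""

    @abstractmethod
    def fit(self, ratings: np.ndarray) -> "RepurposingModel":
        ...

    @abstractmethod
    def predict_scores(self) -> np.ndarray:
        """Return an (n_drugs, n_diseases) score matrix; higher means more promising."""


class DrugPopularityBaseline(RepurposingModel):
    """Toy baseline: scores each drug by its number of known positive indications."""

    def fit(self, ratings):
        self.n_diseases_ = ratings.shape[1]
        self.popularity_ = (ratings > 0).sum(axis=1).astype(float)
        return self

    def predict_scores(self):
        # The same drug score is repeated for every disease.
        return np.repeat(self.popularity_[:, None], self.n_diseases_, axis=1)


def roc_auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U statistic); ties count as 1/2."""
    pos, neg = scores[labels > 0], scores[labels < 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def validate(model, train, test):
    """Fit on the training annotations, then evaluate on the annotated test entries."""
    scores = model.fit(train).predict_scores()
    mask = test != 0
    return roc_auc(scores[mask], test[mask])
```

Since validation relies only on the shared interface, adding a new algorithm amounts to writing one subclass, which is the kind of uniformity a benchmark of many methods needs. The sketch assumes the training and test matrices annotate disjoint drug-disease pairs and that the test set contains at least one positive and one negative entry.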

In WP2, we designed a drug repurposing approach called JELI (Joint Embedding-classifier Learning for improved Interpretability), which retrieves connections between diseases, drugs, and relevant biological features (e.g., gene expression) in a partially completed graph (e.g., with gene-to-gene interactions) to recommend drug candidates with interpretable outputs (see Figure 2). We released the method as an open-source Python package and demonstrated its prediction performance across several validation metrics, as well as its explainability, on synthetic and drug repurposing data sets.
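A deliberately simplified sketch can convey how interpretable outputs may arise from feature-level embeddings. It is not JELI: it omits the prior-knowledge graph and JELI's actual architecture, and the class `FeatureEmbeddingScorer` and its parameters are hypothetical. Drugs and diseases are described by the same features (for instance, genes), and each feature receives a learned embedding. An entity's embedding is the feature-weighted sum of the feature embeddings, and the score of a drug-disease pair is the dot product of the two embeddings, trained with a logistic loss on annotated pairs. Because the score is linear in the drug's feature values, it splits exactly into per-feature contributions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class FeatureEmbeddingScorer:
    """Hypothetical illustration (not JELI): one shared embedding per feature."""

    def __init__(self, n_features, dim=4, lr=0.5, epochs=500, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.1 * rng.standard_normal((n_features, dim))  # feature embeddings
        self.lr, self.epochs = lr, epochs

    def score(self, x, y):
        # Drug and disease embeddings are feature-weighted sums of feature embeddings.
        return (x @ self.W) @ (y @ self.W)

    def fit(self, X, Y, pairs):
        """X: drug features, Y: disease features, pairs: (drug, disease, label in {+1, -1})."""
        for _ in range(self.epochs):
            grad = np.zeros_like(self.W)
            for i, j, t in pairs:
                x, y = X[i], Y[j]
                u, v = x @ self.W, y @ self.W
                s = u @ v
                # Derivative of the logistic loss log(1 + exp(-t * s)) with respect to s.
                g = -t * sigmoid(-t * s)
                # Gradient of s = (W^T x) . (W^T y) with respect to W.
                grad += g * (np.outer(x, v) + np.outer(y, u))
            self.W -= self.lr * grad / len(pairs)
        return self

    def contributions(self, x, y):
        """Per-feature contributions on the drug side; they sum exactly to score(x, y)."""
        return x * (self.W @ (y @ self.W))
```

For a given drug-disease pair, `contributions` shows which features push the score up or down, which is the kind of output that can be inspected when prioritizing candidates.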

In WP3, we developed an imputation method called F3I (Fast Iterative Improvement for Imputation) to handle missing data in sparse drug repurposing data sets. For a sample with missing values, the method combines some of its nearest data points through an automatically learned parameter, which is optimized to preserve the initial data distribution. This imputation method can be seamlessly chained with a drug repurposing algorithm so that both imputation and prediction are optimized. We applied F3I to synthetic data, drug repurposing data, and well-known computer vision benchmarks and showed that F3I is robust to different missing-data mechanisms and to high sparsity (see Figure 3). We also proved theoretical guarantees on the quality of imputation by F3I.
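The general idea of imputing a sample from its nearest neighbors, with a tuned parameter that controls how they are combined, can be sketched as follows. This is not F3I: the softmax weighting, the temperature `tau`, and the selection criterion (reconstruction error on artificially hidden observed entries) are assumptions swapped in for F3I's own learned parameter and distribution-preservation objective, and the function names `knn_impute` and `select_tau` are hypothetical.

```python
import numpy as np


def knn_impute(X, k=3, tau=1.0):
    """Impute NaNs in X (n_samples x n_features) from the k nearest rows.

    Distances use only co-observed features; neighbour weights are
    softmax(-distance / tau).
    """
    X = np.asarray(X, dtype=float)
    obs = ~np.isnan(X)
    filled = X.copy()
    for i in np.where(~obs.all(axis=1))[0]:
        dists = np.full(X.shape[0], np.inf)
        for j in range(X.shape[0]):
            common = obs[i] & obs[j]
            if j != i and common.any():
                dists[j] = np.sqrt(np.mean((X[i, common] - X[j, common]) ** 2))
        for f in np.where(~obs[i])[0]:
            # Candidate neighbours must be comparable to row i and observe feature f.
            cand = np.where(obs[:, f] & np.isfinite(dists))[0]
            if cand.size == 0:
                filled[i, f] = np.nanmean(X[:, f])  # fall back to the column mean
                continue
            nn = cand[np.argsort(dists[cand])[:k]]
            # Shifting by the minimum distance keeps exp() stable; softmax is unchanged.
            w = np.exp(-(dists[nn] - dists[nn].min()) / tau)
            filled[i, f] = np.sum(w * X[nn, f]) / np.sum(w)
    return filled


def select_tau(X, taus=(0.1, 0.5, 1.0, 5.0), k=3, frac=0.1, seed=0):
    """Pick tau by hiding a fraction of observed entries and measuring reconstruction error."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    rows, cols = np.where(~np.isnan(X))
    hide = rng.choice(rows.size, size=max(1, int(frac * rows.size)), replace=False)
    X_masked = X.copy()
    X_masked[rows[hide], cols[hide]] = np.nan
    errors = []
    for tau in taus:
        rec = knn_impute(X_masked, k=k, tau=tau)
        errors.append(np.mean((rec[rows[hide], cols[hide]] - X[rows[hide], cols[hide]]) ** 2))
    return taus[int(np.argmin(errors))]
```

In a pipeline, the selected temperature would be passed to `knn_impute`, and the completed matrix would then be handed to a drug repurposing algorithm, mirroring the chaining described above.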

In WP4, we applied JELI (from WP2) to predict new therapeutic candidates for melanoma and retrieved a significantly perturbed biological pathway connected to melanin biosynthesis. This result illustrates the explainability potential of JELI, which can be leveraged to prioritize predicted drug candidates and assess their relevance.

In WP5, we set out to design a method that provides theoretical guarantees on the rate of false positive candidates (candidates predicted as promising therapies by a recommender system but unsuccessful in practice). So far, we have implemented a first approach based on bootstrapped Neyman-Pearson classification, which extends the state of the art to cases with unknown and adverse drug-disease annotations. This approach returns a subset of candidates given a desired upper bound on the false positive rate.
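The core mechanism of Neyman-Pearson classification, which consists of choosing a score threshold so that the false positive rate stays below a target level α with probability at least 1 - δ, can be sketched with the standard NP umbrella algorithm. If the threshold is the k-th smallest of n held-out negative scores, the probability that the false positive rate exceeds α is at most the sum over j = k, ..., n of C(n, j)(1 - α)^j α^(n - j), and the algorithm picks the smallest k for which this bound is at most δ. This sketch assumes a clean held-out set of scores for known negative drug-disease pairs. It is not the project's bootstrapped variant, which additionally handles unknown and adverse annotations, and the function names are hypothetical.

```python
import math

import numpy as np


def violation_bound(k, n, alpha):
    """Upper bound on P(false positive rate > alpha) when the threshold is the
    k-th smallest of n held-out negative scores (NP umbrella algorithm).
    Terms are computed in log space to avoid overflow for large n."""
    log_terms = (
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        + j * math.log1p(-alpha) + (n - j) * math.log(alpha)
        for j in range(k, n + 1)
    )
    return sum(math.exp(t) for t in log_terms)


def np_threshold(neg_scores, alpha=0.1, delta=0.05):
    """Smallest order statistic of the negative scores whose violation bound is <= delta."""
    s = np.sort(np.asarray(neg_scores, dtype=float))
    n = s.size
    for k in range(1, n + 1):  # the bound decreases in k, so the first hit is minimal
        if violation_bound(k, n, alpha) <= delta:
            return s[k - 1]
    raise ValueError(f"n={n} negatives is too few for alpha={alpha}, delta={delta}")


def select_candidates(scores, threshold):
    """Indices of candidates recommended as promising (score strictly above threshold)."""
    return np.flatnonzero(np.asarray(scores) > threshold)
```

With 100 negative scores, α = 0.1, and δ = 0.05, the bound first drops below δ at k = 96, so only candidates scoring above the 96th smallest negative score are returned. With too few negatives (for example, 10 at these settings), no threshold can meet the guarantee and the sketch raises an error instead of returning an unreliable subset.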

In WP6, we considered the top 8 candidates predicted by JELI against melanoma and aimed to validate them independently by molecular docking on protein targets. So far, we have identified protein targets of interest for melanoma by studying the pharmacophores (sets of features describing the molecular interaction site) of known, successful drugs against melanoma. The next objective is to assess the affinity between those protein targets and the pharmacophores of the predicted candidates.
Results:
The RECeSS project addresses issues in drug development and drug repurposing through a thoroughly interdisciplinary lens, combining (semi-)supervised machine learning (WP2, WP3), adaptive testing (WP5), and systems biology (WP4, WP6), and it proposes standard frameworks for the development of drug repurposing approaches (WP1). In particular:
- WP1 led to the creation of novel, larger, and richer public drug repurposing data sets (PREDICT and TRANSCRIPT), easily accessible through the two Python packages stanscofi and benchscofi. For the first time, these data sets contain negative examples, broad genomic information, and larger sets of drugs and diseases. The two packages allowed us to produce the first large-scale benchmark of published semi-supervised machine learning approaches to drug repurposing. Through carefully implemented, method-agnostic functions, they also ensure that no data leakage interferes with the validation of methods.
- WP2, WP4, and WP6 provide a proof of concept for integrating our interpretable drug repurposing algorithm JELI into an accelerated drug development pipeline, from drug (re)discovery to the quality assessment of a candidate. In particular, JELI (WP2) can accommodate any source of graph-based prior knowledge and can flexibly interpret any feature information.
- WP3 and WP5 demonstrate that our machine learning methods suit biological data and the safety constraints of healthcare applications: they allow us to overcome the lack of data for some drugs and diseases and to control the error rate of recommendations. F3I (WP3) can also be easily integrated into the drug development pipeline.
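One simple way to rule out a common form of data leakage in drug repurposing benchmarks, where annotations of a test disease are seen during training, is to hold out entire diseases. The sketch below assumes a drug-by-disease matrix with +1 (positive), -1 (negative), and 0 (unknown) entries. `disease_holdout_split` is a hypothetical helper, not a function from stanscofi or benchscofi, whose splitting strategies may differ.

```python
import numpy as np


def disease_holdout_split(ratings, test_frac=0.25, seed=0):
    """Hold out whole diseases (columns): every annotation of a test disease is
    removed from the training matrix, so no test disease leaks into training."""
    rng = np.random.default_rng(seed)
    n_diseases = ratings.shape[1]
    n_test = max(1, int(round(test_frac * n_diseases)))
    test_cols = rng.choice(n_diseases, size=n_test, replace=False)
    train = ratings.copy()
    test = np.zeros_like(ratings)
    train[:, test_cols] = 0                     # hide test diseases from training
    test[:, test_cols] = ratings[:, test_cols]  # keep only their annotations for testing
    return train, test, np.sort(test_cols)
```

Holding out whole diseases evaluates recommendations for diseases without any annotation in the training matrix. A stricter variant could also keep test diseases dissimilar to training diseases, since very similar diseases can still leak information through shared features.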

Potential impacts:
Our approach to drug repurposing can considerably shorten the drug identification phase, which accounts for 26% of the $2 billion cost of a single pipeline, from 5 years to a few hours. Moreover, drug repurposing allows the preclinical phase and Phase 1 to be kept to a minimum, which might help save up to 6.78% ($300k) of the total Phase 1 cost, as estimated for central nervous system diseases. In the medium term, the outcomes of this project will therefore significantly alleviate the economic burden of drug discovery pipelines and help find treatments more sustainably, especially for rare or neglected tropical diseases. This project also aligns with recent European health policies, in which drug repurposing became a top priority in 2020.
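The two cost figures quoted above can be made explicit. The 26% share and the $2 billion total come from this report. The total Phase 1 cost is not stated; it is only back-computed here from the quoted 6.78% and $300k, so it should be read as an implied value rather than a reported one.

```latex
\begin{aligned}
0.26 \times \$2 \times 10^{9} &= \$520\ \text{million} \quad \text{(drug identification phase)},\\
\frac{\$300\,000}{0.0678} &\approx \$4.4\ \text{million} \quad \text{(implied total Phase 1 cost)}.
\end{aligned}
```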
Figure: Overview of the JELI algorithm developed in WP2.
Figure: Screenshot of the documentation page for the stanscofi and benchscofi packages.
Figure: Overview of the RECeSS project and its work packages.
Figure: Comparison of the imputation algorithm F3I to naive mean imputation.