Machine learning prediction for breast cancer therapy

Machine learning algorithms match tumours with treatments

Combining data from the European Union and United States, the PredAlgoBC project has identified new biomarkers for breast cancer tumours. These could eventually be used to identify new, personalised treatment options.

Digital Economy
Health

Cancer researchers and oncologists increasingly support replacing standard treatments with patient-specific ones that account for disease heterogeneity. In breast cancer, clinicians do not have enough information on patient-specific tumour characteristics, which sometimes leads to relapses in the form of metastatic cancer. Whilst first-line treatments for breast cancer treat around 90 % of patients successfully, the survival rate falls to 27 % for metastatic cancer.

To solve this problem, experts have been building huge databases matching specific tumour characteristics (potential biomarkers) with specific treatment responses in patients. But they’re just getting started, and only a few biomarker signatures have reached the clinic so far.

“This is what we call the ‘curse of dimensionality’,” says Agnes Basseville, researcher at the Institut de Cancérologie de l’Ouest (ICO) in France and coordinator of the PredAlgoBC (Machine learning prediction for breast cancer therapy) project. “We currently have too many measured characteristics for not enough patients, and the machine learning (ML) algorithms we use to analyse biomarker data do not perform well in such a setting.” The research was undertaken with the support of the Marie Skłodowska-Curie Actions programme.

The PredAlgoBC project aims to fill this gap by combining various mathematical approaches with thorough biological analysis. With this work, Basseville hopes to ensure that the information given by the algorithm will be usable in the clinic.

“We built the project mainly around two public databases: GEO (American) and ENA (European). We were able to collect data from over 4 000 patients with breast cancer along with related follow-up information. By combining data sets, we obtain sufficient statistical power to provide a comprehensive overview of tumour complexity, although some of the data we wanted to harvest, namely RNA-Seq, is available only upon request and after a 6-month evaluation of this request. Due to time constraints, we decided to not use it.”
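The ‘curse of dimensionality’ Basseville describes can be illustrated with a small simulation. The sketch below is not the project’s code: it uses purely synthetic noise, and the cohort size, feature count and the `best_chance_correlation` helper are all assumptions chosen for illustration. It shows that when few patients are measured on many characteristics, some characteristic will appear linked to the outcome by chance alone.

```python
import math
import random

random.seed(42)

N_PATIENTS = 20     # assumed: a small cohort
N_FEATURES = 2000   # assumed: many measured characteristics per patient


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Balanced outcomes assigned at random: no feature is truly predictive.
outcome = [0] * (N_PATIENTS // 2) + [1] * (N_PATIENTS // 2)
random.shuffle(outcome)

# Every feature is pure noise, with one value per patient.
features = [
    [random.gauss(0.0, 1.0) for _ in range(N_PATIENTS)]
    for _ in range(N_FEATURES)
]


def best_chance_correlation(n_features):
    """Strongest |correlation| with the outcome among the first n_features."""
    return max(abs(pearson(f, outcome)) for f in features[:n_features])


for k in (10, 100, 2000):
    print(f"{k:>5} noise features -> best |r| = {best_chance_correlation(k):.2f}")
```

Because each larger set of features contains the smaller one, the strongest chance correlation can only stay the same or grow as more characteristics are measured. Adding patients, for instance by pooling data sets, is one way to shrink such spurious signals.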

Hormonotherapy breakthrough

The compiled data set was split into two parts. The first was used to teach the algorithm how to better predict treatment outcome, after which the second was used to test the prediction performance of the project’s model. “In that way, we can compare model predictions with the known response and determine whether our models are good performers or not,” Basseville explains.

For each model, the variables were ranked based on their importance in the overall prediction. The best-ranked variables are the ones that can be tested as potential biomarkers. While the predictions are not yet good enough to be used in the clinic, variable ranking has allowed the team to identify neural development actors as key tumour components linked to poor responses to hormonotherapy. This is a serious breakthrough, as such a link had never been formally identified before.

Another project outcome is the implementation of a deep learning algorithm to create virtual patient cohorts. These are particularly handy, as they enable the sharing of patient-level data without disclosing any information on real-world subjects.

Finally, the project’s newly found biomarkers will soon be presented in a peer-reviewed article. Basseville and her team will follow this up with new data sets, currently being compiled, that will help them further validate these biomarkers. “The next step will consist in defining the best way to evaluate these components in the clinic, using ICO tools to operate the test routinely with assays like PCR or immunohistochemistry. Once we have chosen the best clinical assay, we’ll need to perform a retrospective analysis on patients at ICO to validate our new markers and confirm how helpful they are when deciding who should receive hormonotherapy,” Basseville notes. This process, which will be accompanied by research on how to best exploit the new biomarkers as a new target for treatment, is expected to take several years.
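The split-then-rank procedure described in this section can be sketched in a few lines. The article does not say which models or importance measures PredAlgoBC used, so everything here is an assumption made for illustration: the cohort is synthetic, the gene names are hypothetical, the classifier is a simple nearest-centroid rule, and importance is taken as the gap between class means (reasonable here only because every synthetic feature has the same spread).

```python
import random

random.seed(0)

# Hypothetical variable names; not biomarkers reported by the project.
FEATURES = ["gene_A", "gene_B", "gene_C", "gene_D", "gene_E"]


def make_patient(responder):
    """Synthetic patient: only gene_A carries signal, the rest is noise."""
    values = [random.gauss(0.0, 1.0) for _ in FEATURES]
    if responder:
        values[0] += 3.0
    return values, responder


patients = [make_patient(i % 2 == 0) for i in range(200)]
random.shuffle(patients)

# Split the data set into two parts: one to fit the model, one to test it.
train, test = patients[:140], patients[140:]


def centroid(rows):
    """Per-variable mean over a list of patient measurement vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


pos = centroid([x for x, y in train if y])      # responders
neg = centroid([x for x, y in train if not y])  # non-responders


def predict(x):
    """Nearest-centroid rule: True if x lies closer to the responder mean."""
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
    return d_pos < d_neg


# Compare predictions with the known response on the held-out part.
accuracy = sum(predict(x) == y for x, y in test) / len(test)

# Rank variables by how strongly they separate the two groups in training.
importance = {name: abs(p - n) for name, p, n in zip(FEATURES, pos, neg)}
ranking = sorted(importance, key=importance.get, reverse=True)

print(f"held-out accuracy: {accuracy:.2f}")
print("variables ranked by importance:", ranking)
```

In this toy setting the top-ranked variable is the one that was built to carry the signal. With real tumour data, a high rank only marks a candidate that still has to be validated, as the article describes.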


PredAlgoBC, breast cancer, algorithm, biomarkers, relapse, predictions, treatment
