Cancer Long Survivors Artificial Intelligence Follow Up

Periodic Reporting for period 2 - CLARIFY (Cancer Long Survivors Artificial Intelligence Follow Up)

Reporting period: 2021-07-01 to 2022-06-30

Survival rates of cancer patients were rather poor until recent decades, when diagnostic techniques improved and novel therapeutic options were developed. It is estimated that more than 50% of adult patients diagnosed with cancer in the US and Europe live at least 5 years after diagnosis. This situation creates a new challenge: improving cancer patients' post-treatment quality of life and well-being. This project aims to identify cancer survivors with three prevalent types of cancer: breast cancer, lung cancer and lymphoma. Patient data will be collected from several Spanish hospitals, and the selection will be based on the ongoing health and supportive care needs of each patient group. We will determine the personalised factors that predict poor health status after specific oncological treatments. To this end, Big Data and AI techniques will be used to integrate all available patient information with relevant publicly available biomedical databases, as well as with data from wearable devices used after treatment. To predict the patient-specific risk of developing side effects and toxicities from cancer treatments, we will build novel models based on statistical relational learning and explainable AI techniques on top of the integrated knowledge graphs. The models will use background knowledge of the associated cancer biology and will thus help clinicians make evidence-based post-treatment decisions in a way that existing approaches do not allow.
-EHR data, circadian rhythm information, bibliographic data and molecular information have been processed and analysed.
-A new dataset has been developed for training and evaluating Biomedical Relation Extraction (Bio-RE) models, and such models have been trained and selected to enrich the CLARIFY KG and support the decision-making process of the predictive models. Publicly available TCGA data for lung and breast cancer have been analysed and integrated with pathway analysis. The main results identified pathways associated with patient survival and mutations that correlated with the activity of these pathways.
-All the tools required to efficiently execute the Semantic Data Integration and Knowledge layers have been developed. The unified schema provides a common understanding of the main characteristics of the lung and breast cancer data integrated into the KG. REST API services allow the KG to be explored via the Platform.
-The results on explainable predictive models have been consolidated into one coherent framework that supports clinical decision-making on the post-treatment stratification of lung cancer patients by relapse risk. To enable personalised predictions that use patients' genomic profiles, preliminary results have been delivered on imputing genomic information (aneuploidy and pathway scores) into the patient data from related publicly available resources.
-An overall architecture of the Platform and a central repository for secure, anonymised data have been developed. A first prototype of the Platform has been implemented, and the integration of components is ongoing.
-The project has been analysed and monitored from an ethical and legal perspective, contributing to the selection and design of the system components and to awareness raising. The ethics of the AI component are aligned with the EU policy initiatives. EU and international standards on health informatics and privacy have been monitored, in addition to legislation and regulation.
-Intermediate results have been disseminated via scientific journals and conferences and communicated through the website, social media and press releases. Meetings with healthcare authorities have taken place, and collaborations with other stakeholders (patient associations and other EU-funded projects in the field of cancer and AI) are ongoing.
-Structured and unstructured information can be integrated to extract patterns from cancer data. Achievements include: a Spanish annotated cancer corpus that can be used to train NLP models to extract concepts from text; deep learning models for Named Entity Recognition and for negation and uncertainty detection of cancer concepts in Spanish; an analysis of the circadian rhythms of cancer patients compared with the healthy population; and models to understand the behaviour of the disease.
-During the development of MedDistant, the Bio-RE benchmark dataset, a significant overlap between training and test relationships, ranging from 26% to 86%, was found in existing Bio-RE benchmarks. We also noticed several inconsistencies in the data construction process of these benchmarks, and found that where there is no train-test leakage, the focus is on interactions between narrower entity types. Several existing Bio-RE datasets have been identified, together with their characteristics in terms of train-test overlap and coverage.
-Integrating genomic data from TCGA into a mechanistic model of the p53 pathway enabled us to run patient-specific simulations of the p53 DNA damage response that were associated with patient survival in lung cancer. Results indicate that the simulated responses may be used as a model-based biomarker.
-The tools to execute Semantic Data Integration and KG creation build on state-of-the-art methods published in top-ranked journals and conferences. Techniques for efficient data quality assessment and query processing have also been developed. The analytical methods developed on top of the CLARIFY KG allow the discovery of drug-drug interactions in multidrug treatments and of patterns among familial cancer antecedents.
-No similar method has yet been proposed in the literature, making these results a “first of its kind” solution. Our approach outperforms conventional methods such as TNM-based heuristics: results show accuracy in the 70% to 76% range, over 10% above the baseline. The main factor contributing to this performance boost is that our consolidated predictive model is aimed at stratifying patients by their risk of relapse. This differs from state-of-the-art (predominantly statistical) models by emphasising a personalised approach instead of drawing generic inferences from large patient samples. We also deliver a comprehensive, easily deployable suite of both traditional and cutting-edge relational/representation learning models augmented with explanation capabilities, which related approaches have not covered to date.
-The CLARIFY Platform is a benchmark in current medical practice. The Platform allows clinicians to analyze selected cohorts of patients, obtain trends, detect high-risk patients and analyze post-treatment prognosis and survival.
-The data analysis facilitated by the Platform allows clinicians, for the first time in a clinical setting, to identify cancer patients' unmet needs based on previous scientific evidence and real-world data (RWD), thereby improving their care and quality of life. We are already changing standard care and hope, by the end of the project, to be able to define new healthcare policies for cancer survivors.
-Legal research has been carried out to arrive at technical and organisational specifications for the design and implementation of patient data collection, analysis and sharing for the Platform. The refinement of legal requirements will be used in policy-making and technical standardisation.
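The negation and uncertainty detection of Spanish cancer concepts mentioned in the results can be illustrated with a much simpler technique than the deep learning models the project trained. The sketch below is a rule-based, NegEx-style baseline, swapped in for illustration only: it checks whether a trigger phrase occurs within a fixed window of tokens before the concept. The trigger lists, the window size and the name `classify_concept` are hypothetical choices.

```python
import re
from typing import List

# Hypothetical, deliberately short trigger lists (lower-case Spanish).
# Real NegEx-style lexicons are much larger and also include post-concept triggers.
NEGATION_TRIGGERS = ["no", "sin", "niega", "ausencia de", "negativo para"]
UNCERTAINTY_TRIGGERS = ["posible", "probable", "sospecha de", "compatible con"]


def tokenize(text: str) -> List[str]:
    # \w is Unicode-aware for str patterns, so accented words stay intact.
    return re.findall(r"\w+", text.lower())


def _contains(tokens: List[str], phrase: str) -> bool:
    """True if the (possibly multi-word) phrase occurs as a contiguous token run."""
    target = phrase.split()
    return any(tokens[i:i + len(target)] == target
               for i in range(len(tokens) - len(target) + 1))


def classify_concept(sentence: str, concept: str, window: int = 5) -> str:
    """Label a concept as 'negated', 'uncertain' or 'affirmed'.

    Only triggers in the `window` tokens immediately before the first
    occurrence of the concept are considered (pre-concept scope only).
    """
    tokens = tokenize(sentence)
    target = tokenize(concept)
    starts = [i for i in range(len(tokens) - len(target) + 1)
              if tokens[i:i + len(target)] == target]
    if not starts:
        raise ValueError(f"concept {concept!r} not found in sentence")
    scope = tokens[max(0, starts[0] - window):starts[0]]
    if any(_contains(scope, trig) for trig in NEGATION_TRIGGERS):
        return "negated"
    if any(_contains(scope, trig) for trig in UNCERTAINTY_TRIGGERS):
        return "uncertain"
    return "affirmed"


print(classify_concept("Paciente sin evidencia de metástasis ósea.", "metástasis ósea"))  # negated
```

Fixed trigger windows are brittle: they ignore sentence scope boundaries and triggers placed after the concept, which is one reason learned models are generally preferred for this task.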
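For the analysis of circadian rhythms from wearable data, one common way to summarise a rhythm is a single-component cosinor fit, y(t) ≈ M + A·cos(2πt/τ + φ), which yields the rhythm-adjusted mean (MESOR, M), the amplitude A and the acrophase φ. The report does not state which method CLARIFY used, so this is a sketch under that assumption; the 24-hour default period and the function names are hypothetical. The fit is linear in (M, β, γ) with β = A·cos φ and γ = −A·sin φ, so ordinary least squares suffices.

```python
import math
from typing import List, Sequence, Tuple


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system with Gaussian elimination and partial pivoting."""
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def cosinor_fit(t_hours: Sequence[float], y: Sequence[float],
                period: float = 24.0) -> Tuple[float, float, float]:
    """Least-squares fit of y(t) = M + A*cos(2*pi*t/period + phi).

    Returns (MESOR M, amplitude A, acrophase phi in radians).
    """
    rows = [(1.0, math.cos(2 * math.pi * t / period), math.sin(2 * math.pi * t / period))
            for t in t_hours]
    # Normal equations: (X^T X) coef = X^T y, with coef = (M, beta, gamma).
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    mesor, beta, gamma = _solve3(xtx, xty)
    return mesor, math.hypot(beta, gamma), math.atan2(-gamma, beta)
```

The acrophase can be converted into the clock time of the peak as (−φ·τ/2π) mod τ hours, which makes rhythm parameters easier to compare between cancer patients and a healthy reference population.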
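The train-test overlap reported for existing Bio-RE benchmarks can be measured as the fraction of test facts that already appear in the training split. A minimal sketch, assuming each split is a list of (head, relation, tail) triples; the relation names in the toy data are hypothetical, and offering both a full-triple and an entity-pair criterion is an illustrative choice, not necessarily the criterion used for MedDistant.

```python
from typing import Hashable, Iterable, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def train_test_overlap(train: Iterable[Triple], test: Iterable[Triple],
                       entity_pairs_only: bool = False) -> float:
    """Fraction of test facts that already occur in the training split.

    With entity_pairs_only=True the relation label is ignored, so a test
    fact counts as leaked if its (head, tail) pair was seen in training.
    """
    def key(fact: Triple) -> Hashable:
        head, relation, tail = fact
        return (head, tail) if entity_pairs_only else (head, relation, tail)

    seen = {key(fact) for fact in train}
    test_keys = [key(fact) for fact in test]
    if not test_keys:
        return 0.0
    return sum(k in seen for k in test_keys) / len(test_keys)


# Hypothetical toy splits (relation names are illustrative only).
train = [("cisplatin", "may_cause", "nephrotoxicity"),
         ("tamoxifen", "may_treat", "breast cancer")]
test = [("cisplatin", "may_cause", "nephrotoxicity"),
        ("tamoxifen", "may_prevent", "breast cancer"),
        ("letrozole", "may_treat", "breast cancer")]

print(train_test_overlap(train, test))                          # 1 of 3 triples seen
print(train_test_overlap(train, test, entity_pairs_only=True))  # 2 of 3 pairs seen
```

The entity-pair variant is stricter about leakage: a model can exploit a memorised entity pair even when the test relation label differs.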
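For the survival associations reported (pathway activity in TCGA data, and the simulated p53 responses in lung cancer), a standard way to test whether a binary marker separates survival curves is the two-group log-rank test. The report does not name the statistical test used, so the sketch below is an assumption; it returns the chi-square statistic with one degree of freedom and its p-value, using the identity P(X > x) = erfc(√(x/2)) for a chi-square(1) variable.

```python
import math
from typing import Sequence, Tuple


def logrank_test(times: Sequence[float], events: Sequence[int],
                 groups: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test.

    times  : follow-up time per patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    groups : 0/1 label per patient (e.g. low vs high simulated response)
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for tj in event_times:
        at_risk = [g for t, _, g in data if t >= tj]  # still under follow-up at tj
        n = len(at_risk)
        n1 = sum(1 for g in at_risk if g == 1)
        d = sum(1 for t, e, _ in data if t == tj and e)
        d1 = sum(1 for t, e, g in data if t == tj and e and g == 1)
        observed_minus_expected += d1 - d * n1 / n    # O - E for group 1
        if n > 1:                                     # hypergeometric variance term
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

In use, `groups` would hold the 0/1 split of patients by the marker (for example, below or above a threshold on the simulated response), and `events` distinguishes observed events from censored follow-up.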
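The drug-drug interaction analysis over the CLARIFY KG can be sketched, in its simplest form, as a query over triples: for each patient, list pairs of co-administered drugs that the graph states interact. This only surfaces interactions already asserted in the graph, and the predicate names `receives` and `interacts_with` are hypothetical; the project's analytical methods are not described here in enough detail to reproduce them.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def drugs_by_patient(kg: Set[Triple]) -> Dict[str, Set[str]]:
    """Collect the drugs each patient receives (hypothetical 'receives' predicate)."""
    regimen: Dict[str, Set[str]] = {}
    for subject, predicate, obj in kg:
        if predicate == "receives":
            regimen.setdefault(subject, set()).add(obj)
    return regimen


def interacting_pairs(kg: Set[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """For each patient, list co-administered drug pairs with an asserted interaction."""
    # Interactions are treated as symmetric, so store them as unordered pairs.
    known = {frozenset((s, o)) for s, p, o in kg if p == "interacts_with"}
    alerts: Dict[str, List[Tuple[str, str]]] = {}
    for patient, drugs in drugs_by_patient(kg).items():
        hits = [pair for pair in combinations(sorted(drugs), 2)
                if frozenset(pair) in known]
        if hits:
            alerts[patient] = hits
    return alerts


# Hypothetical toy graph.
kg = {
    ("patient_1", "receives", "drug_A"),
    ("patient_1", "receives", "drug_B"),
    ("patient_1", "receives", "drug_C"),
    ("patient_2", "receives", "drug_A"),
    ("patient_2", "receives", "drug_C"),
    ("drug_B", "interacts_with", "drug_A"),
}
print(interacting_pairs(kg))  # {'patient_1': [('drug_A', 'drug_B')]}
```

Storing interactions as unordered pairs means the direction in which an interaction was recorded in the graph does not matter for the lookup.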
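The imputation of genomic information (such as pathway scores) into patient data from public resources could, for instance, be approached with k-nearest-neighbour imputation: a patient without genomic data receives the mean score of the k most similar reference patients (e.g. from TCGA) in clinical feature space. The report does not specify the imputation method; `knn_impute`, the value of k and the feature encoding below are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Reference = Tuple[Sequence[float], float]  # (clinical feature vector, genomic score)


def knn_impute(patient: Sequence[float], reference: List[Reference], k: int = 3) -> float:
    """Impute a missing genomic score (e.g. a pathway score) for one patient.

    Returns the mean score of the k reference patients closest to `patient`
    in Euclidean distance. Features are assumed to be on comparable scales.
    """
    if not reference or k < 1:
        raise ValueError("need a non-empty reference cohort and k >= 1")
    nearest = sorted(reference, key=lambda item: math.dist(patient, item[0]))[:k]
    return sum(score for _, score in nearest) / len(nearest)


# Hypothetical reference cohort: features = (scaled age, tumour stage), score = pathway score.
reference = [((0.60, 1.0), 0.2), ((0.62, 1.0), 0.4), ((0.40, 3.0), 0.9), ((0.61, 1.0), 0.3)]
print(knn_impute((0.61, 1.0), reference, k=3))  # ≈ 0.3 (mean of 0.3, 0.2 and 0.4)
```

Because Euclidean distance is scale-sensitive, features such as age and stage would need to be normalised before this kind of imputation gives meaningful neighbours.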