CORDIS - EU research results

Identification of risk factors for the development of colorectal cancer and / or premalignant lesions using Big Data tools and Artificial Intelligence for early cancer detection

Periodic Reporting for period 1 - AMX DATA (Identification of risk factors for the development of colorectal cancer and / or premalignant lesions using Big Data tools and Artificial Intelligence for early cancer detection)

Reporting period: 2020-10-01 to 2021-12-31

With 1.93 million new cases diagnosed in 2020 and a 48% mortality rate (Globocan, 2020), colorectal cancer (CRC) is the second leading cause of cancer-related deaths worldwide. Most of these cancers and deaths could have been prevented by wider use of the recommended screening tests, which can detect premalignant lesions in the colon and rectum called advanced adenomas (AA). Because most AA will become cancerous, removing them in time through a routine colonoscopy can effectively prevent cancer from occurring. Regular screening also increases the likelihood that any CRC that does develop is detected at an early stage, when it is more likely to be cured, treatment is less expensive and recovery is faster. CRC has a 5-year survival rate of 92% if diagnosed in Stage I and only 11% if diagnosed in Stage IV. However, compliance is a major problem: only ~62% of the eligible population is screened in the US and ~10% in the EU. CRC screening programs are recommended for healthy, average-risk individuals from 50 to 75 or 85 years old (the age range varies with regional guidelines), but the incidence of colorectal cancer in people under 50 has recently been shown to be increasing. Healthy people in other age ranges (generally 40-49 years old) currently have no access to screening programs even though, as recent studies have observed, their risk of developing cancer is growing and is in some cases similar to that of the screened population. There is therefore a significant unmet clinical need for blood-based CRC screening that detects cancer, in its different stages, early, accurately and easily.
ColoFast, an innovative blood-based test for early CRC detection developed by AMADIX, has this potential and has shown superior accuracy in a clinical validation study on 965 patient samples from 13 different hospitals. The company is going further, working not only at the molecular level but also exploring what clinical records and patient lifestyle data can offer through the latest Artificial Intelligence (AI) and advanced data analytics tools. The aim is to identify new risk factors and to develop predictive models that help anticipate hidden underlying cancer risk factors and improve the quality of life and survival of cancer patients.

The principal objective of AMX Data is to identify how a number of indicators present in patients' medical histories relate to a high risk of developing CRC and/or premalignant lesions.
As secondary objectives, the AMX Data project will serve to:
- Establish a predictive model that improves identification of an individual's risk of CRC.
- Establish a predictive model that improves identification of an individual's risk of premalignant lesions (specifically advanced adenomas).

These models are intended to help AMADIX anticipate the disease and treat in time the group of healthy individuals who currently have no access to screening programs and have a higher potential to develop CRC, so that they do not develop the disease.
The knowledge that the AI has contributed to the company, both in the statistical analysis team and in the Big Data team, has reinforced these two areas, broadening knowledge of the analysis technologies used and speeding up the studies.

The main results achieved so far relate to:

- Data structuring and loading: The data already held by AMADIX came from different sources and documents with different structures, so structuring it required special attention and was crucial for accurate analysis. The data was cleaned and structured in several steps: data consolidation, deletion of duplicate data sets, handling of missing data, creation of a primary key for patient identification, and database design and creation. All of these steps were performed by the IA.

- New CRC risk factors were identified. The IA tested various machine learning algorithms, created new features through feature engineering, and analyzed molecular features and their combinations before testing them with machine learning algorithms. For the electronic health records (EHRs), the main tasks were creating new features, splitting the data into training and testing datasets, and testing machine learning procedures using the clinical features present in the EHRs.


- Data analysis to establish correlations between the different risk factors and the appearance of the illness, including statistical verification of the sample data (CRC, AA, control and random), correction of possible sample bias, principal component analysis (PCA) to evaluate the different variables, graph analysis, itinerary analysis, and statistical analysis of the variables. The risk factors were examined in several univariate and multivariate ways. Correlation between the features was examined, along with the variance inflation factor, to check for multicollinearity. Healthy individuals and patients affected by CRC and AA were compared across the features using hypothesis tests. Exploratory data analysis was used to look for patterns and to summarize the dataset's main characteristics beyond what modeling and hypothesis testing reveal.

- Clinical analysis of the variables. The validity of all results was checked against recent research literature on the subject.
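
The data structuring steps listed above (consolidation, duplicate removal, missing-data handling and primary key creation) can be illustrated with a minimal sketch. The field names, the `PatientRecord` type, the `(hospital, local_id)` matching rule and the sample rows are all hypothetical and invented for illustration; the report does not describe the actual schema or tooling used by AMADIX.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PatientRecord:
    patient_key: int       # surrogate primary key assigned after consolidation
    hospital: str
    local_id: str
    age: Optional[int]     # None marks a missing value
    diagnosis: str         # "unknown" marks a missing diagnosis


def normalise(raw):
    """Map one raw row onto a common (hospital, local_id, age, diagnosis) shape."""
    age = raw.get("age")
    age = None if age in ("", None) else int(age)
    diagnosis = (raw.get("diagnosis") or "").strip().lower() or "unknown"
    return raw["hospital"].strip().upper(), str(raw["id"]).strip(), age, diagnosis


def build_patient_table(*sources):
    """Consolidate several sources, merge duplicates, and assign primary keys."""
    merged = {}
    for source in sources:
        for raw in source:
            hospital, local_id, age, diagnosis = normalise(raw)
            natural_key = (hospital, local_id)  # assumed duplicate-matching rule
            if natural_key not in merged:
                merged[natural_key] = (age, diagnosis)
                continue
            # Duplicate row: keep known values, fill gaps from the new row.
            old_age, old_diagnosis = merged[natural_key]
            merged[natural_key] = (
                old_age if old_age is not None else age,
                old_diagnosis if old_diagnosis != "unknown" else diagnosis,
            )
    return [
        PatientRecord(key, hospital, local_id, age, diagnosis)
        for key, ((hospital, local_id), (age, diagnosis))
        in enumerate(sorted(merged.items(), key=lambda item: item[0]), start=1)
    ]


# Invented example rows with inconsistent formatting across two sources.
source_a = [
    {"hospital": "h1", "id": 17, "age": "63", "diagnosis": "CRC"},
    {"hospital": "h1", "id": 18, "age": "", "diagnosis": "AA"},
]
source_b = [
    {"hospital": "H1 ", "id": "17", "age": None, "diagnosis": "crc"},
    {"hospital": "h2", "id": "5", "age": 58},
]

table = build_patient_table(source_a, source_b)
for record in table:
    print(record)
```

In this sketch missing values are kept as explicit markers (`None`, `"unknown"`) rather than imputed, so later analysis steps can decide how to treat them.
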
AMX Data is highly novel because it builds on new CRC risk factors to create prediction models which will help AMADIX anticipate cancer and improve the quality of life and survival of patients. An effort has been made to identify risk factors that may be related to colorectal cancer in order to prevent the disease. Many of these risk factors may be recorded in patients' medical records. Identifying the risk factors for CRC or premalignant lesions there could make it possible to identify patients at higher risk of the disease, who are therefore candidates for an early detection (screening) test.

The identified risk factors still need to be validated in an independent cohort, as part of an algorithm.
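
One common way to check a risk algorithm on an independent cohort is to compute its sensitivity and specificity on people who were not used to build it. The sketch below assumes a hypothetical binary risk flag per person; the function name and the labels are invented for illustration and are not project results.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = disease/flagged)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for truth, flag in pairs if truth and flag)
    fn = sum(1 for truth, flag in pairs if truth and not flag)
    tn = sum(1 for truth, flag in pairs if not truth and not flag)
    fp = sum(1 for truth, flag in pairs if not truth and flag)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("cohort must contain both affected and unaffected people")
    return tp / (tp + fn), tn / (tn + fp)


# Invented independent cohort: 4 affected and 6 unaffected individuals.
cohort_truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
cohort_flags = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(cohort_truth, cohort_flags)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Reporting both figures matters for screening: sensitivity reflects how many affected people the algorithm would send for testing, while specificity reflects how many unaffected people it would spare.
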

Moreover, AMX Data is expected to help decrease CRC mortality by enabling patients to be identified and treated at earlier stages of the disease.