Integration and analysis of heterogeneous big data for precision medicine and suggested treatments for different types of patients

Periodic Reporting for period 2 - IASIS (Integration and analysis of heterogeneous big data for precision medicine and suggested treatments for different types of patients)

Reporting period: 2018-10-01 to 2020-06-30

The use of big data in healthcare is in its early days, and most of the potential for value creation remains unclaimed. The vision of iASiS is to turn the wave of big biomedical data heading our way into actionable knowledge for decision makers. iASiS develops a usable platform that collects, integrates, and analyses big data from disparate sources in order to present useful patterns and results to authorities and health professionals, allowing for personalised diagnosis and treatment and supporting the best possible policy decisions. The objectives are:
a) The design of a unified conceptual schema and a Knowledge Graph that integrates diverse sources of data.
b) The implementation of a usable graphical interface (iASiS platform) that enables users to utilise different analysis modules.
c) The provision of analysis modules and predictive models that extract actionable knowledge for each disease.
d) The validation of the iASiS platform and modules by clinicians and policy makers from various institutions.
e) The definition of data management plans, as well as privacy- and trust-aware strategies.
A. Data Acquisition and Unified Representation
The project focused on the definition and implementation of a unified schema (Figure 1), dictating the representation of various data sources in the form of the iASiS Knowledge Graph (KG), which allows dissimilar data to be integrated and analysed at a high level. The final KG version comprised 766,498,457 facts (RDF triples) and 222,946,082 nodes, including the following data for the two disease use cases (lung cancer and dementia):
-Approximately 4,000 patient Electronic Health Records (EHRs)
-More than 250,000 PubMed publications, analysed with Natural Language Processing techniques
-Data from more than 10 genomic and phenotypical datasets
-Data from online databases (DrugBank, SIDER, KEGG) and ontologies (UMLS, OBO, HBO, ICD10, HUGO)
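
As a rough illustration of what a unified triple-based representation makes possible, the sketch below maps one hypothetical EHR-like record and one hypothetical drug-database entry to (subject, predicate, object) triples and then queries across both sources. All identifiers, predicate names, and values are invented for illustration; they are assumptions and are not taken from the iASiS schema or data.

```python
# All identifiers and predicates below are hypothetical; they are not
# the iASiS schema, only an illustration of triple-based integration.

def ehr_to_triples(record):
    """Turn one flat EHR-like dict into (subject, predicate, object) triples."""
    subject = f"patient:{record['id']}"
    triples = [(subject, "hasDiagnosis", record["diagnosis"])]
    triples += [(subject, "takesDrug", f"drug:{d}") for d in record["drugs"]]
    return triples

def drug_db_to_triples(entry):
    """Turn one drug-database-like dict into triples about the drug."""
    subject = f"drug:{entry['name']}"
    return [(subject, "interactsWith", f"drug:{o}") for o in entry["interacts_with"]]

def build_index(triples):
    """Index objects by (subject, predicate) for simple lookups."""
    index = {}
    for s, p, o in triples:
        index.setdefault((s, p), set()).add(o)
    return index

def interacting_pairs(index, patient):
    """Find drug pairs taken by the patient that are recorded as interacting."""
    drugs = index.get((patient, "takesDrug"), set())
    return sorted(
        (a, b)
        for a in drugs
        for b in index.get((a, "interactsWith"), set())
        if b in drugs
    )

# Two toy sources, merged into one set of triples.
record = {"id": "p1", "diagnosis": "lung cancer", "drugs": ["A", "B", "C"]}
entry = {"name": "A", "interacts_with": ["C", "D"]}
index = build_index(ehr_to_triples(record) + drug_db_to_triples(entry))
pairs = interacting_pairs(index, "patient:p1")
print(pairs)
```

Because both sources share the same `drug:` identifiers once converted, a single lookup can join clinical and pharmacological facts, which is the kind of cross-source question a knowledge graph is meant to support.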

B. Development of the iASiS Platform Prototype
The final iASiS platform prototype provides a user-friendly environment that manages aggregated data and presents analysis results and statistics. It allows the user (clinician or health policy maker) to focus quickly on the individual case or a subset of patients that seems of particular interest, in order to support personalised medical decisions.

C. Analysis Modules
Through the iASiS platform, end-users are provided with a set of analysis modules that can be used for decision making in personalised medicine. These modules provide descriptive (Figure 2) and predictive analytics (Figure 3), multi-variate analysis (Figure 4) of patients, 3D CT image investigations (Figure 5), a Question-Answering mechanism (Figure 6) allowing for free text questions to be answered from the latest literature, as well as Literature Investigation (Figure 7).

D. User Evaluation
The usability of the iASiS platform was guaranteed through the continuous involvement of expert users in all stages of the development process, providing useful feedback and suggestions. In addition, an international Advisory Board of experts was formed, providing guidance and useful feedback to the project technical team.
Two rounds of pilot evaluation were performed for both use cases in different periods of the project, involving 96 clinicians and policy makers from various centres (overall experience ratings are shown in Figures 8 and 9).

E. Data Privacy
To ensure the project follows the highest ethical standards and is in line with all EU and national legislation, an Ethics Committee was established, with representatives from all key partners. The committee monitored all data-related activities throughout the project, resulting in two iterative reports. Detailed data management plans were developed for both use cases, and an internal audit of data assets and processing activities was performed by all partners to ensure compliance with the General Data Protection Regulation (GDPR).

F. Exploitation and Dissemination
An exploitation plan for all project assets has been submitted, including a competitor analysis and the partners' individual exploitation plans. Two patent applications have been submitted based on iASiS results (Drug-drug interactions and Linguistic Risk Assessment) and a spin-off company has been created (LangAware).
Considerable effort has been made to reach out to the communities of big data and personalised medicine, as well as the medical communities of the two diseases. The iASiS website has been the project’s main channel for sharing information, along with social media, eNewsletters and project brochures, distributed at several events. A professional showcase video was created and made available on the project website.
iASiS established an annual workshop (BDPM), through which synergies with all 6 same-call projects have been achieved, and has produced 58 publications in scientific conferences and journals.
A. Progress beyond the state of the art
iASiS has progressed beyond the state of the art, by providing contributions in the following areas:

1. A unified schema for describing the main attributes and relations, in order to interconnect big data from disparate sources in the iASiS Knowledge Graph through a software pipeline.
2. A novel Natural Language Processing pipeline that extracts rich knowledge encoded in free text from Electronic Health Records, in order to reconstruct the medical history of each patient (Figure 10).
3. Scalable and efficient data mining tools that discover unknown patterns and associations from the integrated data of the Knowledge Graph (Figure 11).
4. An innovative module for extracting semantic (2D and 3D) and deep features from CT and MRI images to predict disease progression.
5. A thorough analysis of liquid and solid biopsy data and open genomic datasets that identifies sets of known or unknown variants related to our population and correlates those with patient survival (Figures 12 and 13).
6. Machine learning techniques that analyse the biomedical literature and structured databases to identify new potential associations.
7. A textual data analysis module (CogAware) that provides risk assessment for Alzheimer’s disease.
8. Predictive models that employ machine learning to assess patient outcomes and disease progression.

B. Project Results and Potential Impacts
iASiS aims at a significant impact on the EU healthcare system, ICT industry, and the wider society. The iASiS platform can provide an important tool supporting patients’ treatment, providing useful knowledge to medical professionals.
In the lung cancer use case, the analysis of survival trends in a large population of patients has already produced results that were included in policy guidelines, as well as in a book of protocols, helping health authorities and clinicians to improve patient management. The work on over-treatments and the correlation of drug-drug interactions with observed toxicities and survival rates (Figure 14) provided interesting conclusions to be published soon.
In the dementia use case, we focused on machine learning-based prediction models for patient diagnosis and risk assessment, delivered in the form of simplified Decision Trees, to provide new insights into the diagnostic and prognostic value of various features (Figure 15). The analysis of these results by our experts led to a policy whitepaper to be published soon.