Integration and analysis of heterogeneous big data for precision medicine and suggested treatments for different types of patients

Periodic Reporting for period 1 - IASIS (Integration and analysis of heterogeneous big data for precision medicine and suggested treatments for different types of patients)

Reporting period: 2017-04-01 to 2018-09-30

The use of big data in healthcare is in its early days, and most of the potential for value creation remains unclaimed. To help realise this potential, iASiS aims to enable comprehensive access to data from disparate sources and to the results of their analysis, in order to produce actionable knowledge for policy-making within the domain of personalised medicine. The project is developing a system that collects, integrates, and analyses big data from disparate sources, providing useful insights and high-level analysis on an aggregated knowledge graph.
Given the above, the specific objectives of iASiS are:
a) to design a unified conceptual schema to represent all the diverse sources of available data,
b) to build an adaptive system able to manage data and content collected incrementally,
c) to provide actionable knowledge about disease diagnosis, prognosis, and treatment to decision makers,
d) to promote cooperation among clinicians and policy makers, and
e) to define privacy- and trust-aware strategies for the use of the data and the discovered knowledge.
A. Data Acquisition and Unified Representation
In the first months of the project, the available data sources were selected, pilot plans were defined, and end-user requirements were collected for the two project use cases (Dementia and Lung Cancer).
The data exploited in iASiS include:
• Electronic Health Records (EHRs) and medical images from patients.
• Genomic data, acquired from various sources such as the EGA, NCBI ClinVar and COSMIC.
• Open literature data from PubMed and open structured data from various databases and ontologies.
Specialized tools were developed for harvesting, semantic indexing and performing an initial analysis of all these datasets. The focus was on establishing interoperability across the different datasets.
Critical to the achievement of interoperability was the definition and implementation of a unified schema, in the form of the iASiS Knowledge Graph (KG) (figure 1). By bringing different sources of data together, the KG lends itself to further, high-level analysis. Such analysis was performed with the aim of discovering latent causal relations between biomedical entities from different sources and of developing powerful medical inference tools.
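To make the idea of a unified representation concrete, the following is a minimal sketch, not the iASiS schema itself. It assumes that records from each source can be reduced to (subject, predicate, object) triples tagged with their origin. All identifiers (drug:D1, gene:G1, disease:X, patient:P1), the predicate names and the KnowledgeGraph class are hypothetical placeholders. The example shows how a link that no single source contains can be found once the sources share one graph.

```python
from collections import defaultdict, deque


class KnowledgeGraph:
    """Toy directed graph of (subject, predicate, object) triples with provenance."""

    def __init__(self):
        self._out = defaultdict(set)

    def add(self, subject, predicate, obj, source):
        # Every edge remembers which data source contributed it.
        self._out[subject].add((predicate, obj, source))

    def find_path(self, start, goal, max_hops=3):
        """Breadth-first search; returns a list of edges or None if no path exists."""
        frontier = deque([(start, [])])
        seen = {start}
        while frontier:
            node, path = frontier.popleft()
            if node == goal:
                return path
            if len(path) == max_hops:
                continue
            for predicate, obj, source in sorted(self._out.get(node, ())):
                if obj not in seen:
                    seen.add(obj)
                    frontier.append((obj, path + [(node, predicate, obj, source)]))
        return None


# Hypothetical, invented triples standing in for three kinds of sources.
kg = KnowledgeGraph()
kg.add("patient:P1", "diagnosed_with", "disease:X", source="ehr")
kg.add("gene:G1", "associated_with", "disease:X", source="genomic")
kg.add("drug:D1", "targets", "gene:G1", source="literature")

# The drug-to-disease link only appears when literature and genomic data are combined.
path = kg.find_path("drug:D1", "disease:X")
for subject, predicate, obj, source in path:
    print(f"{subject} -[{predicate}]-> {obj}  (from {source})")
```

Keeping the source on every edge is one simple way to let later analysis report where each piece of evidence came from.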

B. Development of the First Prototype
The first prototype of the iASiS platform has been designed and implemented, including an adaptive Graphical User Interface that can manage aggregated data and analysis results. It thus provides user-friendly access to the collected, integrated and analysed data, together with links to outcomes and analysis results. Based on this information, users can efficiently and transparently make decisions that are tailored to individuals (figure 2).

C. Addressing Real User Needs
The usability and usefulness of the iASiS platform were ensured by the continuous involvement of users in all stages of the development process. This process started in the first months of the project, with the identification of user requirements. The derivation of the user requirements was driven by illustrative schematic scenarios developed for both pilots, highlighting desirable features and interactions.

As an example, one of the scenarios developed for the lung cancer pilot involves the identification of patterns in the data of long-surviving lung cancer patients. Similarly, an example scenario for the dementia pilot studies how symptomatic treatments within a given class of drugs relate to different patient types, based on the patients’ genetic (allelic) status.
D. Data Privacy
The data management plans for both use cases have been developed. These plans describe how data processing in iASiS will respect the policies associated with each data source, while also adhering to the EU data regulations. This process is greatly assisted by the Ethics Committee of the project, which was established in the early stages. The Committee is led by an external expert and reviews the pertinent procedures, permissions and documents, together with the privacy- and trust-aware strategies.
A. Progress beyond the state of the art
During this period, the integration of data from the various sources has been achieved. To this end, a unified schema has been defined and the KG has been implemented, which semantically connects all available knowledge. Innovative methods and technologies have been applied to different types of data.
In particular, novel NLP techniques have been applied to extract rich knowledge encoded in free text in EHRs, in order to integrate the results into the KG. Those techniques reconstruct the medical history of each patient, with the use of semantic annotators for entity recognition.
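As a rough illustration of this kind of pipeline, the sketch below uses a tiny dictionary-based annotator in place of the project's semantic annotators, which are not described here. The lexicon, the note format (an ISO date at the start of each note) and the example notes are all invented assumptions. It recognises known terms in free text and orders the resulting events into a per-patient timeline.

```python
import re
from datetime import date

# Hypothetical lexicon: surface form -> (entity type, normalised concept).
LEXICON = {
    "cough": ("symptom", "cough"),
    "ct scan": ("procedure", "ct_scan"),
    "chemotherapy": ("treatment", "chemotherapy"),
}

# Assumes each note carries a date written as YYYY-MM-DD.
DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def annotate(note):
    """Return (entity type, concept) pairs for lexicon terms, in order of appearance."""
    lowered = note.lower()
    found = []
    for surface, (entity_type, concept) in LEXICON.items():
        for match in re.finditer(r"\b" + re.escape(surface) + r"\b", lowered):
            found.append((match.start(), entity_type, concept))
    return [(entity_type, concept) for _, entity_type, concept in sorted(found)]


def build_history(notes):
    """Turn dated free-text notes into events sorted by date, then type, then concept."""
    events = []
    for note in notes:
        match = DATE_PATTERN.search(note)
        if match is None:
            continue  # undated notes are skipped in this sketch
        when = date(*map(int, match.groups()))
        for entity_type, concept in annotate(note):
            events.append((when, entity_type, concept))
    return sorted(events)


# Invented example notes, deliberately out of chronological order.
notes = [
    "2018-03-02: CT scan performed; chemotherapy started.",
    "2018-01-15: Patient reports persistent cough.",
]
history = build_history(notes)
for when, entity_type, concept in history:
    print(when.isoformat(), entity_type, concept)
```

A real annotator would also have to handle negation, abbreviations and relative dates, none of which this sketch attempts.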
Moreover, an innovative module for extracting semantic (2D and 3D) and agnostic features (deep features) from CT images has been implemented and applied to an open access image database. The extracted features are used in a predictive modelling process, using Convolutional Neural Network models to search for patterns that support the discrimination of malignant and non-malignant nodules.
Concerning genomics, large-scale data on genetic variants that affect the expression of distant genes (“trans-eQTLs”) have been combined with information on protein-RNA interactions and clinically relevant genomic variation. In this way, several candidate molecular interactions of interest have been identified that may have an impact on the diseases studied in the project.
Lastly, concerning open datasets, text mining and machine learning techniques have been adopted and extended, in order to analyse biomedical literature and combine it with knowledge from structured databases. A textual data analysis module has been developed to provide a risk assessment for Alzheimer’s disease for patients who have participated in a cognitive awareness task.
On top of the iASiS Knowledge Graph (KG), mining tools have been developed that extract knowledge and uncover unknown patterns from the combination of the aforementioned data. These mining techniques extend existing community detection approaches to exploit the semantics encoded in the KG, while maintaining scalability and efficiency.
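The summary does not say which community detection approaches were extended. As a stand-in, the sketch below uses plain weighted label propagation, with "semantics" reduced to a weight per relation type. The predicate names, their weights and the entity identifiers are hypothetical. The point is only to show how relation types can influence which entities end up grouped together.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each relation type ties two entities together.
PREDICATE_WEIGHT = {"interacts_with": 1.0, "co_mentioned_with": 0.1}


def build_weighted_adjacency(triples):
    """Collapse typed triples into an undirected graph with summed edge weights."""
    adjacency = defaultdict(dict)
    for subject, predicate, obj in triples:
        weight = PREDICATE_WEIGHT[predicate]
        adjacency[subject][obj] = adjacency[subject].get(obj, 0.0) + weight
        adjacency[obj][subject] = adjacency[obj].get(subject, 0.0) + weight
    return adjacency


def label_propagation(adjacency, max_rounds=20):
    """Each node repeatedly adopts the label with the largest total neighbour weight."""
    labels = {node: node for node in adjacency}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adjacency):  # fixed order keeps the sketch deterministic
            scores = defaultdict(float)
            for neighbour, weight in adjacency[node].items():
                scores[labels[neighbour]] += weight
            # Highest score wins; ties go to the alphabetically smallest label.
            best = min(scores, key=lambda label: (-scores[label], label))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def communities(labels):
    """Group nodes by final label and return sorted lists of members."""
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return sorted(sorted(group) for group in groups.values())


# Invented example: two tightly linked triangles joined by one weak relation.
triples = [
    ("entity:a", "interacts_with", "entity:b"),
    ("entity:b", "interacts_with", "entity:c"),
    ("entity:a", "interacts_with", "entity:c"),
    ("entity:d", "interacts_with", "entity:e"),
    ("entity:e", "interacts_with", "entity:f"),
    ("entity:d", "interacts_with", "entity:f"),
    ("entity:c", "co_mentioned_with", "entity:d"),
]
result = communities(label_propagation(build_weighted_adjacency(triples)))
print(result)
```

Because the weak co-mention edge carries little weight, the two triangles remain separate communities in this example.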
B. Expected results until the end of the Project
By the end of the project, the consortium plans to integrate knowledge from more datasets and ontologies. All individual modules will be extended. The second prototype of the platform will incorporate changes based on the user evaluation and provide more functionality, which will hopefully lead to better insights for personalised diagnosis and treatment.
C. Potential Impacts
iASiS aims at a significant impact on the EU healthcare system, the ICT industry, and society at large. The iASiS platform can serve as an important tool in support of patients’ treatment, providing useful knowledge to medical professionals. Moreover, the project results, in the form of detected patterns and trends, can support authorities in better planning public health activities and public health strategy.
Figure 1: Visualisation of the iASiS Unified Schema
Figure 2: Example of pattern extraction through Platform user interface