SemAntically integrating Genomics with Electronic health records for Cancer CARE

Periodic Reporting for period 2 - SAGE-CARE (SemAntically integrating Genomics with Electronic health records for Cancer CARE)

Reporting period: 2016-12-01 to 2018-11-30

The SAGE-CARE project brings together subject matter experts to create a holistic informatics platform for rapidly integrating genomic sequences, electronic health records (EHRs) and research repositories to enable personalised medicine strategies for malignant melanoma treatment in a clinical setting. The project addresses melanoma, a malignant tumour of melanocytes with about 160,000 new cases diagnosed annually and a high prevalence among Europeans. This is a serious health issue for EU citizens, and the project aims to improve treatment of the disease by integrating genomic markers, through secure semantic technology, with the many information sources found in electronic health records, genomic data, related initiatives and research publication repositories.
This provides a basis for personalised treatments by allowing health professionals to make reasoned queries over holistic information sources. It aligns with the core of the project, which is driven by an actual clinical need to extract as much meaning as possible from biomedical data by linking and analysing genomic, research and EHR data for cancer management.
The project has been broken down into the following core objectives:
• Elucidation of software specifications and clinical functional requirements with end users so as to maximise impact on the health care of EU citizens.
• Development of high performance computing algorithms to rapidly annotate genomic sequences in order to link gene ontologies to electronic health records.
• Development of ontologies and semantic search technology to allow clinicians to rapidly form a holistic view of clinical scenarios for patients.
• Protection of all EHRs and genomic sequences within a state of the art security framework.
• Integration of platform components.
We have developed custom software that can query large genomic and phenotypic data sets and automatically query them for associations. We also address the problem that current genotypic and phenotypic analysis does not scale well to large data sets, which creates a growing need to store, access and process this data in scalable service models such as cloud computing. We have configured a custom-built hardware platform, along with a novel software architecture that can scale up to large data centre computing systems.
We have also taken steps to develop a scalable HPC framework that allows the semantic interlinking between spatially distributed electronic patients’ health records, associated genomic markers and published research, thereby allowing clinicians to make reasoned queries over vast knowledge bases for the diagnosis, treatment and management of malignant melanoma. We also recognise that while the potential of genomic data integrated with EHRs is enormous, the privacy and ethical issues are significant and any corrupted data, either through accidental or malicious modification, could be detrimental to patient well-being. To that end we have ensured compliance with European and national legislation with regard to the publication, access to and reuse of the patients’ personal and healthcare data, taking into consideration data anonymity, privacy, ownership, security and authorized access issues. We have also developed a custom security layer that ensures patient privacy.
So far the following deliverables have been achieved:

WP1: Clinical End User Requirements: Stakeholder register, Requirements Specification, System architecture, User acceptance test and traceability matrix, Ethical approval and audit trail.
WP2: High-performance computing: HPC Pipeline, Gene expression software.
WP3: Cloud Hardware Infrastructure and Platform Implementation: Semantic data model, Interactive user interfaces design.
WP4: IT Security: Context-aware policies access model.
WP5: SAGE-CARE Platform Integration: Data access layer.
A project website is hosted, along with a social media campaign at @nsilico.
Open days and careers nights were hosted at Discover Research Dublin, along with the iWISH workshop to promote careers in science to girls.
A number of workshops and conferences, including CERC, were held, along with researcher training and transfer of knowledge events.
A number of open access peer-reviewed publications, articles in print and an eBook have been published.
Our approach marries complementary innovation activities to address the limitations in the current state of the art and thus underline the overall quality of the proposed research programme. In particular:
Clinical End User Requirements: An original aspect of this project is that the consortium is actively engaged with oncologists and cancer specialists, who are driving the project because of the lack of suitable software tools for personalised cancer care. Consortium member NSilico already provides melanoma EHR software, and its clinical partners wish to integrate genomic sequencing in a meaningful way to impact treatment. Genetic tests such as BRAF testing are important tools for melanoma diagnosis and for predicting patient outcome in response to targeted therapy, and it is planned that integrated genomic analysis will provide a basis for the next generation of cancer care. SAGE-CARE will enable clinicians to rapidly exploit current and future knowledge in the fight against cancer.
High Performance Computing: Our work goes beyond the current state of the art through the use of novel, thread-safe, extensible algorithms specialised for multi-core architectures. This approach produced technology for developing scalable software that allows the semantic integration of genomic big data with clinical data.
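To make the idea of thread-safe annotation concrete, the following is a minimal sketch, not the project's actual pipeline. It assumes variants are simple (chromosome, position) pairs and uses a hypothetical two-gene lookup table with made-up coordinates; the names GENE_REGIONS, AnnotationIndex and annotate_all are illustrative only. Worker threads annotate variants independently, and a lock protects the shared result index.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import threading

# Hypothetical lookup table: toy coordinates, for illustration only.
GENE_REGIONS = {
    "BRAF": ("chr7", 100, 200),
    "NRAS": ("chr1", 50, 120),
}


def annotate(variant):
    """Return the gene whose region contains the variant, or None."""
    chrom, pos = variant
    for gene, (g_chrom, start, end) in GENE_REGIONS.items():
        if chrom == g_chrom and start <= pos <= end:
            return gene
    return None


class AnnotationIndex:
    """Thread-safe map from gene name to the variants found in that gene."""

    def __init__(self):
        self._lock = threading.Lock()
        self._hits = defaultdict(list)

    def add(self, gene, variant):
        # The lock serialises updates so concurrent workers cannot corrupt the map.
        with self._lock:
            self._hits[gene].append(variant)

    def snapshot(self):
        # Return a plain dict with sorted variant lists for deterministic output.
        with self._lock:
            return {gene: sorted(hits) for gene, hits in self._hits.items()}


def annotate_all(variants, workers=4):
    """Annotate variants on a thread pool and collect hits per gene."""
    index = AnnotationIndex()

    def work(variant):
        gene = annotate(variant)
        if gene is not None:
            index.add(gene, variant)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so that any worker exception is raised here.
        list(pool.map(work, variants))
    return index.snapshot()
```

In standard CPython builds the global interpreter lock means this sketch shows the locking pattern rather than a real multi-core speed-up; CPU-bound annotation would typically need a process pool or a compiled language.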
An innovative approach to semantic integration was also undertaken by developing a scalable technical infrastructure and platform for the efficient, homogenized access to, and the effective utilization of, the increasing wealth of medical information contained in the Electronic Health Records (EHRs). This approach will provide an integrated knowledge framework for better managing health problems through scalable algorithms to aid integrative analysis of complex information (through semantic interlinking), with the aim of clarifying and delivering clinically actionable information and supporting computational predictions to facilitate the prevention and treatment of diseases.
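The kind of semantic interlinking described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that links between patients, variants, genes and publications are held as subject-predicate-object triples; the identifiers, predicate names and helper functions are hypothetical and do not reflect the project's actual data model.

```python
# Hypothetical triples linking an EHR entry, a genomic marker and a publication.
TRIPLES = {
    ("patient:001", "hasVariant", "variant:BRAF_V600E"),
    ("variant:BRAF_V600E", "inGene", "gene:BRAF"),
    ("publication:example-1", "mentionsGene", "gene:BRAF"),
    ("publication:example-2", "mentionsGene", "gene:NRAS"),
}


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the triple set."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(triples, predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the triple set."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def publications_for_patient(triples, patient):
    """Follow patient -> variant -> gene -> publication links."""
    pubs = set()
    for variant in objects(triples, patient, "hasVariant"):
        for gene in objects(triples, variant, "inGene"):
            pubs |= subjects(triples, "mentionsGene", gene)
    return pubs
```

Here a query starting from a patient record follows the links to the patient's variants, then to the affected genes, and finally to publications that mention those genes. A production system would more likely use an ontology and a dedicated triple store than an in-memory set.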
A characteristic of typical large-scale data storage and management systems is that users lose control of their data, resulting in serious privacy issues. Protection is required against both external and internal threats (for example, staff with unnecessary privileges that allow them to read sensitive data). To guarantee a high level of data protection, an innovative encryption scheme was adopted to directly encrypt top secret and confidential data. A data cube approach was also used to provide anonymised and de-identified data.
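As a rough illustration of how a data cube can expose only anonymised aggregates, the sketch below strips direct identifiers, generalises age into bands and counts records per cell. The record fields, the function names and the small-cell suppression threshold (min_cell_size) are assumptions made for this example, not details of the project's implementation.

```python
from collections import Counter


def age_band(age, width=10):
    """Generalise an exact age into a band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record):
    """Keep only generalised attributes; the patient identifier is dropped."""
    return {
        "age_band": age_band(record["age"]),
        "braf_mutated": record["braf_mutated"],
    }


def build_cube(records, dimensions, min_cell_size=2):
    """Count records per combination of dimension values.

    Cells with fewer than min_cell_size records are suppressed (an assumed
    safeguard) so that rare combinations cannot single out an individual.
    """
    counts = Counter(tuple(rec[d] for d in dimensions) for rec in records)
    return {cell: n for cell, n in counts.items() if n >= min_cell_size}
```

For example, with two BRAF-mutated patients in their forties and one non-mutated patient in their sixties, the cube would contain only the cell for the first group; the single-patient cell is suppressed.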