Khresmoi Multilingual Medical Text Analysis, Search and Machine Translation Connected in a Thriving Data-Value Chain

Periodic Reporting for period 2 - KConnect (Khresmoi Multilingual Medical Text Analysis, Search and Machine Translation Connected in a Thriving Data-Value Chain)

Reporting period: 2016-02-01 to 2017-07-31

The healthcare sector has many stakeholders, including the pharmaceutical and medical products industries, healthcare providers, health insurers, clinicians and patients. Each stakeholder generates pools of textual data, which have typically remained disconnected. The amount of information to analyse in the health sector is growing rapidly. Two types of textual information in the medical domain are of particular interest in KConnect: published scientific papers and Electronic Health Records (EHR). According to Medline Trend, 1,120,070 papers were published in Medline in 2013, almost double the number published in 2003 (591,637). Making sense of the knowledge contained in this volume of complex unstructured text can only be done rapidly enough through (semi-)automated text analysis techniques. A hospital with 250,000 active patients generates one terabyte of text data per year. It is essential to process this data for Comparative Effectiveness Research, to predict which treatments work best for which patients; for Predictive Modeling, to flag patients with potential negative developments (e.g. potentially suicidal psychiatric patients); and for Quality Control of the healthcare system. As increasing numbers of medical establishments realise the potential of EHR analysis, and the cost of not doing this analysis in terms of inefficiency and unnecessary loss of life, demand for such solutions will increase significantly in the coming years.

The overall objective of the KConnect project was to create a medical text Data-Value Chain with a critical mass of participating companies using cutting-edge commercial cloud-based services for multilingual Semantic Annotation, Semantic Search and Machine Translation of Electronic Health Records and medical publications.

To achieve this overall objective, the KConnect project achieved six sub-objectives:
1. Facilitate straightforward end-user adaptation of KConnect’s multilingual medicine-specific Semantic Annotation, Semantic Search and Machine Translation technologies to new languages, by making available language adaptation toolkits.
2. Productise multilingual medicine-specific Semantic Annotation, Semantic Search and Machine Translation services through a cloud-based market and as installable packages on private clouds.
3. Facilitate integration of multilingual medicine-specific Semantic Annotation, Semantic Search and Machine Translation technologies into online health portals and vertical search solutions through two routes: the cloud-based market and locally installed as part of a private cloud solution.
4. Expand the multilingual medicine-specific Semantic Annotation, Semantic Search and Machine Translation technologies to the analysis of patient records, to allow straightforward implementation of innovative solutions within hospitals.
5. Develop pricing models and business models to exploit both the cloud-based market and customised vertical search solution approaches.
6. Ensure impact and take-up through the effective dissemination and communication of project results, in particular through the creation of a KConnect Professional Services Community.

The annotation pipeline adaptation toolkit was created and evaluated, with Swedish and Hungarian components integrated. A Mimir template is provided for configuring the Semantic Search indexes of annotations created by the annotation applications. The new version of the Machine Translation adaptation toolkit, Eman Lite Web, wraps the training pipeline in an easy-to-use web service. Tools for classification and search log analysis were also developed.

The KConnect Cloud is deployed to Amazon Web Services and is publicly accessible. It includes services for semantic annotation, semantic search, the medical knowledge base, and machine translation. The main new features are support for payments and training videos that make the services more straightforward to use. The KConnect Cloud Market can also package the supported services as Docker images for use cases requiring local installation.

On the TRIP search engine production system, machine translation of queries and results is called on the KConnect Cloud, and query suggestion based on search log analysis is installed locally. A prototype for the analysis of publications describing Randomised Controlled Trials is available. On the new Health on the Net KConnect search system, the Knowledge Base on the KConnect Cloud is called for query suggestion in the semantic search interface, and the trustability and readability classification APIs are called. These services are also used in a Chrome plug-in available for download from the Chrome Web Store. Precognox (PREC) has used the Hungarian semantic annotation services in two applications: the production system of NOTA, the search engine of Akadémiai Kiadó (Akadémiai Publisher), and the Biomedical Sales Lead Generator system, used by two client companies of PREC.

Semantic annotation pipelines and semantic indices for patient records were developed and are available for both English and Swedish. In the Region Jönköping in Sweden, a connector from the KConnect Swedish medical record search to the COSMIC electronic medical record system has been written, and is implemented in the education environment for physicians. King's College London (KCL) has a suite of solutions under the name CogStack. This is an Open Source, Enterprise Grade Informatics Platform for Genomic Medical Centres to streamline Recruitment, Business Intelligence, Audit and Research. KConnect is part of this suite, supporting semantic search. KCL has installed KConnect in the following hospitals: South London and Maudsley NHS Foundation Trust (SLaM), King’s College Hospital (KCH), and University College London Hospital (UCLH).

Potential clients were consulted extensively to determine the value proposition of KConnect services, and project results were disseminated and communicated widely. Highlights included project booths at VITALIS in Gothenburg in April 2017; HIMSS 2017 in Orlando in February 2017; WoHIT 2016 in Barcelona in November 2016; and EHI Live 2016 in Birmingham in November 2016.

Progress beyond the state of the art took place in a number of areas. For semantic annotation, the main advances were new methods for treating data in medical records, in particular temporal data. The toolkits for straightforward adaptation of semantic annotation and machine translation to new languages are also a step beyond the state of the art, because they reduce the time needed to adapt the tools. The use of annotation to enhance classification and search log analysis in the medical domain also provided useful new results. Advances were also made in the automated extraction of information from papers describing the results of Randomised Controlled Trials (RCTs): tools for automatically extracting the Population, Intervention, Comparison, and Sample Size, as well as the sentiment of the conclusion, will allow rapid overviews of RCTs to be made.

Through the KConnect cloud market, we expect to be able to encourage many companies to adopt the KConnect technologies, which should lead to the planned high impact of KConnect.