Probabilistic modelling of electronic health records

Analysis of electronic health records for personalised medicine

Electronic health records (EHRs) store massive amounts of medical information with the overall aim of improving health systems. European researchers explored novel machine learning techniques to interpret EHRs and identify risk factors for disease.

Researchers are increasingly relying on machine learning methods to decipher complex patterns of diseases, study drug interactions, and form predictions. However, current methods do not support the analysis of heterogeneous data, nor the integration of massive sets of data such as EHRs.

Employing probabilistic machine learning techniques

Undertaken with the support of the Marie Skłodowska-Curie (MSC) programme, the PMOHR project addressed this challenge by developing interpretable models capable of analysing EHRs. “We employed probabilistic machine learning techniques which are increasingly used to analyse real-world data in many areas of science,” explains MSC fellow Francisco Rodríguez Ruiz. In probabilistic machine learning, assumptions about the structure of the data are encoded in a model that contains hidden patterns. An inference algorithm then learns these patterns from the data, and the fitted model is used to explore data sets and make predictions. The MSC fellow developed a new class of models, known as exponential family embeddings (EFEs), that capture co-occurrence patterns in a data set. Essentially, this means that EFEs can unveil meaningful features of diagnoses, as well as hidden elements such as medical conditions, medical terms or biological parameters that co-occur in a given data set. EFEs analyse how these features and medical diagnoses relate to one another in an unsupervised manner, that is, without requiring labelled examples.
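To make the idea of learning from co-occurrence concrete, the following is a minimal sketch of a Bernoulli-style embedding, written for illustration and not taken from the PMOHR code. It assumes that each record is a set of integer diagnosis codes, that every code has an embedding vector (rho) and a context vector (alpha), and that the probability of a code appearing in a record is a sigmoid of its embedding's inner product with the summed context vectors of the other codes in that record. The toy records, the dimensions and the plain gradient-ascent loop are all assumptions chosen to keep the example small.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def prob_present(rho, alpha, record, i, dim):
    """Probability that code i is present, given the other codes in the record."""
    ctx = [0.0] * dim
    for j in record:
        if j != i:
            for k in range(dim):
                ctx[k] += alpha[j][k]
    eta = sum(rho[i][k] * ctx[k] for k in range(dim))
    return sigmoid(eta), ctx


def log_likelihood(rho, alpha, records, vocab_size, dim):
    """Sum of Bernoulli log-probabilities over every (record, code) pair."""
    total = 0.0
    for record in records:
        for i in range(vocab_size):
            p, _ = prob_present(rho, alpha, record, i, dim)
            total += math.log(p) if i in record else math.log(1.0 - p)
    return total


def train(records, vocab_size, dim=2, steps=200, lr=0.05, seed=0):
    """Full-batch gradient ascent on the log-likelihood (illustrative only)."""
    rng = random.Random(seed)
    rho = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
    alpha = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
    for _ in range(steps):
        g_rho = [[0.0] * dim for _ in range(vocab_size)]
        g_alpha = [[0.0] * dim for _ in range(vocab_size)]
        for record in records:
            for i in range(vocab_size):
                p, ctx = prob_present(rho, alpha, record, i, dim)
                err = (1.0 if i in record else 0.0) - p  # observed minus predicted
                for k in range(dim):
                    g_rho[i][k] += err * ctx[k]
                    for j in record:
                        if j != i:
                            g_alpha[j][k] += err * rho[i][k]
        for i in range(vocab_size):
            for k in range(dim):
                rho[i][k] += lr * g_rho[i][k]
                alpha[i][k] += lr * g_alpha[i][k]
    return rho, alpha


# Made-up records: codes 0 and 1 always appear together, as do codes 2 and 3.
toy_records = [{0, 1}, {0, 1}, {0, 1}, {2, 3}, {2, 3}, {2, 3}]
rho, alpha = train(toy_records, vocab_size=4)
print("log-likelihood after training:",
      log_likelihood(rho, alpha, toy_records, 4, 2))
```

After training, codes that share contexts tend to end up with similar embedding vectors, which is the property that makes such representations useful for exploring how diagnoses relate to one another. No labels are supplied at any point; the only signal is which codes appear together.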

Putting models to the test

The PMOHR models are based on fast inference algorithms and can thus process different types of data quickly. At the same time, results are easily interpretable by experts in the field, allowing models to be refined if findings don’t make sense. The models scale to large data sets, and can thus be used for the statistical analysis of EHRs. PMOHR researchers have applied the tools to publicly available EHR data as well as data from the New York Presbyterian Hospital. EFEs applied to medical conditions and clinical text from the freely accessible MIMIC-III database identified clusters of similar diseases based solely on their co-occurrence patterns. Model clustering revealed information beyond the mere classification of the conditions, unveiling non-trivial risk factors and guiding the future analysis of hidden features.
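As a rough illustration of how conditions can be grouped from co-occurrence alone, the sketch below uses a much simpler stand-in than EFEs: it counts how often pairs of conditions appear in the same record and compares conditions by the cosine similarity of their count vectors. The records and condition labels are invented for the example and are not drawn from MIMIC-III or any real data set.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records):
    """Count how often each ordered pair of conditions shares a record."""
    counts = Counter()
    for record in records:
        for a, b in combinations(sorted(set(record)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def cosine(counts, vocab, a, b):
    """Cosine similarity between the co-occurrence vectors of a and b."""
    va = [counts[(a, c)] for c in vocab]
    vb = [counts[(b, c)] for c in vocab]
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    if norm == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(va, vb)) / norm


def most_similar(counts, vocab, target):
    """Return the condition whose co-occurrence profile best matches target."""
    others = [c for c in vocab if c != target]
    return max(others, key=lambda c: cosine(counts, vocab, target, c))


# Invented toy records; each list holds the conditions noted for one patient.
toy_records = [
    ["diabetes", "hypertension", "kidney_disease"],
    ["diabetes", "kidney_disease"],
    ["hypertension", "kidney_disease"],
    ["asthma", "eczema", "hay_fever"],
    ["asthma", "hay_fever"],
    ["eczema", "hay_fever"],
]
vocab = sorted({c for record in toy_records for c in record})
counts = cooccurrence_counts(toy_records)
print(most_similar(counts, vocab, "diabetes"))
```

Two conditions score highly here when they tend to appear alongside the same third conditions, even if they are rarely recorded together themselves. A learned embedding model pursues the same intuition with a probabilistic objective instead of raw counts.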

Project significance and future prospects

PMOHR has advanced the state of the art in probabilistic modelling by developing tools with the power to analyse complex sets of heterogeneous data. An important advantage of probabilistic modelling techniques is the ability to measure the uncertainty of predictions. “When it comes to predicting risk factors for disease, a measure of the uncertainty is critical,” emphasises Rodríguez Ruiz. The long-term goal of PMOHR is to use the probabilistic models to improve healthcare systems through the design of personalised medicine and clinical support systems. Not only would this contribute to better health, but it would also cut healthcare costs. At the same time, the approach has the power to uncover previously unknown patterns in the data and even lead to new causal theories. “Future plans involve model advancement to determine causality and the effect of medical treatments or drugs,” concludes Rodríguez Ruiz.
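A small worked example shows why a measure of uncertainty matters. The sketch below is a textbook Beta-Binomial calculation and not the project's model: it assumes a uniform Beta(1, 1) prior on the risk of a condition, updates it with a made-up number of affected patients, and estimates a 95% credible interval by sampling from the posterior. The function name and the patient counts are illustrative assumptions.

```python
import random


def posterior_risk(events, patients, draws=10000, seed=0):
    """Posterior mean and 95% credible interval for a risk, Beta(1, 1) prior."""
    a = 1 + events               # prior pseudo-count plus observed events
    b = 1 + patients - events    # prior pseudo-count plus observed non-events
    mean = a / (a + b)
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lower = samples[int(0.025 * draws)]
    upper = samples[int(0.975 * draws)]
    return mean, (lower, upper)


# The same observed rate (30%) from a small and a large hypothetical cohort.
for events, patients in [(3, 10), (300, 1000)]:
    mean, (lower, upper) = posterior_risk(events, patients)
    print(f"{events}/{patients}: mean={mean:.3f}, "
          f"95% interval=({lower:.3f}, {upper:.3f})")
```

Both cohorts show the same observed rate, yet the interval for the small cohort is far wider. A point estimate alone would hide that difference, which is why predictions of disease risk are more useful when they carry an explicit statement of uncertainty.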

Keywords

PMOHR, probabilistic machine learning, electronic health records, EHRs, exponential family embeddings, EFEs, model, inference algorithms, probabilistic modelling