
Developing Natural Language Processing and machine learning algorithms for the most comprehensive knowledge base to speed up diagnosis of rare disease patients

Periodic Reporting for period 1 - Mendel KBASE (Developing Natural Language Processing and machine learning algorithms for the most comprehensive knowledge base to speed up diagnosis of rare disease patients)

Reporting period: 2019-11-15 to 2020-11-14

- Context of rare disease diagnosis: why Mendelian is working on rare diseases, and what improvements technology can bring to existing rare disease diagnosis.
- Benefits to society in the form of improved care for patients, and savings in cost and time for healthcare providers.
- Overarching goal of the project: use ML/NLP methods to accelerate and automate the work done by the clinical team, so that the tool can scale from the few diseases covered in the Mendelian MVP to a more significant portion of known rare diseases. The main area of improvement identified by the IA is criteria encoding (from the choice of which disease to add to the actual highlighting of patients).
Onboarding and survey of Mendelian's work processes; literature survey on applications of Machine Learning to medicine in general and to Electronic Health Records in particular; design of a research plan and a training plan.

Semantic Code Finding tool – 2 results
1st – The development of the semantic code finding tool was motivated by the observation that encoding all the CTV3 concept codes needed for a clinical finding is difficult, because of the loose structure of the CTV3 ontology and the variety of concepts that can be related to a single finding.
Before the development of the semantic code finding tool, the only options for finding concept codes were exact text search in the search engine, or exploring the tree-like structure of the CTV3 ontology from codes already found (direct parent/child relationships). This placed the burden on the encoder to guess the exact wording used in the code descriptions and, for generic codes, to search very exhaustively to avoid missing isolated codes. It also made the encoder responsible for thinking of every concept that could be related to the finding (treatments, symptoms, alternative interpretations of an observation, etc.).
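To make this limitation concrete, here is a minimal Python sketch of the two pre-existing search strategies applied to a toy hierarchy. The codes, descriptions and the `CHILDREN` map are hypothetical placeholders, not real CTV3 data, and the functions are illustrative only, not Mendelian's search engine.

```python
from collections import deque

# Hypothetical fragment of a concept hierarchy: parent code -> child codes.
# Codes and descriptions are made-up placeholders, not real CTV3 entries.
CHILDREN = {
    "FND00": ["FND01", "FND02"],
    "FND01": ["FND03"],
}

DESCRIPTIONS = {
    "FND00": "Joint hypermobility",
    "FND01": "Hypermobility of knee",
    "FND02": "Hypermobility of elbow",
    "FND03": "Recurrent knee dislocation",
    "FND99": "Lax joints",  # related concept with no link to FND00 in the tree
}


def descendants(seed: str) -> set[str]:
    """Collect the seed code plus everything reachable through parent/child links."""
    found = {seed}
    queue = deque([seed])
    while queue:
        code = queue.popleft()
        for child in CHILDREN.get(code, []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def exact_text_search(term: str) -> set[str]:
    """Case-insensitive substring match on code descriptions."""
    return {code for code, text in DESCRIPTIONS.items() if term.lower() in text.lower()}
```

Here `descendants("FND00")` returns `{"FND00", "FND01", "FND02", "FND03"}` and `exact_text_search("hypermobility")` returns `{"FND00", "FND01", "FND02"}`. Neither strategy finds `FND99` ("Lax joints"), which is the kind of isolated code the encoder otherwise has to think of unaided.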
The goal of the semantic code finding tool is to make finding new codes easier by suggesting additional codes based on semantics. This lets the encoder discover concepts that would be difficult to find with the pre-existing tools, saving time and effort, and produces more thorough encodings, which in turn surface more potential patients to highlight.
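The report does not state how the semantic representation of CTV3 concepts is built or queried. The sketch below simply assumes that each concept already has a vector (for example, an embedding of its description) and ranks the codes not yet selected by their highest cosine similarity to the codes already in the encoding. The vectors and codes are made up for illustration, and `suggest` is a hypothetical helper.

```python
import math

# Toy vectors: hypothetical stand-ins for whatever semantic representation
# of the concepts is actually used. Codes are made-up placeholders.
EMBEDDINGS = {
    "FND00": [0.9, 0.1, 0.0],
    "FND01": [0.8, 0.2, 0.1],
    "FND99": [0.85, 0.15, 0.05],
    "TRT10": [0.1, 0.9, 0.2],
    "OBS20": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def suggest(selected, embeddings, k=3):
    """Rank unselected codes by their best similarity to any already-selected code."""
    if not selected:
        return []
    scores = {}
    for code, vec in embeddings.items():
        if code in selected:
            continue
        scores[code] = max(cosine(vec, embeddings[s]) for s in selected)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

With `FND00` and `FND01` already selected, `suggest({"FND00", "FND01"}, EMBEDDINGS, k=2)` returns `["FND99", "TRT10"]`: the semantically close `FND99` ranks first even though it has no hierarchy link, and the encoder still decides whether each suggestion belongs in the encoding. Taking the maximum over selected codes favours neighbours of any single selected concept; averaging instead would favour codes close to the encoding as a whole.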
2nd – With regard to the research project, the semantic code finding tool also lets us validate, from a clinical perspective, the relevance of the semantic representation of the CTV3 concepts, so that this representation can be used in future work on EHR data.
The state of the art for the identified work focuses on using EHR data to predict and monitor common diseases (TODO refs). The key differences here are, first, that Mendelian is interested in rare diseases, so from an ML perspective we are dealing with rare events (<10%), even extremely rare events (<<1%); and second, that from a clinical perspective we are not trying to predict the onset of the diseases, only to detect them.
The expected result is a semi-automated processing pipeline that identifies and encodes the symptoms and observations leading to a rare disease diagnosis. This pipeline would still require validation by a clinical expert before it can be used to send reports and highlight patients, but it would greatly accelerate the existing process.
In terms of wider project impact, current estimates suggest that this tool would shorten the typical time from first symptoms to diagnosis of a rare disease by several years. From the patient's perspective, a shorter diagnostic process would be significantly less stressful and could prevent complications of an untreated condition. From the healthcare providers' perspective, an earlier diagnosis means fewer unnecessary exams, tests and referrals, which translates into lower costs and more time available for other patients.
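A short calculation illustrates why rare events make detection hard and why clinical validation remains necessary. The sensitivity, specificity and prevalence values used here are illustrative assumptions, not project results. By Bayes' rule, the share of flagged patients who truly have the condition (the positive predictive value) collapses as prevalence falls, while a baseline that flags nobody still looks highly accurate.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of flagged patients who actually have the condition (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def all_negative_accuracy(prevalence):
    """Accuracy of a trivial baseline that never flags anyone."""
    return 1 - prevalence
```

With 99% sensitivity and 99% specificity, `positive_predictive_value(0.99, 0.99, 0.1)` is about 0.92, but at a prevalence of 0.001 it drops to about 0.09, so roughly nine out of ten flagged patients would be false positives. At that same prevalence, `all_negative_accuracy(0.001)` is 0.999, which is why accuracy alone is a poor metric in this setting.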
Screenshot of results for different disease sets