Periodic Reporting for period 1 - Mendel KBASE (Developing Natural Language Processing and machine learning algorithms for the most comprehensive knowledge base to speed up diagnosis of rare disease patients)
Reporting period: 2019-11-15 to 2020-11-14
- The project benefits society through improved care for patients and savings in cost and time for healthcare providers.
- The overarching goal of the project is to use ML/NLP methods to accelerate and automate the work done by the clinical team, so that the tool can scale from the few diseases covered in the Mendelian MVP to a more significant portion of known rare diseases. The IA identified criteria encoding (from the choice of disease to add through to the actual highlighting of patients) as the main area for improvement.
Semantic Code Finding tool – 2 results
1st - The development of the semantic code finding tool was motivated by the observation that encoding all the CTV3 concept codes needed for a clinical finding is made difficult by the loose structure of the CTV3 ontology and by the variety of concepts that can relate to a single finding.
Before the semantic code finding tool was developed, the only ways to look for concept codes were exact text search in the search engine and exploration of the tree-like structure of the CTV3 ontology around codes that had already been found (direct parent/child relationships). This placed the burden on the encoder to work out the exact wording used in the code descriptions and, for generic codes, to search exhaustively to avoid missing isolated codes. It also made the encoder responsible for thinking of every concept that could possibly be related to the finding (treatments, symptoms, alternative ways to interpret an observation, etc.).
The goal of the semantic code finding tool is to make new codes easier to find by providing additional code suggestions based on semantics. These suggestions let the encoder discover concepts that would be difficult to find with the pre-existing tools, which saves time and effort and produces more thorough encodings, and therefore more potential patients to highlight.
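As an illustration only, the idea of suggesting codes by semantic proximity rather than by exact wording can be sketched as a nearest-neighbour lookup over concept vectors. This report does not describe how the semantic representation is built or queried, so everything in the sketch is an assumption: the vector form of the representation, the use of cosine similarity, the function name `suggest_codes`, and the codes, descriptions and numbers, which are invented placeholders and not real CTV3 content.

```python
import math

# Hypothetical toy data: codes, descriptions and vectors are invented
# placeholders, not real CTV3 concepts or the project's representation.
CONCEPTS = {
    "C001": ("Muscle weakness", [0.9, 0.1, 0.0]),
    "C002": ("Difficulty climbing stairs", [0.8, 0.2, 0.1]),
    "C003": ("Physiotherapy referral", [0.6, 0.5, 0.1]),
    "C004": ("Fracture of wrist", [0.0, 0.1, 0.9]),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def suggest_codes(seed_code, concepts, top_k=2):
    """Rank the other concepts by similarity to an already-encoded seed code."""
    seed_vec = concepts[seed_code][1]
    scored = [
        (cosine(seed_vec, vec), code, desc)
        for code, (desc, vec) in concepts.items()
        if code != seed_code
    ]
    # Highest similarity first; the encoder reviews and accepts or rejects.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(code, desc) for _, code, desc in scored[:top_k]]


if __name__ == "__main__":
    for code, desc in suggest_codes("C001", CONCEPTS):
        print(code, desc)
```

In this toy setting, the seed "Muscle weakness" surfaces a related symptom and a related treatment even though their descriptions share no words with it, which is the kind of suggestion that exact text search or parent/child browsing would not produce. The suggestions remain candidates for the encoder to review, not automatic additions to an encoding.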
2nd – With regard to the research project, the semantic code finding tool also lets us validate, from a clinical perspective, the relevance of the semantic representation of the CTV3 concepts, so that we can use this representation in future work on EHR data.