
Scalable Knowledge-Aware Image Caption Generation

Periodic Reporting for period 1 - ROCAP (Scalable Knowledge-Aware Image Caption Generation)

Reporting period: 2023-01-01 to 2024-06-30

Describing graphical materials in words is sometimes crucial for understanding what we see. In the medical field, texts accompanying medical procedures often refer to images, and provide their definitive interpretation. In current practice, the analysis of medical images is often supported by AI tools that process images into “heat maps” that highlight points of interest. However, current technologies do not yet enable these automated tools to fully replace a careful medical examination of images by a human expert. Expert reports often emphasize findings that were not automatically highlighted in the heat map, and conversely, findings that appeared important to the AI system may be medically more nuanced or even irrelevant.
The aim of the ROCAP Proof-of-Concept project was to better understand how human experts can collaborate with powerful AI tools to align texts and images. Building on knowledge that had been gained in the ROCKY Advanced Grant project, the PoC ROCAP project created a tool that aligns textual reports of medical experts with the corresponding CT scans and their automatically generated heat maps. The alignment of textual and visual information shows medical experts how they can better integrate available AI tools into the diagnosis process, and forms a basis for further development in this area.
The project deliverables include three components: (i) a core system that aligns medical texts with the corresponding CT scans; (ii) a software tool that visualizes the results of this alignment to detect inconsistencies; and (iii) a testing tool that generates artificial data, which are used for monitoring system stability without using sensitive patient information.
The core system aligns the medical texts with the heat maps of the corresponding CT scans. The software tool integrates the core alignment module with an additional visualization component. This tool is designed for use by medical experts, with reliability as a critical aspect. To ensure reliability, artificial data are used to monitor the stability of the tool’s functionality, allowing continuous evaluation of the process.
The main achievement of these three components is the development of a workflow in which experts can directly refer to medically relevant findings in a complex CT scan, facilitating the detection of inconsistencies between generic medical AI tools and medical reports.
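The inconsistency check described above can be illustrated with a minimal sketch. It is not the ROCAP implementation: the keyword lexicon, the region labels, the `HeatMapRegion` type, and the 0.5 threshold are all hypothetical, chosen only to show how findings named in a report might be compared with regions highlighted in a heat map.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical lexicon mapping report phrases to heat-map region labels.
# A real system would need far richer language analysis than substring matching.
REGION_KEYWORDS = {
    "left lung": "lung_left",
    "right lung": "lung_right",
    "liver": "liver",
}


@dataclass(frozen=True)
class HeatMapRegion:
    label: str    # assumed region label produced by the imaging AI tool
    score: float  # assumed saliency score in the range 0..1


def regions_in_report(report: str) -> set[str]:
    """Return the region labels whose key phrase occurs in the report text."""
    text = report.lower()
    return {label for phrase, label in REGION_KEYWORDS.items() if phrase in text}


def compare(report: str, regions: list[HeatMapRegion], threshold: float = 0.5) -> dict[str, list[str]]:
    """Split regions into agreed, report-only, and heat-map-only groups."""
    mentioned = regions_in_report(report)
    highlighted = {r.label for r in regions if r.score >= threshold}
    return {
        "agreed": sorted(mentioned & highlighted),
        "report_only": sorted(mentioned - highlighted),
        "heatmap_only": sorted(highlighted - mentioned),
    }


report = "Nodule in the left lung. Liver unremarkable."
regions = [
    HeatMapRegion("lung_left", 0.9),
    HeatMapRegion("lung_right", 0.7),
    HeatMapRegion("liver", 0.2),
]
result = compare(report, regions)
```

In this toy case the liver is reported only in the text and the right lung is highlighted only in the heat map. The sketch ignores negation ("unremarkable"), which is one reason a keyword match alone would not be adequate for clinical text.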
To study the generalizability of the developed method to other medical domains, the core component was implemented in two different ways – one using a rule-based system, and the other using a Large Language Model (LLM). While the rule-based module showed higher quality and speed, the LLM module was developed much faster. We see two potential paths to cost-effectiveness. One is a general system that uses an LLM for various medical text-image alignment tasks, but this would require close monitoring. The other is to develop domain-specific modules, such as the one developed in this work, which aim to achieve maximal alignment accuracy. Further market studies should explore which direction is more promising and, if the second option is chosen, identify the medical areas with maximal cost-effectiveness. In both implementation approaches, including Natural Language Processing (NLP) experts in the medical software team is essential, as NLP plays a critical role in the alignment process. Since NLP expertise partly resides in the social sciences and humanities, it is necessary to involve these disciplines in the development of medical software that incorporates natural language analysis.
Figure: The ROCAP system