Perception Ultrasound by Learning Sonographic Experience

Periodic Reporting for period 2 - PULSE (Perception Ultrasound by Learning Sonographic Experience)

Reporting period: 2018-05-01 to 2019-10-31

Perception Ultrasound by Learning Sonographic Experience (PULSE) explores how the latest ideas from machine learning (deep learning) can be combined with "big data" from clinical sonography to develop new methods and understanding that will inform a next generation of ultrasound imaging capabilities, making medical ultrasound more accessible to non-expert clinical professionals. A principal novel element (to our knowledge, unique in the world) is to record expert sonographers' gaze and probe movements while they scan, so that "artificial intelligence" models are built not only from the recorded ultrasound video but also from human perceptual information (the human knowledge that goes into the model-building process). In addition, and although not described in the original proposal, the sonographer has spoken aloud while scanning for a subset of acquisitions, so we can use audio as a further cue for sonography description models. As of the mid-term report we have over 600 full-length clinical scans recorded with a purpose-built acquisition system.

The original ambition in PULSE is to develop new machine-learning approaches that richly describe ultrasound video content using knowledge of scanning protocols, visual cues determined by gaze tracking, and probe motion. To our knowledge this is the first work to attempt to bridge the gap between an ultrasound device and its user by employing a machine-learning solution that embeds clinical expert knowledge (through measuring perception and actions) to add interpretive power. In a very recent publication we have shown how our approach naturally leads to efficient ultrasound models (models with significantly fewer parameters than conventional deep learning models, at a small percentage loss in average accuracy), which is an important deployability advantage: ideally, solutions should run on low-cost standard consumer (small-memory) devices rather than high-performance hardware. The hope is that this approach may provide a major step towards making ultrasound a more accessible technology for the non-expert across the world.
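As a rough illustration of where such parameter savings can come from (this sketch uses a generic depthwise-separable convolution, a standard compression technique, and is not the project's actual published architecture), the counts for a single mid-network layer can be compared directly:

```python
def conv_params(c_in, c_out, k):
    """Parameters in a standard 2-D convolution (weights plus biases)."""
    return c_in * c_out * k * k + c_out

def separable_conv_params(c_in, c_out, k):
    """Depthwise-separable alternative: a k x k depthwise convolution
    (one filter per input channel) followed by a 1x1 pointwise convolution."""
    depthwise = c_in * k * k + c_in
    pointwise = c_in * c_out + c_out
    return depthwise + pointwise

# Hypothetical mid-network layer: 128 -> 256 channels, 3x3 kernel.
standard = conv_params(128, 256, 3)             # 295,168 parameters
separable = separable_conv_params(128, 256, 3)  # 34,304 parameters
print(f"standard: {standard:,}  separable: {separable:,} "
      f"({separable / standard:.1%} of standard)")
```

Repeated across every layer of a network, reductions of this order are what make it plausible to run ultrasound interpretation models on small-memory consumer devices.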

The innovation in PULSE is to apply the latest ideas from machine learning and computer vision to build, from real-world training video data, computational models that describe how an expert sonographer performs a diagnostic study of a subject from multiple perceptual cues. Novel machine-learning-based computational model designs are being investigated for different tasks (to date: recognising standard planes, gaze-based image and video navigation, describing sonographer actions, describing ultrasound video via text, and summarising and characterising clinical workflow) based on probe and eye-motion tracking, audio, image processing, and knowledge of how to interpret real-world clinical images and videos acquired to a standardised protocol. The underlying premise of our research is that by building models that more closely mimic how a human makes decisions from US images, considerably more efficient and powerful assistive interpretation methods can be built than have previously been possible from still US images and videos alone.

The overall objectives of the technical research are:
1. To develop a rich lexicon of sonographer words (vocabularies and languages) to describe US videos, the annotated datasets, and methods and software for accurately and reliably describing real world clinical ultrasound video content.
2. To build methods and software for describing ultrasound video content both for sonographer training and assistive technologies for clinical tasks.
3. To compare automatic description using combined US video and probe-motion information, and combined video, probe and eye-motion information, against US video alone.

Software demonstrators will be developed and evaluated on real-world obstetric US data in collaboration with clinical experts and trainees to demonstrate the new approach and its potential to move routine US scanning services from hospitals into the community, which would have clear economic, healthcare and social benefits across Europe and beyond.

Progress against these objectives is summarised in other sections.
The principal work conducted in this period is summarised as follows:

1. We have developed a custom-built dedicated ultrasound-based system for simultaneously acquiring full-scan ultrasound video, gaze-tracking data, and probe-motion data. The system is based in a hospital clinic and captures data from pregnant women attending screening scans (first, second or third trimester) and from the sonographers who perform the scans. The dataset is, to our knowledge, unique in the world and is being used both to study clinical sonography from a data-science perspective for the first time and to enable technical research on building assistive tools for clinical sonography tasks informed by sonographer perceptions and actions.

2. We have developed a framework for manual annotation of full length ultrasound video scans to generate a rich annotated database for machine learning based research. Manual video annotation is now an on-going process.

3. We have developed methods to automatically analyse full length video to look at clinical sonography questions related to sonographer bias in biometry measurement, and to quantify and describe adherence to bioeffect safety indices during clinical sonography. Our findings have been accepted for publication/oral presentation at a top clinical meeting.

4. We have been studying spatio-temporal descriptors for ultrasound video interpretation.

5. We have developed and published (either in print, or accepted for publication) automated technical solutions (machine-learning-based models) for a number of clinical tasks: standard plane detection using video and gaze information; interpretation guidance using gaze prediction on ultrasound standard planes; and memory-efficient models for standard plane detection.

6. We have characterised the language of obstetric sonography from an analysis of audio recordings of sonographers taken during scanning. We have used this to develop a method for generating a text-based description of ultrasound video from audio and video. This work will be published at a leading international conference in late 2019 and is, we believe, the first published attempt at this video-and-natural-language task.

7. We have developed a semi-supervised method for labelling anatomy in full ultrasound scans. This has been used to characterise clinical workflow patterns in full video on a large number of real-world datasets for the first time (published in April 2019). We are now looking at how to build richer descriptions of clinical tasks, workflow and sonographer skill, and how to use the derived representations to characterise differences between sonographer skills (skills assessment).

8. On-going work is looking at how we can describe sonographer scanning gestures, and use probe motion, gaze and ultrasound video to build machine-learning-based sonography navigation models.

Current state-of-the-art methods do not use eye movements to inform decision making in sonography. The underlying premise of PULSE is that by using eye-tracking and probe-motion information to inform image-recognition algorithm design, we can build more useful machine-learning solutions for automatic US video description that more closely mimic human interpretation and actions than models based on video alone.
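One simple way gaze can enter a recognition pipeline (a minimal toy sketch, assuming hypothetical fixation coordinates and feature maps, and not the project's published method) is to render fixations into an attention map and use it to weight image features before classification:

```python
import numpy as np

def gaze_heatmap(fixations, shape, sigma=5.0):
    """Render gaze fixations (x, y) into a smooth attention map,
    normalised to [0, 1], over an image of (height, width) shape."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros(shape)
    for fx, fy in fixations:
        heat += np.exp(-((xs - fx) ** 2 + (ys - fy) ** 2) / (2 * sigma ** 2))
    return heat / heat.max()

def gaze_weighted_pool(features, heatmap):
    """Pool a (channels, height, width) feature map into one descriptor,
    weighting each spatial location by where the sonographer looked."""
    weights = heatmap / heatmap.sum()
    return (features * weights).sum(axis=(1, 2))  # one value per channel

# Toy example: an 8-channel feature map, two fixations near the image centre.
feats = np.random.rand(8, 64, 64)
heat = gaze_heatmap([(30, 32), (34, 30)], (64, 64))
descriptor = gaze_weighted_pool(feats, heat)
print(descriptor.shape)  # (8,)
```

The resulting descriptor emphasises image regions the expert actually attended to, which is the intuition behind models that mimic human interpretation rather than treating every pixel equally.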

The PULSE custom-built system allows us to capture information about key perceptual cues (eye movement and probe motion) that is lost to conventional image-based interpretation algorithms, which only have the video stream of images to work with.
Using this we can study the visual search strategies employed by expert and novice sonographers. The eye-movement datasets are being added to the PULSE database for algorithm-development research.
We are also interested in questions such as whether novices and experts follow different visual search strategies, and whether there are different visual search strategies amongst experts.
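Analyses of visual search strategies typically start by segmenting raw gaze samples into fixations. A standard way to do this (a self-contained sketch of the classic dispersion-threshold I-DT algorithm, with made-up threshold values; the project's actual processing pipeline is not specified here) is:

```python
def _dispersion(window):
    """Spread of a gaze window: bounding-box width plus height."""
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def detect_fixations(gaze, max_dispersion=20.0, min_samples=5):
    """Dispersion-threshold (I-DT) fixation detection: a run of consecutive
    gaze samples whose dispersion stays small enough, and which lasts long
    enough, is reported as one fixation (centroid x, centroid y, length)."""
    fixations = []
    start = 0
    while start < len(gaze) - min_samples + 1:
        end = start + min_samples
        if _dispersion(gaze[start:end]) <= max_dispersion:
            # Grow the window while dispersion stays below the threshold.
            while end < len(gaze) and _dispersion(gaze[start:end + 1]) <= max_dispersion:
                end += 1
            window = gaze[start:end]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append((cx, cy, len(window)))
            start = end
        else:
            start += 1
    return fixations

# Toy trace: a steady fixation followed by a saccade to a new target.
trace = [(100, 100), (101, 99), (100, 101), (99, 100), (100, 100),
         (300, 250), (301, 249), (300, 251), (299, 250), (300, 250)]
print(detect_fixations(trace))  # [(100.0, 100.0, 5), (300.0, 250.0, 5)]
```

Sequences of fixations like these are the raw material for comparing scan-path patterns between novice and expert sonographers.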

Knowledge gleaned from these studies is being used to propose new visual search models (defining key structures used in visual search and visual search patterns) that will be embedded in assistive technologies to support sonography navigation and image reading/interpretation.

We are only halfway through the project, so the expected results at project end are still somewhat difficult to predict. However, we expect them to fall in the following areas:

1. Clinical sonography data science: greater understanding of clinical sonography workflow and of sonographer skills and skills assessment.

2. Assistive technologies for interpreting ultrasound images: new machine-learning-based models to assist in ultrasound standard-plane image interpretation.

3. Assistive technologies for ultrasound navigation: new machine-learning-based models to assist in ultrasound navigation for simple and complex tasks.

4. Video analysis and natural language processing: methodology to allow key information from hard-to-interpret ultrasound video to be communicated to a non-sonographer via a text-based description.