
EndoMapper: Real-time mapping from endoscopic video

Periodic Reporting for period 2 - EndoMapper (EndoMapper: Real-time mapping from endoscopic video)

Reporting period: 2020-12-01 to 2022-05-31

Endoscopes traversing endoluminal cavities, such as the colon, are routine in diagnostic and therapeutic interventions. However, they lack any autonomy. An endoscope operating autonomously in vivo would require real-time cartography of the regions where it navigates and of its location within the map. EndoMapper will develop the fundamentals of real-time localization and mapping inside the human body, using only the video stream supplied by a standard monocular endoscope. Mature methods already exist for visual mapping outside the body, known as VSLAM (Visual Simultaneous Localization And Mapping). They can deal with images from rather different domains, such as cars, drones, or wearable devices. However, they perform poorly on gastrointestinal (GI) tract imagery, where non-rigid deformation and poor visual texture are prevalent.

This would complement any automated disease-detection framework developed to support clinical decision making, accurate treatment delivery and effective screening regimes. In the short term, EndoMapper will bring live augmented reality to endoscopy, for example to show the surgeon the exact location of a tumour that was detected in diagnostic medical imaging, or to provide navigation instructions to reach the exact location where a biopsy should be performed. In the longer term, deformable intracorporeal mapping and localization will become the basis for novel medical procedures, which could include robotized autonomous interaction with live tissue in minimally invasive surgery, or automated drug delivery with millimetre accuracy. Unlike other sensing technologies, such as electromagnetic (EM) tracking, vision-based mapping recovers the entire endoluminal environment in addition to the pose of the scope.

Our objective is to research the fundamentals of non-rigid geometry and to redesign VSLAM methods to achieve, for the first time, mapping from GI endoscopies. We plan to accumulate high-definition recordings of the GI tract to learn from them. We consider different VSLAM approaches, depending on the role played by their associated learning methods. Firstly, we will build a fully handcrafted EndoMapper approach based on existing state-of-the-art VSLAM pipelines, overcoming the non-rigidity challenge with new non-rigid mathematical models. Secondly, we will explore how to improve EndoMapper using machine learning techniques.
Most of our contributions are already published, or are at some point in the publication pipeline, in top-tier conferences and journals. Additionally, we provide green open access to these publications. The main results are:

We have produced the Endomapper dataset, the first collection of complete endoscopy sequences acquired during regular medical practice, including slow and careful screening explorations, making secondary use of medical data to facilitate the development and evaluation of VSLAM methods on real endoscopy data. The first release of the dataset is composed of 59 sequences totalling more than 15 hours of video. It is also the first endoscopic dataset that includes both the computed geometric and photometric endoscope calibration and the original calibration videos. To allow the reproducibility of our research and to benefit the community, we advanced the dataset publication to May 2022.

Regarding the extension of classical VSLAM pipelines to the endoscopy environment, we have developed DefSLAM and SD-DefSLAM, the first monocular SLAM systems operating under deformation. We have also developed photometric sequential methods able to produce maps of deformable surfaces with any topology, including discontinuities. Typical colonoscopy sequences show slow and small deformations and plenty of discontinuities due to haustra. In addition to mapping, we have developed a place recognition method able to operate under the significant scene changes typical of endoscopy.

We have also produced fundamental research on reconstruction of deformable scenes, developing and experimentally validating a novel method to reconstruct a non-rigid object that weakly resembles a 3D shape template, introducing the concept of a Weak Template (WT) to assist Non-Rigid Structure-from-Motion (NRSfM). The WT can encode a cylindrical topology that weakly resembles the human colon.

Regarding data association, we have developed methods to cope with the disturbing specular reflections that plague colonoscopies, and have addressed semantic segmentation as high-level data association by guiding polyp semantic segmentation with point features. We have also developed novel active learning methods to reduce the number of training examples.
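As an illustration of the kind of low-level filtering involved, the sketch below flags bright, colourless pixels as likely specular highlights on wet mucosa, so they can be excluded from feature matching and photometric terms. It is a minimal thresholding baseline, not the project's actual method; the function name and threshold values are our own illustrative choices.

```python
import numpy as np

def specular_mask(image, brightness_thr=0.92, saturation_thr=0.15):
    """Flag pixels that are near-white and colourless, as typical of
    specular highlights on wet mucosa. `image` is an (H, W, 3) uint8 array."""
    rgb = image.astype(np.float32) / 255.0
    value = rgb.max(axis=-1)                 # HSV value channel
    chroma = value - rgb.min(axis=-1)        # zero for grey/white pixels
    saturation = np.where(value > 0, chroma / np.maximum(value, 1e-6), 0.0)
    return (value > brightness_thr) & (saturation < saturation_thr)
```

Pixels passing both tests (very bright, nearly zero saturation) are masked out; in a real pipeline the mask would typically be dilated to cover the halo around each highlight.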

We have also developed machine learning methods for depth estimation able to deal with the challenging poor texture and specularities typical of colonoscopy. They combine self-supervised multi-view photometric learning and transfer learning from synthetic data, performing significantly better than methods supervised by SfM or by classical SLAM pipelines. As a result, we are able to produce dense 3D local maps in regions with small deformations, which are common in endoscopies. Additionally, we have developed methods for semantic scene understanding, namely tool segmentation and scene classification.
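The self-supervised photometric signal mentioned above can be illustrated with a minimal sketch: warp a source frame into the reference view using the predicted depth map and the relative camera pose, then penalise intensity differences. This is a simplified nearest-neighbour version for illustration only; real pipelines use differentiable bilinear sampling, robust losses such as SSIM, and masking of invalid or specular pixels. The function name and exact formulation are our own assumptions, not the project's code.

```python
import numpy as np

def photometric_loss(img_ref, img_src, depth, K, T, eps=1e-6):
    """Warp grayscale img_src (H, W) into the reference view using the
    predicted reference-frame depth (H, W), intrinsics K (3x3) and relative
    pose T (4x4, reference -> source), then compare intensities."""
    h, w = depth.shape
    # Pixel grid of the reference image in homogeneous coordinates.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)
    # Back-project to 3D points in the reference camera frame.
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    # Transform into the source frame and project with the intrinsics.
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])
    pts_src = (T @ pts_h)[:3]
    proj = K @ pts_src
    u_s = proj[0] / (proj[2] + eps)
    v_s = proj[1] / (proj[2] + eps)
    # Nearest-neighbour sampling (bilinear in a real, differentiable pipeline).
    u_i = np.clip(np.round(u_s).astype(int), 0, w - 1)
    v_i = np.clip(np.round(v_s).astype(int), 0, h - 1)
    warped = img_src[v_i, u_i].reshape(h, w)
    return float(np.mean(np.abs(warped - img_ref)))
```

Minimising this loss over many frame pairs trains the depth network without ground-truth depth, which is exactly what makes the approach attractive in endoscopy, where metric ground truth is very hard to obtain.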

To better communicate the potential of EndoMapper, we have defined three use cases that can benefit from SLAM technology: polyp measurement, percentage of visualised mucosa, and relocalization of a previously explored region.
As a result, we expect the cross-fertilization of VSLAM methods with weakly-supervised and unsupervised machine learning to yield a new generation of VSLAM methods operating in monocular endoscopy.
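For the polyp-measurement use case, the underlying geometry is simple: given calibrated intrinsics and a depth estimate, two image points on the polyp boundary can be back-projected and their metric distance computed. The sketch below makes the simplifying assumption that both points lie at the same depth; the function name and interface are hypothetical.

```python
import numpy as np

def polyp_width_mm(p1_px, p2_px, depth_mm, K):
    """Back-project two pixel coordinates (u, v) at a common estimated depth
    and return their Euclidean distance in millimetres. K is the 3x3
    calibrated intrinsics matrix."""
    K_inv = np.linalg.inv(K)
    x1 = K_inv @ np.array([*p1_px, 1.0]) * depth_mm
    x2 = K_inv @ np.array([*p2_px, 1.0]) * depth_mm
    return float(np.linalg.norm(x1 - x2))
```

For example, with a focal length of 100 px and principal point (50, 50), two boundary points 20 px apart at 20 mm depth correspond to a polyp width of 4 mm, which shows why a reliable depth estimate is the critical ingredient.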

Scientific impact is expected because we focus on a specific case with great potential for generalization. The new methods can likely be extended to non-medical domains. Within the medical arena, we focus on the GI tract, but the results can be extended to other anatomical regions. We focus on the simplest case, a monocular camera, but the results can be generalized to multiple cameras, or to cameras combined with additional sensors such as an IMU.

The technical impact can be significant. Monocular endoscopes will be transformed into perception devices for autonomous operation, underpinning new minimally invasive surgery (MIS), augmented reality (AR) and autonomous robotic systems. Pose and body-surface deformation will be available intra-operatively in real time, making possible procedures that are currently unrealisable.

From an economic and social point of view, the results can push further European excellence in delivering quality health care at a competitive cost. ICT and robotics open an opportunity to produce medical devices that incorporate the accumulated medical know-how. This will lower the cost of universal health services and open business opportunities in medical technology. Europe is a global leader in medical instrumentation and robotics, hence EndoMapper will open opportunities for European-level research and product development. From a social point of view, it will boost personalized and robotized medicine, able to deliver better health care at a lower cost for the benefit of all citizens.