Endoscopes traversing endoluminal cavities, such as the colon, are routine in diagnostic and therapeutic interventions. However, they lack any autonomy. An endoscope operating autonomously in vivo would require a real-time map of the regions it is navigating and its own location within that map. EndoMapper will develop the fundamentals for real-time localization and mapping inside the human body, using only the video stream supplied by a standard monocular endoscope. Mature methods already exist for visual mapping outside the body (known as VSLAM, Visual Simultaneous Localization And Mapping). They can deal with images coming from rather different domains such as cars, drones, or wearable devices. However, they perform poorly on gastrointestinal (GI) tract imagery, where non-rigid deformation and poor visual texture are prevalent.
Such real-time localization and mapping would complement any automated disease detection framework developed to support clinical decision making, accurate treatment delivery and effective screening regimes. In the short term, EndoMapper will bring live augmented reality to endoscopy, for example, to show the surgeon the exact location of a tumour that was detected in diagnostic medical imaging, or to provide navigation instructions to reach the exact location where a biopsy should be performed. In the longer term, deformable intracorporeal mapping and localization will become the basis for novel medical procedures that could include robotized autonomous interaction with the live tissue in minimally invasive surgery, or automated drug delivery with millimetre accuracy. Unlike other sensing technologies, such as electromagnetic (EM) tracking, vision-based mapping will recover the entire endoluminal environment in addition to the pose of the scope.
After five years of research and experience with real colonoscopy sequences, we have developed non-rigid and quasi-rigid VSLAM fundamentals and methods to produce short-term maps, along with multi-mapping, visual localization and topological mapping approaches that enable full colon maps. Although not foreseen in the initial proposal, we have also developed methods that exploit the near-light source and the inverse-square law of illumination decay, which provide rich 3D cues: “darker means farther”, and isophotes carry information about surface normals. Regarding the techniques conceived for VSLAM, we have fulfilled the goal of developing optimization approaches for endoscopy, both for sparse geometric features and for photometric optimization. We have also fulfilled the goal of developing data-driven approaches in which automated supervision or self-supervision is mandatory. The results include dense depth estimation, discrete feature matching, segmentation and inpainting.
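The “darker means farther” cue mentioned above can be illustrated with a minimal sketch. It is not the project's method: it assumes an idealized Lambertian surface, a point light co-located with the camera, and a known albedo and source gain, and it ignores vignetting, camera response and specular highlights. The function names and parameters (`observed_intensity`, `depth_from_intensity`, `source_gain`) are hypothetical and chosen only for illustration.

```python
import math


def observed_intensity(depth, albedo=1.0, cos_incidence=1.0, source_gain=1.0):
    """Idealized near-light model: I = gain * albedo * cos(theta) / depth**2."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return source_gain * albedo * cos_incidence / depth ** 2


def depth_from_intensity(intensity, albedo=1.0, cos_incidence=1.0, source_gain=1.0):
    """Invert the model above: depth = sqrt(gain * albedo * cos(theta) / I)."""
    if intensity <= 0:
        raise ValueError("intensity must be positive")
    return math.sqrt(source_gain * albedo * cos_incidence / intensity)


if __name__ == "__main__":
    # Under this model, doubling the depth quarters the observed intensity,
    # so a darker pixel maps to a larger estimated depth.
    for d in (1.0, 2.0, 4.0):
        i = observed_intensity(d)
        print(f"depth={d:.1f} intensity={i:.4f} recovered={depth_from_intensity(i):.1f}")
```

With unknown albedo and gain, the recovered value is only a depth up to scale, which is one reason such photometric cues are typically combined with geometric constraints rather than used alone.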