EndoMapper: Real-time mapping from endoscopic video

Periodic Reporting for period 1 - EndoMapper (EndoMapper: Real-time mapping from endoscopic video)

Reporting period: 2019-12-01 to 2020-11-30

Endoscopes traversing endoluminal cavities, such as the colon, are routine in diagnostic and therapeutic interventions. However, they lack any autonomy. An endoscope operating autonomously in vivo would require real-time cartography of the regions it is navigating, together with its location within that map. EndoMapper will develop the fundamentals for real-time localization and mapping inside the human body, using only the video stream supplied by a standard monocular endoscope. There are already mature methods for visual mapping outside the body (known as VSLAM, Visual Simultaneous Localization And Mapping). They can deal with images coming from rather different domains such as cars, drones, or wearable devices. However, they perform poorly on gastrointestinal (GI) tract imagery, where non-rigid deformation and poor visual texture are prevalent.
Such mapping would complement any automated disease detection framework developed to support clinical decision making, accurate treatment delivery and effective screening regimes. In the short term, EndoMapper will bring live augmented reality to endoscopy, for example to show the surgeon the exact location of a tumour that was detected in diagnostic medical imaging, or to provide navigation instructions to reach the exact location where a biopsy is to be performed. In the longer term, deformable intracorporeal mapping and localization will become the basis for novel medical procedures that could include robotized autonomous interaction with live tissue in minimally invasive surgery, or automated drug delivery with millimetre accuracy. Unlike other sensing technologies, such as electromagnetic (EM) tracking, vision-based mapping will map the entire endoluminal environment in addition to the pose of the scope.

Our objective is to research the fundamentals of non-rigid geometry and redesign VSLAM methods to achieve, for the first time, mapping from GI endoscopies. We plan to accumulate high-definition recordings of the GI tract to learn from them. We consider different VSLAM approaches, depending on the role of their associated learning methods. Firstly, we will build a fully handcrafted EndoMapper approach based on existing state-of-the-art VSLAM pipelines, in which the non-rigidity challenge will be overcome by new non-rigid mathematical models. Secondly, we will explore how to improve EndoMapper using machine learning techniques.

* JMM Montiel, license CC BY-SA 4.0. Figure generated by merging the following figures: [1] By Cancer Research UK - Original email from CRUK, CC BY-SA 4.0; [2] By MAC 06 - Own work, CC BY 4.0; [3] By melvil - Own work, CC BY-SA 4.0; [4] By Joachim Guntau (=J.Guntau) - CC BY-SA 3.0; [5] By Joachim Guntau (=J.Guntau) - CC BY-SA 3.0
The project has been launched successfully. All the critical objectives for the first reporting period have been achieved.

The database of real gastroscopies and colonoscopies recorded in HD has been successfully started. Crucially, it includes sequences for which the geometrical calibration of the endoscope is available. The number of videos in the database increases weekly.

The goal of extending the classical state-of-the-art VSLAM pipelines has also been achieved. We have conceived and implemented DefSLAM, the first VSLAM system operating in deformable scenes; it has been published in IEEE Transactions on Robotics, a top robotics journal. We have also developed the first method to exploit the tubular topology prior in non-rigid structure-from-motion (NRSfM).

We have started the research in machine learning by exploring which avenues are the most promising. On the one hand, we have researched learning methods for matching features under the challenging illumination and texture of endoscopy. On the other hand, we have researched learning methods for estimating 3D geometry from monocular images, as a first step towards dealing with deformation.
The expected result is the cross-fertilization of VSLAM methods with weakly-supervised and unsupervised machine learning, yielding a new generation of VSLAM methods operating in monocular endoscopy.
Scientific impact is expected because we focus on a specific case with great potential for generalization. The new methods could very likely be extended to non-medical domains. Within the medical arena we focus on the GI tract, but the results can be extended to other anatomical regions. We focus on the simplest case, a monocular camera, but the results can be generalized to multiple cameras, or to cameras combined with additional sensors such as an IMU.
The technical impact can be significant. Monocular endoscopes will be transformed into perception devices for autonomous operation and will underpin new MIS AR and autonomous robotic systems. Pose and body surface deformation will be available intra-operatively in real time, making possible procedures that are currently unrealisable.
From an economic and social point of view, the results can further advance European excellence in delivering quality health care at a competitive cost. ICT and robotics open an opportunity to produce medical devices that incorporate the accumulated medical know-how. This will lower the cost of universal health services, and also open business opportunities in medical technology. Europe is a global leader in medical instrumentation and robotics, hence EndoMapper will open opportunities for European-level research and product development. From a social point of view, it will boost personalized and robotized medicine able to deliver better health care at a lower cost, for the benefit of all citizens.
Fig. 1. EndoMapper Vision*