Learning How To Robustly Estimate Camera Pose in Endoscopic Videos
- URL: http://arxiv.org/abs/2304.08023v1
- Date: Mon, 17 Apr 2023 07:05:01 GMT
- Title: Learning How To Robustly Estimate Camera Pose in Endoscopic Videos
- Authors: Michel Hayoz, Christopher Hahne, Mathias Gallardo, Daniel Candinas,
Thomas Kurmann, Maximilian Allan, Raphael Sznitman
- Abstract summary: We propose a solution for stereo endoscopes that estimates depth and optical flow to minimize two geometric losses for camera pose estimation.
Most importantly, we introduce two learned adaptive per-pixel weight mappings that balance contributions according to the input image content.
We validate our approach on the publicly available SCARED dataset and introduce a new in-vivo dataset, StereoMIS.
- Score: 5.073761189475753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Purpose: Surgical scene understanding plays a critical role in the technology
stack of tomorrow's intervention-assisting systems in endoscopic surgeries. For
this, tracking the endoscope pose is a key component, but remains challenging
due to illumination conditions, deforming tissues and the breathing motion of
organs. Method: We propose a solution for stereo endoscopes that estimates
depth and optical flow to minimize two geometric losses for camera pose
estimation. Most importantly, we introduce two learned adaptive per-pixel
weight mappings that balance contributions according to the input image
content. To do so, we train a Deep Declarative Network to take advantage of the
expressiveness of deep-learning and the robustness of a novel geometric-based
optimization approach. We validate our approach on the publicly available
SCARED dataset and introduce a new in-vivo dataset, StereoMIS, which includes a
wider spectrum of typically observed surgical settings. Results: Our method
outperforms state-of-the-art methods on average and, more importantly, in
difficult scenarios where tissue deformations and breathing motion are visible.
We observed that our proposed weight mappings attenuate the contribution of
pixels on ambiguous regions of the images, such as deforming tissues.
Conclusion: We demonstrate the effectiveness of our solution to robustly
estimate the camera pose in challenging endoscopic surgical scenes. Our
contributions can be used to improve related tasks like simultaneous
localization and mapping (SLAM) or 3D reconstruction, therefore advancing
surgical scene understanding in minimally-invasive surgery.
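The Method section describes estimating pose by minimizing geometric losses whose per-pixel contributions are balanced by learned weight maps. The minimal sketch below shows why such weighting helps. It makes several assumptions. A single closed-form 3D-3D alignment term (weighted Kabsch/Procrustes) stands in for the paper's two losses. The camera is a pinhole camera with a known intrinsic matrix `K`. Hand-set weights replace the learned ones. `backproject`, `weighted_rigid_align` and the synthetic scene are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch: one weighted 3D-3D alignment term (weighted Kabsch) stands in
# for the paper's two geometric losses; the weights are hand-set, not learned.

def backproject(depth, K):
    """Lift an (H, W) depth map to (H*W, 3) camera-frame points with pinhole intrinsics K."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T          # K^-1 [u, v, 1]^T for every pixel
    return rays * depth.reshape(-1, 1)       # scale each ray by its depth

def weighted_rigid_align(p, q, w):
    """Return R, t minimizing sum_i w_i * ||R p_i + t - q_i||^2 (weighted Kabsch)."""
    w = w / w.sum()
    p_bar = w @ p                            # weighted centroids
    q_bar = w @ q
    H = (w[:, None] * (p - p_bar)).T @ (q - q_bar)   # 3x3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_bar - R @ p_bar
    return R, t

def rotation_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Synthetic scene; every value below is an illustrative assumption.
K = np.array([[200.0, 0.0, 32.0], [0.0, 200.0, 24.0], [0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)
depth = 50.0 + 5.0 * rng.random((48, 64))    # depth map in mm
p = backproject(depth, K)                    # 3D points at frame t
R_true, t_true = rotation_z(0.05), np.array([1.0, -0.5, 2.0])
q = p @ R_true.T + t_true                    # same points after the rigid relative motion

# Simulate a deforming region (e.g. breathing): the right third moves an extra 5 mm.
deforming = np.zeros(depth.shape, dtype=bool)
deforming[:, 43:] = True
deforming = deforming.reshape(-1)
q[deforming] += np.array([0.0, 0.0, 5.0])

w_uniform = np.ones(len(p))
w_adaptive = np.where(deforming, 1e-3, 1.0)  # stand-in for a learned weight map

R_u, t_u = weighted_rigid_align(p, q, w_uniform)
R_a, t_a = weighted_rigid_align(p, q, w_adaptive)
t_err_uniform = np.linalg.norm(t_u - t_true)
t_err_adaptive = np.linalg.norm(t_a - t_true)
print(f"translation error, uniform weights:  {t_err_uniform:.3f} mm")
print(f"translation error, adaptive weights: {t_err_adaptive:.3f} mm")
```

With uniform weights, the rigid solver partly explains the non-rigid shift as camera motion. Attenuating the simulated deforming pixels removes most of that bias. This loosely mirrors the abstract's observation that the learned weights attenuate ambiguous regions such as deforming tissue.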
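The abstract states that the weight mappings are trained with a Deep Declarative Network. In such a layer, the output is defined as the solution of an optimization problem, and gradients come from implicitly differentiating that problem's optimality conditions. The minimal sketch below covers that backward pass only. It assumes a weighted linear least-squares problem as a stand-in for one linearized step of the pose optimization and does not reproduce the paper's actual losses or solver. `solve_weighted_lsq`, `implicit_jacobian` and the random data are hypothetical.

```python
import numpy as np

# Hypothetical sketch of a declarative node: a weighted least-squares problem stands in
# for one linearized step of a pose solver; names and sizes are illustrative only.

def solve_weighted_lsq(A, b, w):
    """Forward pass: y* = argmin_y 0.5 * sum_i w_i * (a_i^T y - b_i)^2."""
    AtW = A.T * w                            # A^T diag(w) via broadcasting
    return np.linalg.solve(AtW @ A, AtW @ b)

def implicit_jacobian(A, b, w, y):
    """Backward pass: dy*/dw from the optimality condition, without unrolling a solver.

    g(w, y) = A^T diag(w) (A y - b) = 0 at the optimum, so by the implicit function
    theorem dy*/dw = -H^{-1} dg/dw with H = A^T diag(w) A and dg/dw_i = a_i * r_i.
    """
    r = A @ y - b                            # residuals at the optimum
    H = (A.T * w) @ A
    dg_dw = A.T * r                          # column i is a_i * r_i
    return -np.linalg.solve(H, dg_dw)        # shape (dim_y, num_weights)

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 6))                 # e.g. residual Jacobians w.r.t. a 6-DoF pose update
b = rng.normal(size=50)
w = rng.uniform(0.1, 1.0, size=50)           # stand-in for network-predicted per-pixel weights

y = solve_weighted_lsq(A, b, w)
J = implicit_jacobian(A, b, w, y)

# Sanity check against central finite differences.
eps = 1e-6
J_fd = np.empty_like(J)
for i in range(len(w)):
    dw = np.zeros_like(w)
    dw[i] = eps
    J_fd[:, i] = (solve_weighted_lsq(A, b, w + dw) - solve_weighted_lsq(A, b, w - dw)) / (2 * eps)
print("max |J - J_fd|:", np.abs(J - J_fd).max())

# Chaining into training: for a loss L(y*), dL/dw = J^T dL/dy*.
dL_dy = y                                    # gradient of the toy loss L = 0.5 * ||y*||^2
dL_dw = J.T @ dL_dy
```

The design point is that the backward pass needs only the optimality condition at the solution. The forward solver (here a direct solve, in practice possibly an iterative one) therefore does not have to be differentiated step by step.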
Related papers
- EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera [12.152362025172915]
We propose Endoscopic Depth Any Camera (EndoDAC) to adapt foundation models to endoscopic scenes.
Specifically, we develop the Dynamic Vector-Based Low-Rank Adaptation (DV-LoRA) and employ Convolutional Neck blocks.
Our framework is capable of being trained solely on monocular surgical videos from any camera, ensuring minimal training costs.
Date: 2024-05-14T14:55:15Z
- High-fidelity Endoscopic Image Synthesis by Utilizing Depth-guided Neural Surfaces [18.948630080040576]
We introduce a novel method for colon section reconstruction by applying NeuS to endoscopic images, supplemented by a single depth-map frame.
Our approach demonstrates exceptional accuracy in completely rendering colon sections, even capturing unseen portions of the surface.
This breakthrough opens avenues for achieving stable and consistently scaled reconstructions, promising enhanced quality in cancer screening procedures and treatment interventions.
Date: 2024-04-20T18:06:26Z
- EndoGSLAM: Real-Time Dense Reconstruction and Tracking in Endoscopic Surgeries using Gaussian Splatting [53.38166294158047]
EndoGSLAM is an efficient SLAM approach for endoscopic surgeries that integrates a streamlined representation with differentiable Gaussianization.
Experiments show that EndoGSLAM achieves a better trade-off between intraoperative availability and reconstruction quality than traditional or neural SLAM approaches.
Date: 2024-03-22T11:27:43Z
- FLex: Joint Pose and Dynamic Radiance Fields Optimization for Stereo Endoscopic Videos [79.50191812646125]
Reconstruction of endoscopic scenes is an important asset for various medical applications, from post-surgery analysis to educational training.
We address the challenging setup of a moving endoscope within a highly dynamic environment of deforming tissue.
We propose an implicit scene separation into multiple overlapping 4D neural radiance fields (NeRFs) and a progressive optimization scheme jointly optimizing for reconstruction and camera poses from scratch.
This improves ease of use and allows reconstruction to scale in time to surgical videos of 5,000 frames and more, an improvement of more than ten times over the state of the art, while remaining agnostic to external tracking information.
Date: 2024-03-18T19:13:02Z
- Action Recognition in Video Recordings from Gynecologic Laparoscopy [4.002010889177872]
Action recognition is a prerequisite for many applications in laparoscopic video analysis.
In this study, we design and evaluate a CNN-RNN architecture as well as a customized training-inference framework.
Date: 2023-11-30T16:15:46Z
- Neural LerPlane Representations for Fast 4D Reconstruction of Deformable Tissues [52.886545681833596]
LerPlane is a novel method for fast and accurate reconstruction of surgical scenes under a single-viewpoint setting.
LerPlane treats surgical procedures as 4D volumes and factorizes them into explicit 2D planes of static and dynamic fields.
LerPlane shares static fields, significantly reducing the workload of dynamic tissue modeling.
Date: 2023-05-31T14:38:35Z
- Next-generation Surgical Navigation: Marker-less Multi-view 6DoF Pose Estimation of Surgical Instruments [66.74633676595889]
First, we present a multi-camera capture setup consisting of static and head-mounted cameras.
Second, we publish a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured in a surgical wet lab and a real operating theatre.
Third, we evaluate three state-of-the-art single-view and multi-view methods for the task of 6DoF pose estimation of surgical instruments.
Date: 2023-05-05T13:42:19Z
- Live image-based neurosurgical guidance and roadmap generation using unsupervised embedding [53.992124594124896]
We present a method for live image-only guidance leveraging a large data set of annotated neurosurgical videos.
A generated roadmap encodes the common anatomical paths taken in surgeries in the training set.
We trained and evaluated the proposed method with a data set of 166 transsphenoidal adenomectomy procedures.
Date: 2023-03-31T12:52:24Z
- A Temporal Learning Approach to Inpainting Endoscopic Specularities and Its effect on Image Correspondence [13.25903945009516]
We propose using a temporal generative adversarial network (GAN) to inpaint the hidden anatomy under specularities.
This is achieved using in-vivo data of gastric endoscopy (Hyper-Kvasir) in a fully unsupervised manner.
We also assess the effect of our method in computer vision tasks that underpin 3D reconstruction and camera motion estimation.
Date: 2022-03-31T13:14:00Z
- E-DSSR: Efficient Dynamic Surgical Scene Reconstruction with Transformer-based Stereoscopic Depth Perception [15.927060244702686]
We present an efficient reconstruction pipeline for highly dynamic surgical scenes that runs at 28 fps.
Specifically, we design a transformer-based stereoscopic depth perception for efficient depth estimation.
We evaluate the proposed pipeline on two datasets, the public Hamlyn Centre Endoscopic Video dataset and our in-house DaVinci robotic surgery dataset.
Date: 2021-07-01T05:57:41Z
- A parameter refinement method for Ptychography based on Deep Learning concepts [55.41644538483948]
Coarse parametrisation of the propagation distance, position errors and partial coherence frequently threatens experiment viability.
A modern deep learning framework is used to autonomously correct setup incoherences, thus improving the quality of the ptychographic reconstruction.
We tested our system on both synthetic datasets and also on real data acquired at the TwinMic beamline of the Elettra synchrotron facility.
Date: 2021-05-18T10:15:17Z