Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Generative Latent Priors
- URL: http://arxiv.org/abs/2411.17790v2
- Date: Mon, 09 Dec 2024 14:14:49 GMT
- Title: Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Generative Latent Priors
- Authors: Ziang Xu, Bin Li, Yang Hu, Chenyu Zhang, James East, Sharib Ali, Jens Rittscher,
- Abstract summary: 3D mapping in endoscopy enables quantitative, holistic lesion characterization within the gastrointestinal (GI) tract.
Existing methods relying on synthetic datasets or complex models often lack generalizability in challenging endoscopic conditions.
We propose a robust self-supervised monocular depth and pose estimation framework that incorporates a Generative Latent Bank and a Variational Autoencoder.
- Score: 10.61978045582697
- License:
- Abstract: Accurate 3D mapping in endoscopy enables quantitative, holistic lesion characterization within the gastrointestinal (GI) tract, requiring reliable depth and pose estimation. However, endoscopy systems are monocular, and existing methods relying on synthetic datasets or complex models often lack generalizability in challenging endoscopic conditions. We propose a robust self-supervised monocular depth and pose estimation framework that incorporates a Generative Latent Bank and a Variational Autoencoder (VAE). The Generative Latent Bank leverages extensive depth scenes from natural images to condition the depth network, enhancing realism and robustness of depth predictions through latent feature priors. For pose estimation, we reformulate it within a VAE framework, treating pose transitions as latent variables to regularize scale, stabilize z-axis prominence, and improve x-y sensitivity. This dual refinement pipeline enables accurate depth and pose predictions, effectively addressing the GI tract's complex textures and lighting. Extensive evaluations on SimCol and EndoSLAM datasets confirm our framework's superior performance over published self-supervised methods in endoscopic depth and pose estimation.
Related papers
- PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices.
Our framework capitalizes on fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z) - EndoDepth: A Benchmark for Assessing Robustness in Endoscopic Depth Prediction [1.7243216387069678]
We present the EndoDepth benchmark, an evaluation framework designed to assess the robustness of monocular depth prediction models in endoscopic scenarios.
We present an evaluation approach that is consistent and specifically designed to evaluate the robustness performance of the model in endoscopic scenarios.
arXiv Detail & Related papers (2024-09-30T04:18:14Z) - Benchmarking Robustness of Endoscopic Depth Estimation with Synthetically Corrupted Data [6.963196918624006]
We present a benchmark for assessing the robustness of endoscopic depth estimation models.
We introduce the Depth Estimation Robustness Score (DERS), a novel metric that combines measures of error, accuracy, and robustness.
A thorough analysis of two monocular depth estimation models using our framework reveals crucial information about their reliability under adverse conditions.
arXiv Detail & Related papers (2024-09-24T13:04:54Z) - Generalizing monocular colonoscopy image depth estimation by uncertainty-based global and local fusion network [3.4419856649092746]
This study aims to develop a robust framework that generalizes well to real colonoscopy images.
We propose a framework combining a convolutional neural network (CNN) for capturing local features and a Transformer for capturing global information.
arXiv Detail & Related papers (2024-09-23T13:30:59Z) - Advancing Depth Anything Model for Unsupervised Monocular Depth Estimation in Endoscopy [3.1186464715409983]
We introduce a novel fine-tuning strategy for the Depth Anything Model.
We integrate it with an intrinsic-based unsupervised monocular depth estimation framework.
Our results on the SCARED dataset show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-09-12T03:04:43Z) - ToDER: Towards Colonoscopy Depth Estimation and Reconstruction with Geometry Constraint Adaptation [67.22294293695255]
We propose a novel reconstruction pipeline with a bi-directional adaptation architecture named ToDER to get precise depth estimations.
Experimental results demonstrate that our approach can precisely predict depth maps in both realistic and synthetic colonoscopy videos.
arXiv Detail & Related papers (2024-07-23T14:24:26Z) - Adversarial Domain Feature Adaptation for Bronchoscopic Depth Estimation [111.89519571205778]
In this work, we propose an alternative domain-adaptive approach to depth estimation.
Our novel two-step structure first trains a depth estimation network with labeled synthetic images in a supervised manner.
The results of our experiments show that the proposed method improves the network's performance on real images by a considerable margin.
arXiv Detail & Related papers (2021-09-24T08:11:34Z) - On the Sins of Image Synthesis Loss for Self-supervised Depth Estimation [60.780823530087446]
We show that improvements in image synthesis do not necessitate improvement in depth estimation.
We attribute this diverging phenomenon to aleatoric uncertainties, which originate from data.
This observed divergence has not been previously reported or studied in depth.
arXiv Detail & Related papers (2021-09-13T17:57:24Z) - Adaptive confidence thresholding for monocular depth estimation [83.06265443599521]
We propose a new approach to leverage pseudo ground truth depth maps of stereo images generated from self-supervised stereo matching methods.
The confidence map of the pseudo ground truth depth map is estimated to mitigate performance degeneration by inaccurate pseudo depth maps.
Experimental results demonstrate superior performance to state-of-the-art monocular depth estimation methods.
arXiv Detail & Related papers (2020-09-27T13:26:16Z) - Calibrating Self-supervised Monocular Depth Estimation [77.77696851397539]
In the recent years, many methods demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images, using only self-supervision as the training signal.
We show that incorporating prior information about the camera configuration and the environment, we can remove the scale ambiguity and predict depth directly, still using the self-supervised formulation and not relying on any additional sensors.
arXiv Detail & Related papers (2020-09-16T14:35:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.