Occlusion-Aware Self-Supervised Monocular Depth Estimation for Weak-Texture Endoscopic Images
- URL: http://arxiv.org/abs/2504.17582v1
- Date: Thu, 24 Apr 2025 14:12:57 GMT
- Title: Occlusion-Aware Self-Supervised Monocular Depth Estimation for Weak-Texture Endoscopic Images
- Authors: Zebo Huang, Yinghui Wang,
- Abstract summary: We propose a self-supervised monocular depth estimation network tailored for endoscopic scenes. Existing methods, though accurate, typically assume consistent illumination. Lighting and occlusion variations lead to incorrect geometric interpretations and unreliable self-supervised signals.
- Score: 1.1084686909647639
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a self-supervised monocular depth estimation network tailored for endoscopic scenes, aiming to infer depth within the gastrointestinal tract from monocular images. Existing methods, though accurate, typically assume consistent illumination, which is often violated due to dynamic lighting and occlusions caused by GI motility. These variations lead to incorrect geometric interpretations and unreliable self-supervised signals, degrading depth reconstruction quality. To address this, we introduce an occlusion-aware self-supervised framework. First, we incorporate an occlusion mask for data augmentation, generating pseudo-labels by simulating viewpoint-dependent occlusion scenarios. This enhances the model's ability to learn robust depth features under partial visibility. Second, we leverage semantic segmentation guided by non-negative matrix factorization, clustering convolutional activations to generate pseudo-labels in texture-deprived regions, thereby improving segmentation accuracy and mitigating information loss from lighting changes. Experimental results on the SCARED dataset show that our method achieves state-of-the-art performance in self-supervised depth estimation. Additionally, evaluations on the Endo-SLAM and SERV-CT datasets demonstrate strong generalization across diverse endoscopic environments.
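The two ingredients described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation. Assumptions: the occlusion is a single random rectangle filled with zeros; the depth predicted on the clean frame serves as a fixed pseudo-label for the occluded frame under an L1 loss; and the NMF-guided clustering is approximated by factoring per-pixel convolutional activations with standard Lee-Seung multiplicative updates, then labeling each pixel by its dominant component. All function names (`occlude`, `occlusion_consistency_loss`, `nmf`, `nmf_pseudo_labels`) are hypothetical.

```python
import numpy as np


def occlude(image, rng, max_frac=0.5):
    """Blank out one random rectangle to simulate a partial occlusion.

    Hypothetical helper: the rectangle shape and zero fill are illustrative
    choices, not the paper's occlusion model. Returns the occluded copy and a
    boolean mask that is True where pixels remain visible.
    """
    h, w = image.shape[:2]
    rh = rng.integers(1, max(1, int(h * max_frac)) + 1)
    rw = rng.integers(1, max(1, int(w * max_frac)) + 1)
    top = rng.integers(0, h - rh + 1)
    left = rng.integers(0, w - rw + 1)
    visible = np.ones((h, w), dtype=bool)
    visible[top:top + rh, left:left + rw] = False
    occluded = image.copy()
    occluded[~visible] = 0.0
    return occluded, visible


def occlusion_consistency_loss(depth_on_occluded, pseudo_depth):
    """Hypothetical teacher-student loss: depth predicted on the clean frame
    acts as a fixed pseudo-label for the prediction on the occluded frame."""
    return float(np.mean(np.abs(depth_on_occluded - pseudo_depth)))


def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) as W @ H using Lee-Seung multiplicative
    updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def nmf_pseudo_labels(activations, k):
    """Cluster a (C, H, W) non-negative activation map into k regions.

    Each pixel's C-dim activation vector is a row of V; the pixel is labeled
    with the NMF component carrying its largest coefficient.
    """
    c, h, w = activations.shape
    V = activations.reshape(c, h * w).T  # shape (H*W, C), one row per pixel
    W, _ = nmf(V, k)
    return W.argmax(axis=1).reshape(h, w)
```

Because post-ReLU activations are non-negative, they can be factored directly; the number of components `k` sets how many pseudo-label regions are produced.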
Related papers
- Pseudo-Label Guided Real-World Image De-weathering: A Learning Framework with Imperfect Supervision [57.5699142476311]
We propose a unified solution for real-world image de-weathering with non-ideal supervision.
Our method exhibits significant advantages when trained on imperfectly aligned de-weathering datasets.
Date: 2025-04-14T07:24:03Z
- Intrinsic Image Decomposition for Robust Self-supervised Monocular Depth Estimation on Reflective Surfaces [10.557788087220509]
Self-supervised monocular depth estimation (SSMDE) has gained attention in the field of deep learning.
We propose a novel framework that incorporates intrinsic image decomposition into SSMDE.
Our method synergistically trains for both monocular depth estimation and intrinsic image decomposition.
Date: 2025-03-28T07:56:59Z
- Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Generative Latent Priors [10.61978045582697]
3D mapping in endoscopy enables quantitative, holistic lesion characterization within the gastrointestinal (GI) tract.
Existing methods relying on synthetic datasets or complex models often lack generalizability in challenging endoscopic conditions.
We propose a robust self-supervised monocular depth and pose estimation framework that incorporates a Generative Latent Bank and a Variational Autoencoder.
Date: 2024-11-26T15:43:06Z
- Adaptive Stereo Depth Estimation with Multi-Spectral Images Across All Lighting Conditions [58.88917836512819]
We propose a novel framework incorporating stereo depth estimation to enforce accurate geometric constraints.
To mitigate the effects of poor lighting on stereo matching, we introduce Degradation Masking.
Our method achieves state-of-the-art (SOTA) performance on the Multi-Spectral Stereo (MS2) dataset.
Date: 2024-11-06T03:30:46Z
- Self-supervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion [16.673178271652553]
Self-supervised monocular depth estimation has received widespread attention because of its capability to train without ground truth.
We employ a generative diffusion model with a unique denoising training process for self-supervised monocular depth estimation.
We conduct experiments on the KITTI and Make3D datasets.
Date: 2024-06-14T07:31:20Z
- DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery [71.6345505427213]
DPMesh is an innovative framework for occluded human mesh recovery.
It capitalizes on the profound diffusion prior about object structure and spatial relationships embedded in a pre-trained text-to-image diffusion model.
Date: 2024-04-01T18:59:13Z
- AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation [51.143540967290114]
We propose a method that unlocks a wide range of previously-infeasible geometric augmentations for unsupervised depth completion and estimation.
This is achieved by reversing, or "undo"-ing, geometric transformations to the coordinates of the output depth, warping the depth map back to the original reference frame.
Date: 2023-10-15T05:15:45Z
- Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
Date: 2022-06-23T14:16:30Z
- Self-Supervised Monocular Depth and Ego-Motion Estimation in Endoscopy: Appearance Flow to the Rescue [38.168759071532676]
Self-supervised learning technology has been applied to calculate depth and ego-motion from monocular videos.
In this work, we introduce a novel concept referred to as appearance flow to address the brightness inconsistency problem.
We build a unified self-supervised framework to estimate monocular depth and ego-motion simultaneously in endoscopic scenes.
Date: 2021-12-15T13:51:10Z
- Self-Supervised Generative Adversarial Network for Depth Estimation in Laparoscopic Images [13.996932179049978]
We propose SADepth, a new self-supervised depth estimation method based on Generative Adversarial Networks.
It consists of an encoder-decoder generator and a discriminator to incorporate geometry constraints during training.
Experiments on two public datasets show that SADepth outperforms recent state-of-the-art unsupervised methods by a large margin.
Date: 2021-07-09T19:40:20Z
- Adaptive confidence thresholding for monocular depth estimation [83.06265443599521]
We propose a new approach to leverage pseudo ground truth depth maps of stereo images generated from self-supervised stereo matching methods.
The confidence map of the pseudo ground truth depth map is estimated to mitigate performance degradation caused by inaccurate pseudo depth maps.
Experimental results demonstrate superior performance to state-of-the-art monocular depth estimation methods.
Date: 2020-09-27T13:26:16Z
- Calibrating Self-supervised Monocular Depth Estimation [77.77696851397539]
In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images, using only self-supervision as the training signal.
We show that, by incorporating prior information about the camera configuration and the environment, we can remove the scale ambiguity and predict depth directly, still using the self-supervised formulation and not relying on any additional sensors.
Date: 2020-09-16T14:35:45Z
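The camera-configuration prior in the last entry can be illustrated with a minimal sketch. It assumes a pinhole camera with known intrinsics `(fx, fy, cx, cy)`, a level camera whose y-axis points down toward a flat ground plane, and a given mask of ground pixels. Under those assumptions, back-projecting the up-to-scale depth of ground pixels yields the camera height in model units, and its ratio to the known metric height is the scale factor. The function names are hypothetical, and this is not necessarily the calibration procedure used in that paper.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Lift a (H, W) depth map to camera-frame 3D points of shape (H, W, 3)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]  # row (v) and column (u) pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


def height_prior_scale(depth, ground_mask, intrinsics, camera_height_m):
    """Metric scale factor from a known camera height.

    Simplifying assumptions (hypothetical setup): level camera, y-axis
    pointing down, and ground pixels marked by ground_mask.
    """
    points = backproject(depth, *intrinsics)
    # Ground points lie at +y below the camera; the median resists outliers.
    est_height = np.median(points[..., 1][ground_mask])
    return camera_height_m / est_height
```

Multiplying the up-to-scale depth by the returned factor gives metric depth; the median keeps the estimate robust to a few mislabeled ground pixels.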