EndoMUST: Monocular Depth Estimation for Robotic Endoscopy via End-to-end Multi-step Self-supervised Training
- URL: http://arxiv.org/abs/2506.16017v1
- Date: Thu, 19 Jun 2025 04:31:59 GMT
- Title: EndoMUST: Monocular Depth Estimation for Robotic Endoscopy via End-to-end Multi-step Self-supervised Training
- Authors: Liangjing Shao, Linxin Bai, Chenkang Du, Xinrong Chen
- Abstract summary: A novel framework with multi-step efficient finetuning is proposed in this work. Based on parameter-efficient finetuning of the foundation model, the proposed method achieves state-of-the-art performance.
- Score: 0.7499722271664147
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monocular depth estimation and ego-motion estimation are key tasks for scene perception and navigation in stable, accurate and efficient robot-assisted endoscopy. To tackle lighting variations and sparse textures in endoscopic scenes, multiple techniques, including optical flow, appearance flow and intrinsic image decomposition, have been introduced into existing methods. However, an effective training strategy for the multiple modules remains critical for handling both illumination issues and information interference in self-supervised depth estimation for endoscopy. Therefore, a novel framework with multi-step efficient finetuning is proposed in this work. In each epoch of end-to-end training, the process is divided into three steps: optical flow registration, multiscale image decomposition and multiple transformation alignments. At each step, only the related networks are trained, without interference from irrelevant information. Based on parameter-efficient finetuning of the foundation model, the proposed method achieves state-of-the-art performance on self-supervised depth estimation on the SCARED dataset and zero-shot depth estimation on the Hamlyn dataset, with 4% to 10% lower error. The evaluation code of this work has been published at https://github.com/BaymaxShao/EndoMUST.
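The per-epoch schedule described in the abstract can be sketched as follows. This is a minimal, illustrative Python mock-up: the step names come from the abstract, but the module names (`flow_net`, `decomp_net`, `depth_net`, `pose_net`) and the grouping of networks per step are assumptions, not taken from the released code. It only demonstrates the scheduling idea that each step updates its related networks and leaves the rest untouched:

```python
class Module:
    """Toy stand-in for a network: counts how often its weights were updated."""
    def __init__(self, name):
        self.name = name
        self.updates = 0

    def step(self):
        # In a real framework this would be an optimizer step on this
        # module's parameters only (others frozen, e.g. requires_grad=False).
        self.updates += 1


def train_epoch(modules):
    """One end-to-end epoch split into three steps; at each step only the
    related networks are updated, avoiding interference between objectives."""
    steps = [
        ("optical_flow_registration",      ["flow_net"]),
        ("multiscale_image_decomposition", ["decomp_net"]),
        ("transformation_alignments",      ["depth_net", "pose_net"]),
    ]
    for step_name, trainable in steps:
        for name, module in modules.items():
            if name in trainable:
                module.step()  # only the networks related to this step train
    return modules


modules = {n: Module(n) for n in ["flow_net", "decomp_net", "depth_net", "pose_net"]}
train_epoch(modules)
print({n: m.updates for n, m in modules.items()})
# → {'flow_net': 1, 'decomp_net': 1, 'depth_net': 1, 'pose_net': 1}
```

In an actual implementation the freezing would be done by toggling parameter gradients (and, per the abstract, the foundation-model backbone itself would be adapted with parameter-efficient finetuning rather than full updates).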
Related papers
- Occlusion-Aware Self-Supervised Monocular Depth Estimation for Weak-Texture Endoscopic Images [1.1084686909647639]
We propose a self-supervised monocular depth estimation network tailored for endoscopic scenes. Existing methods, though accurate, typically assume consistent illumination; illumination variations lead to incorrect geometric interpretations and unreliable self-supervised signals.
arXiv Detail & Related papers (2025-04-24T14:12:57Z)
- Advancing Depth Anything Model for Unsupervised Monocular Depth Estimation in Endoscopy [2.906891207990726]
We introduce a novel fine-tuning strategy for the Depth Anything Model and integrate it with an intrinsic-based unsupervised monocular depth estimation framework. Our results show that the method achieves state-of-the-art performance while minimizing the number of trainable parameters.
arXiv Detail & Related papers (2024-09-12T03:04:43Z)
- EndoOmni: Zero-Shot Cross-Dataset Depth Estimation in Endoscopy by Robust Self-Learning from Noisy Labels [4.99086145037811]
Single-image depth estimation is essential for endoscopy tasks such as localization, reconstruction, and augmented reality. Most existing methods in surgical scenes focus on in-domain depth estimation, limiting their real-world applicability. We present EndoOmni, the first foundation model for zero-shot cross-domain depth estimation for endoscopy.
arXiv Detail & Related papers (2024-09-09T08:46:45Z)
- Self-STORM: Deep Unrolled Self-Supervised Learning for Super-Resolution Microscopy [55.2480439325792]
We introduce deep unrolled self-supervised learning, which alleviates the need for such data by training a sequence-specific, model-based autoencoder.
Our proposed method exceeds the performance of its supervised counterparts.
arXiv Detail & Related papers (2024-03-25T17:40:32Z)
- EndoDepthL: Lightweight Endoscopic Monocular Depth Estimation with CNN-Transformer [0.0]
We propose a novel lightweight solution named EndoDepthL that integrates CNN and Transformers to predict multi-scale depth maps.
Our approach optimizes the network architecture, incorporates multi-scale dilated convolutions, and adds a multi-channel attention mechanism.
To better evaluate the performance of monocular depth estimation in endoscopic imaging, we propose a novel complexity evaluation metric.
arXiv Detail & Related papers (2023-08-04T21:38:29Z)
- The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation [42.48819460873482]
Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity.
We show that they also excel in estimating optical flow and monocular depth, surprisingly, without task-specific architectures and loss functions.
arXiv Detail & Related papers (2023-06-02T21:26:20Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption to train networks; however, this assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- Towards Domain-agnostic Depth Completion [28.25756709062647]
Existing depth completion methods are often targeted at a specific sparse depth type and generalize poorly across task domains.
We present a method to complete sparse/semi-dense, noisy, and potentially low-resolution depth maps obtained by various range sensors.
Our method shows superior cross-domain generalization ability against state-of-the-art depth completion methods.
arXiv Detail & Related papers (2022-07-29T04:10:22Z)
- Learnable Patchmatch and Self-Teaching for Multi-Frame Depth Estimation in Monocular Endoscopy [16.233423010425355]
We propose a novel unsupervised multi-frame monocular depth estimation model. The model integrates a learnable patchmatch module to adaptively increase discriminative ability in regions with low and homogeneous textures. As a byproduct of the self-teaching paradigm, the model improves its depth predictions when more frames are input at test time.
arXiv Detail & Related papers (2022-05-30T12:11:03Z)
- DeepRM: Deep Recurrent Matching for 6D Pose Refinement [77.34726150561087]
DeepRM is a novel recurrent network architecture for 6D pose refinement.
The architecture incorporates LSTM units to propagate information through each refinement step.
DeepRM achieves state-of-the-art performance on two widely accepted challenging datasets.
arXiv Detail & Related papers (2022-05-28T16:18:08Z) - Adversarial Domain Feature Adaptation for Bronchoscopic Depth Estimation [111.89519571205778]
In this work, we propose an alternative domain-adaptive approach to depth estimation.
Our novel two-step structure first trains a depth estimation network with labeled synthetic images in a supervised manner.
The results of our experiments show that the proposed method improves the network's performance on real images by a considerable margin.
arXiv Detail & Related papers (2021-09-24T08:11:34Z) - An Adaptive Framework for Learning Unsupervised Depth Completion [59.17364202590475]
We present a method to infer a dense depth map from a color image and associated sparse depth measurements.
We show that regularization and co-visibility are related via the fitness of the model to data and can be unified into a single framework.
arXiv Detail & Related papers (2021-06-06T02:27:55Z) - A parameter refinement method for Ptychography based on Deep Learning
concepts [55.41644538483948]
Coarse parametrisation of propagation distance, position errors and partial coherence frequently threatens experiment viability.
A modern Deep Learning framework is used to autonomously correct the setup incoherences, thus improving the quality of the ptychography reconstruction.
We tested our system on both synthetic datasets and also on real data acquired at the TwinMic beamline of the Elettra synchrotron facility.
arXiv Detail & Related papers (2021-05-18T10:15:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.