Related papers: Monocular Depth Guided Occlusion-Aware Disparity Refinement via Semi-supervised Learning in Laparoscopic Images

Monocular Depth Guided Occlusion-Aware Disparity Refinement via Semi-supervised Learning in Laparoscopic Images

URL: http://arxiv.org/abs/2505.08178v1
Date: Tue, 13 May 2025 02:29:56 GMT
Title: Monocular Depth Guided Occlusion-Aware Disparity Refinement via Semi-supervised Learning in Laparoscopic Images
Authors: Ziteng Liu, Dongdong He, Chenghong Zhang, Wenpeng Gao, Yili Fu,
Abstract summary: Occlusion and the scarcity of labeled surgical data are significant challenges in disparity estimation for stereo laparoscopic images.<n>To address these issues, this study proposes a Depth Guided Occlusion-Aware Disparity Refinement Network (DGORNet)<n>A Position Embedding (PE) module is introduced to provide explicit spatial context, enhancing the network's ability to localize and refine features.<n>Experiments on the SCARED dataset demonstrate that DGORNet outperforms state-of-the-art methods in terms of End-Point Error (EPE) and Root Mean Squared Error (RMSE)
Score: 7.765272785122932
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Occlusion and the scarcity of labeled surgical data are significant challenges in disparity estimation for stereo laparoscopic images. To address these issues, this study proposes a Depth Guided Occlusion-Aware Disparity Refinement Network (DGORNet), which refines disparity maps by leveraging monocular depth information unaffected by occlusion. A Position Embedding (PE) module is introduced to provide explicit spatial context, enhancing the network's ability to localize and refine features. Furthermore, we introduce an Optical Flow Difference Loss (OFDLoss) for unlabeled data, leveraging temporal continuity across video frames to improve robustness in dynamic surgical scenes. Experiments on the SCARED dataset demonstrate that DGORNet outperforms state-of-the-art methods in terms of End-Point Error (EPE) and Root Mean Squared Error (RMSE), particularly in occlusion and texture-less regions. Ablation studies confirm the contributions of the Position Embedding and Optical Flow Difference Loss, highlighting their roles in improving spatial and temporal consistency. These results underscore DGORNet's effectiveness in enhancing disparity estimation for laparoscopic surgery, offering a practical solution to challenges in disparity estimation and data limitations.

Related papers

High-Fidelity Functional Ultrasound Reconstruction via A Visual Auto-Regressive Framework [58.07923338080814]
Functional neurotemporal imaging provides exceptional resolution for mapping.<n>However, its practical application is hampered by critical challenges.<n>These include data scarcity, ethical considerations and signal degradation.
arXiv Detail & Related papers (2025-05-23T15:27:17Z)
Federated Learning with LoRA Optimized DeiT and Multiscale Patch Embedding for Secure Eye Disease Recognition [2.1358421658740214]
This study introduces a data-efficient image transformer (DeIT)-based approach to advance AI-powered medical imaging and disease detection.<n>It achieves state-of-the-art performance, with the highest AUC, F1 score, precision, minimal loss, and Top-5 accuracy.<n>Grad-CAM++ visualizations improve interpretability by highlighting critical pathological regions, enhancing the model's clinical relevance.
arXiv Detail & Related papers (2025-05-11T13:51:56Z)
Occlusion-Aware Self-Supervised Monocular Depth Estimation for Weak-Texture Endoscopic Images [1.1084686909647639]
We propose a self-supervised monocular depth estimation network tailored for endoscopic scenes.<n>Existing methods, though accurate, typically assume consistent illumination.<n>These variations lead to incorrect geometric interpretations and unreliable self-supervised signals.
arXiv Detail & Related papers (2025-04-24T14:12:57Z)
Comprehensive Evaluation of OCT-based Automated Segmentation of Retinal Layer, Fluid and Hyper-Reflective Foci: Impact on Diabetic Retinopathy Severity Assessment [0.0]
Diabetic retinopathy (DR) is a leading cause of vision loss, requiring early and accurate assessment to prevent irreversible damage.<n>This study proposes an active-learning-based deep learning pipeline for automated segmentation of retinal layers.
arXiv Detail & Related papers (2025-03-03T07:23:56Z)
Generalizable Non-Line-of-Sight Imaging with Learnable Physical Priors [52.195637608631955]
Non-line-of-sight (NLOS) imaging has attracted increasing attention due to its potential applications. Existing NLOS reconstruction approaches are constrained by the reliance on empirical physical priors. We introduce a novel learning-based solution, comprising two key designs: Learnable Path Compensation (LPC) and Adaptive Phasor Field (APF)
arXiv Detail & Related papers (2024-09-21T04:39:45Z)
DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery [71.6345505427213]
DPMesh is an innovative framework for occluded human mesh recovery. It capitalizes on the profound diffusion prior about object structure and spatial relationships embedded in a pre-trained text-to-image diffusion model.
arXiv Detail & Related papers (2024-04-01T18:59:13Z)
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features. We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
DEFN: Dual-Encoder Fourier Group Harmonics Network for Three-Dimensional Indistinct-Boundary Object Segmentation [6.0920148653974255]
We introduce Defect Injection (SDi) to augment the representational diversity of challenging indistinct-boundary objects within training corpora. Consequently, we propose the Dual-Encoder Fourier Group Harmonics Network (DEFN) to tailor incorporating noise, amplify detailed feature recognition, and bolster representation across diverse medical imaging scenarios.
arXiv Detail & Related papers (2023-11-01T12:33:04Z)
ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic Diffusion Models [69.9178140563928]
Colonoscopy analysis is essential for assisting clinical diagnosis and treatment. The scarcity of annotated data limits the effectiveness and generalization of existing methods. We propose an Adaptive Refinement Semantic Diffusion Model (ArSDM) to generate colonoscopy images that benefit the downstream tasks.
arXiv Detail & Related papers (2023-09-03T07:55:46Z)
A Global and Patch-wise Contrastive Loss for Accurate Automated Exudate Detection [12.669734891001667]
Diabetic retinopathy (DR) is a leading global cause of blindness. Early detection of hard exudates plays a crucial role in identifying DR, which aids in treating diabetes and preventing vision loss. We present a novel supervised contrastive learning framework to optimize hard exudate segmentation.
arXiv Detail & Related papers (2023-02-22T17:39:00Z)
Fuzzy Attention Neural Network to Tackle Discontinuity in Airway Segmentation [67.19443246236048]
Airway segmentation is crucial for the examination, diagnosis, and prognosis of lung diseases. Some small-sized airway branches (e.g., bronchus and terminaloles) significantly aggravate the difficulty of automatic segmentation. This paper presents an efficient method for airway segmentation, comprising a novel fuzzy attention neural network and a comprehensive loss function.
arXiv Detail & Related papers (2022-09-05T16:38:13Z)
Unsupervised Instance Segmentation in Microscopy Images via Panoptic Domain Adaptation and Task Re-weighting [86.33696045574692]
We propose a Cycle Consistency Panoptic Domain Adaptive Mask R-CNN (CyC-PDAM) architecture for unsupervised nuclei segmentation in histopathology images. We first propose a nuclei inpainting mechanism to remove the auxiliary generated objects in the synthesized images. Secondly, a semantic branch with a domain discriminator is designed to achieve panoptic-level domain adaptation.
arXiv Detail & Related papers (2020-05-05T11:08:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.