Robust Real-Time Endoscopic Stereo Matching under Fuzzy Tissue Boundaries
- URL: http://arxiv.org/abs/2503.00731v3
- Date: Wed, 15 Oct 2025 01:27:34 GMT
- Title: Robust Real-Time Endoscopic Stereo Matching under Fuzzy Tissue Boundaries
- Authors: Yang Ding, Can Han, Sijia Du, Yaqi Wang, Dahong Qian
- Abstract summary: Real-time acquisition of accurate scene depth is essential for automated robotic minimally invasive surgery. Existing stereo matching methods, designed primarily for natural images, often struggle with endoscopic images due to fuzzy tissue boundaries. We propose RRESM, a real-time stereo matching method tailored for endoscopic images.
- Score: 8.217543444539652
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time acquisition of accurate scene depth is essential for automated robotic minimally invasive surgery. Stereo matching with binocular endoscopy can provide this depth information. However, existing stereo matching methods, designed primarily for natural images, often struggle with endoscopic images due to fuzzy tissue boundaries and typically fail to meet real-time requirements for high-resolution endoscopic image inputs. To address these challenges, we propose RRESM, a real-time stereo matching method tailored for endoscopic images. Our approach integrates a 3D Mamba Coordinate Attention module that enhances cost aggregation through position-sensitive attention maps and long-range spatial dependency modeling via the Mamba block, generating a robust cost volume without substantial computational overhead. Additionally, we introduce a High-Frequency Disparity Optimization module that refines disparity predictions near tissue boundaries by amplifying high-frequency details in the wavelet domain. Evaluations on the SCARED and SERV-CT datasets demonstrate state-of-the-art matching accuracy with a real-time inference speed of 42 FPS. The code is available at https://github.com/Sonne-Ding/RRESM.
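The idea of "amplifying high-frequency details in the wavelet domain" can be illustrated with a minimal sketch. This is not the authors' High-Frequency Disparity Optimization module (see their repository for that); it assumes a single-level 2D Haar transform written directly in NumPy, and the function names and the fixed `gain` parameter are hypothetical, chosen only for illustration. The sketch splits a disparity map into one low-frequency band and three detail bands, scales the detail bands, and reconstructs.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar transform; x must have even height and width."""
    x = np.asarray(x, dtype=float)
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right
    low = (a + b + c + d) / 2.0          # block average (low frequency)
    dx = (a - b + c - d) / 2.0           # difference between columns
    dy = (a + b - c - d) / 2.0           # difference between rows
    dxy = (a - b - c + d) / 2.0          # diagonal difference
    return low, dx, dy, dxy


def haar_idwt2(low, dx, dy, dxy):
    """Exact inverse of haar_dwt2."""
    h, w = low.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (low + dx + dy + dxy) / 2.0
    x[0::2, 1::2] = (low - dx + dy - dxy) / 2.0
    x[1::2, 0::2] = (low + dx - dy - dxy) / 2.0
    x[1::2, 1::2] = (low - dx - dy + dxy) / 2.0
    return x


def amplify_high_frequency(disparity, gain=1.5):
    """Scale the three detail bands by `gain` (hypothetical fixed factor)."""
    low, dx, dy, dxy = haar_dwt2(disparity)
    return haar_idwt2(low, gain * dx, gain * dy, gain * dxy)


if __name__ == "__main__":
    # Toy disparity map with a vertical boundary between columns 0 and 1.
    disp = np.zeros((4, 4))
    disp[:, 1:] = 1.0
    sharpened = amplify_high_frequency(disp, gain=2.0)
    print(sharpened[0])
```

With `gain=1.0` the input is reconstructed exactly, and flat regions are unchanged for any gain because their detail bands are zero. A learned module would presumably predict how to modify the detail bands rather than apply one constant factor.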
Related papers
- EndoStreamDepth: Temporally Consistent Monocular Depth Estimation for Endoscopic Video Streams [6.300100115696222]
This work presents EndoStreamDepth, a monocular depth estimation framework for endoscopic video streams.
It provides accurate depth maps with sharp anatomical boundaries for each frame, temporally consistent predictions across frames, and real-time throughput.
arXiv Detail & Related papers (2025-12-20T00:53:30Z)
- A Plug-and-Play Framework for Volumetric Light-Sheet Image Reconstruction [7.8016751308289685]
Traditional optical imaging is inadequate for capturing dynamic cellular structure in the beating heart.
We propose a high-performance imaging framework that integrates Compressed Sensing with Light-Sheet Microscopy.
The proposed method successfully reconstructs cellular structures with excellent denoising performance and image clarity.
arXiv Detail & Related papers (2025-11-05T00:49:00Z)
- Accelerating 3D Photoacoustic Computed Tomography with End-to-End Physics-Aware Neural Operators [74.65171736966131]
Photoacoustic computed tomography (PACT) combines optical contrast with ultrasonic resolution, achieving deep-tissue imaging beyond the optical diffusion limit.
Current implementations require dense transducer arrays and prolonged acquisition times, limiting clinical translation.
We introduce Pano, an end-to-end physics-aware model that directly learns the inverse acoustic mapping from sensor measurements to volumetric reconstructions.
arXiv Detail & Related papers (2025-09-11T23:12:55Z)
- Unifying Scale-Aware Depth Prediction and Perceptual Priors for Monocular Endoscope Pose Estimation and Tissue Reconstruction [3.251946340142663]
A unified framework for monocular endoscopic tissue reconstruction is presented.
It integrates scale-aware depth prediction with temporally-constrained perceptual refinement.
Evaluations on HEVD and SCARED, with ablation and comparative analyses, demonstrate the framework's robustness and superiority over state-of-the-art methods.
arXiv Detail & Related papers (2025-08-15T07:41:17Z)
- EndoMUST: Monocular Depth Estimation for Robotic Endoscopy via End-to-end Multi-step Self-supervised Training [0.7499722271664147]
A novel framework with multistep efficient finetuning is proposed in this work.
Based on parameter-efficient finetuning on the foundation model, the proposed method achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-06-19T04:31:59Z)
- A Diffusion-Driven Temporal Super-Resolution and Spatial Consistency Enhancement Framework for 4D MRI imaging [9.016385222343715]
In medical imaging, 4D MRI enables dynamic 3D visualization, yet the trade-off between spatial and temporal resolution requires prolonged scan time.
Traditional approaches typically rely on registration-based methods to generate intermediate frames.
We propose TSSC-Net, a novel framework that generates intermediate frames while preserving spatial consistency.
arXiv Detail & Related papers (2025-06-04T16:09:19Z)
- DERD-Net: Learning Depth from Event-based Ray Densities [11.309936820480111]
Event cameras offer a promising avenue for multi-view stereo depth estimation and SLAM.
We propose a scalable, flexible and adaptable framework for pixel-wise depth estimation with event cameras in both monocular and stereo setups.
arXiv Detail & Related papers (2025-04-22T12:58:05Z)
- Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model [62.37493746544967]
Camera-based setups offer a cost-effective option by using stereo depth estimation to generate dense, high-resolution depth maps.
Existing omnidirectional stereo matching approaches achieve only limited depth accuracy across diverse environments.
We present DFI-OmniStereo, a novel omnidirectional stereo matching method that leverages a large-scale pre-trained foundation model for relative monocular depth estimation.
arXiv Detail & Related papers (2025-03-30T16:24:22Z)
- Advancing Dense Endoscopic Reconstruction with Gaussian Splatting-driven Surface Normal-aware Tracking and Mapping [12.027762278121052]
Endo-2DTAM is a real-time endoscopic SLAM system with 2D Gaussian Splatting (2DGS).
Our robust tracking module combines point-to-point and point-to-plane distance metrics.
Our mapping module utilizes normal consistency and depth distortion to enhance surface reconstruction quality.
arXiv Detail & Related papers (2025-01-31T17:15:34Z)
- Deep Sylvester Posterior Inference for Adaptive Compressed Sensing in Ultrasound Imaging [16.553626039240903]
Minimizing the number of required scan-lines can significantly enhance frame rate, field of view, energy efficiency, and data transfer speeds.
We introduce an adaptive subsampling method that maximizes intrinsic information gain in-situ.
arXiv Detail & Related papers (2025-01-07T14:37:14Z)
- Stereo-Depth Fusion through Virtual Pattern Projection [37.519762078762575]
This paper presents a novel general-purpose stereo and depth data fusion paradigm.
It mimics the active stereo principle by replacing the unreliable physical pattern projector with a depth sensor.
It works by projecting virtual patterns consistent with the scene geometry onto the left and right images acquired by a conventional stereo camera.
arXiv Detail & Related papers (2024-06-06T17:59:58Z)
- MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements [59.70107451308687]
We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM.
Our method, MM3DGS, addresses the limitations of prior rendering by enabling faster scale awareness, and improved trajectory tracking.
We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit.
arXiv Detail & Related papers (2024-04-01T04:57:41Z)
- FLex: Joint Pose and Dynamic Radiance Fields Optimization for Stereo Endoscopic Videos [79.50191812646125]
Reconstruction of endoscopic scenes is an important asset for various medical applications, from post-surgery analysis to educational training.
We address the challenging setup of a moving endoscope within a highly dynamic environment of deforming tissue.
We propose an implicit scene separation into multiple overlapping 4D neural radiance fields (NeRFs) and a progressive optimization scheme jointly optimizing for reconstruction and camera poses from scratch.
This improves ease of use and allows reconstruction capabilities to scale in time to surgical videos of 5,000 frames and more, an improvement of more than ten times over the state of the art, while being agnostic to external tracking information.
arXiv Detail & Related papers (2024-03-18T19:13:02Z)
- AiAReSeg: Catheter Detection and Segmentation in Interventional Ultrasound using Transformers [75.20925220246689]
Endovascular surgeries are performed using the gold standard of fluoroscopy, which uses ionising radiation to visualise catheters and vasculature.
This work proposes a solution using an adaptation of a state-of-the-art machine learning transformer architecture to detect and segment catheters in axial interventional Ultrasound image sequences.
arXiv Detail & Related papers (2023-09-25T19:34:12Z)
- Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z)
- Self-Supervised Depth Estimation in Laparoscopic Image using 3D Geometric Consistency [7.902636435901286]
We present M3Depth, a self-supervised depth estimator to leverage 3D geometric structural information hidden in stereo pairs.
Our method outperforms previous self-supervised approaches on both a public dataset and a newly acquired dataset by a large margin.
arXiv Detail & Related papers (2022-08-17T17:03:48Z)
- Deep Learning for Ultrasound Beamforming [120.12255978513912]
Beamforming, the process of mapping received ultrasound echoes to the spatial image domain, lies at the heart of the ultrasound image formation chain.
Modern ultrasound imaging leans heavily on innovations in powerful digital receive channel processing.
Deep learning methods can play a compelling role in the digital beamforming pipeline.
arXiv Detail & Related papers (2021-09-23T15:15:21Z)
- A Real-Time Online Learning Framework for Joint 3D Reconstruction and Semantic Segmentation of Indoor Scenes [87.74952229507096]
This paper presents a real-time online vision framework to jointly recover an indoor scene's 3D structure and semantic label.
Given noisy depth maps, a camera trajectory, and 2D semantic labels at train time, the proposed neural network learns to fuse the depth over frames with suitable semantic labels in the scene space.
arXiv Detail & Related papers (2021-08-11T14:29:01Z)
- A parameter refinement method for Ptychography based on Deep Learning concepts [55.41644538483948]
Coarse parametrisation in propagation distance, position errors and partial coherence frequently threaten the viability of the experiment.
A modern Deep Learning framework is used to autonomously correct the setup incoherences, thus improving the quality of a ptychography reconstruction.
We tested our system on both synthetic datasets and real data acquired at the TwinMic beamline of the Elettra synchrotron facility.
arXiv Detail & Related papers (2021-05-18T10:15:17Z)
- SERV-CT: A disparity dataset from CT for validation of endoscopic 3D reconstruction [8.448866668577946]
We present a stereo-endoscopic reconstruction validation dataset based on CT (SERV-CT).
The SERV-CT dataset provides an easy-to-use stereoscopic validation for surgical applications, with smooth reference disparities and depths that cover the majority of the endoscopic images.
arXiv Detail & Related papers (2020-12-22T01:28:30Z)
- Weakly-supervised Learning For Catheter Segmentation in 3D Frustum Ultrasound [74.22397862400177]
We propose a novel Frustum ultrasound based catheter segmentation method.
The proposed method achieved state-of-the-art performance with an efficiency of 0.25 seconds per volume.
arXiv Detail & Related papers (2020-10-19T13:56:22Z)
- 4D Spatio-Temporal Convolutional Networks for Object Position Estimation in OCT Volumes [69.62333053044712]
3D convolutional neural networks (CNNs) have shown promising performance for pose estimation of a marker object using single OCT images.
We extend 3D CNNs to 4D spatio-temporal CNNs to evaluate the impact of additional temporal information for marker object tracking.
arXiv Detail & Related papers (2020-07-02T12:02:20Z)