RGA-Net: A Vision Enhancement Framework for Robotic Surgical Systems Using Reciprocal Attention Mechanisms
- URL: http://arxiv.org/abs/2602.13726v1
- Date: Sat, 14 Feb 2026 11:31:54 GMT
- Title: RGA-Net: A Vision Enhancement Framework for Robotic Surgical Systems Using Reciprocal Attention Mechanisms
- Authors: Quanjun Li, Weixuan Li, Han Xia, Junhua Zhou, Chi-Man Pun, Xuhang Chen
- Abstract summary: RGA-Net is a novel deep learning framework specifically designed for smoke removal in robotic surgery. Our approach addresses the challenges of surgical smoke (including dense, non-homogeneous distribution and complex light scattering) through a hierarchical encoder-decoder architecture.
- Score: 25.435178288442597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robotic surgical systems rely heavily on high-quality visual feedback for precise teleoperation; yet, surgical smoke from energy-based devices significantly degrades endoscopic video feeds, compromising the human-robot interface and surgical outcomes. This paper presents RGA-Net (Reciprocal Gating and Attention-fusion Network), a novel deep learning framework specifically designed for smoke removal in robotic surgery workflows. Our approach addresses the unique challenges of surgical smoke (including dense, non-homogeneous distribution and complex light scattering) through a hierarchical encoder-decoder architecture featuring two key innovations: (1) a Dual-Stream Hybrid Attention (DHA) module that combines shifted window attention with frequency-domain processing to capture both local surgical details and global illumination changes, and (2) an Axis-Decomposed Attention (ADA) module that efficiently processes multi-scale features through factorized attention mechanisms. These components are connected via reciprocal cross-gating blocks that enable bidirectional feature modulation between encoder and decoder pathways. Extensive experiments on the DesmokeData and LSD3K surgical datasets demonstrate that RGA-Net achieves superior performance in restoring visual clarity suitable for robotic surgery integration. Our method enhances the surgeon-robot interface by providing consistently clear visualization, laying a technical foundation for alleviating surgeons' cognitive burden, optimizing operation workflows, and reducing iatrogenic injury risks in minimally invasive procedures. These practical benefits could be further validated through future clinical trials involving surgeon usability assessments. The proposed framework represents a significant step toward more reliable and safer robotic surgical systems through computational vision enhancement.
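The abstract names factorized attention in the ADA module but does not give its formulation. As a general illustration of the axis-decomposed idea (attend along the height axis, then the width axis, instead of jointly over all H×W positions), here is a minimal pure-Python sketch; the function names and the toy feature grid are our own assumptions, not the paper's implementation:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend_1d(seq):
    # Scaled dot-product self-attention over a 1D sequence of feature vectors.
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in seq]
        w = softmax(scores)
        # Each output is a convex combination of the input vectors.
        out.append([sum(wj * vec[c] for wj, vec in zip(w, seq)) for c in range(d)])
    return out

def axis_decomposed_attention(grid):
    # grid: H x W list of C-dim feature vectors.
    # Pass 1: attention along each row (width axis).
    rows = [attend_1d(row) for row in grid]
    # Pass 2: attention along each column (height axis).
    cols = [attend_1d(list(col)) for col in zip(*rows)]
    # Transpose back to H x W.
    return [list(r) for r in zip(*cols)]
```

The efficiency motivation behind such factorized designs is that two 1D passes cost O(HW(H+W)) score computations versus O((HW)^2) for full 2D self-attention over the same feature map.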
Related papers
- Evaluating Large Vision-language Models for Surgical Tool Detection [0.866627581195388]
We evaluate the effectiveness of large vision-language models for the fundamental surgical vision task of detecting surgical tools. Qwen2.5 consistently achieves superior detection performance in both configurations among the evaluated VLMs.
arXiv Detail & Related papers (2026-01-23T17:00:46Z)
- SurgiATM: A Physics-Guided Plug-and-Play Model for Deep Learning-Based Smoke Removal in Laparoscopic Surgery [16.71481757853012]
Smoke generated by tissue cauterization can significantly degrade the visual quality of endoscopic frames. We propose the Surgical Atmospheric Model (SurgiATM) for surgical smoke removal. SurgiATM statistically bridges a physics-based atmospheric model and data-driven deep learning models.
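In dehazing and desmoking work, the physics-based atmospheric model usually referred to is the atmospheric scattering model I(x) = J(x)·t(x) + A·(1 − t(x)), where I is the observed pixel, J the clear scene radiance, t the transmission, and A the atmospheric light; whether SurgiATM uses exactly this form is an assumption here. As a per-pixel illustration of inverting that model once t and A have been estimated (the estimation itself is what learned methods provide):

```python
def recover_scene(I, t, A, t_min=0.1):
    """Invert the atmospheric scattering model I = J*t + A*(1 - t) for J.

    I: observed intensity, t: transmission in (0, 1], A: atmospheric light.
    t is clamped from below to avoid amplifying noise in dense smoke regions.
    """
    t = max(t, t_min)
    return (I - A * (1.0 - t)) / t
```

For example, a clear radiance J = 0.6 seen through transmission t = 0.5 with airlight A = 0.9 produces the smoky observation I = 0.6·0.5 + 0.9·0.5 = 0.75, and recover_scene(0.75, 0.5, 0.9) returns 0.6.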
arXiv Detail & Related papers (2025-11-07T08:04:24Z)
- Toward Reliable AR-Guided Surgical Navigation: Interactive Deformation Modeling with Data-Driven Biomechanics and Prompts [21.952265898720825]
We propose a data-driven algorithm that preserves FEM-level accuracy while improving computational efficiency. We introduce a novel human-in-the-loop mechanism into the deformation modeling process. Our algorithm achieves a mean target registration error of 3.42 mm, surpassing state-of-the-art methods in volumetric accuracy.
arXiv Detail & Related papers (2025-06-08T14:19:54Z)
- Benchmarking Laparoscopic Surgical Image Restoration and Beyond [54.28852320829451]
In laparoscopic surgery, a clear and high-quality visual field is critical for surgeons to make accurate decisions. Persistent visual degradation, including smoke generated by energy devices, lens fogging from thermal gradients, and lens contamination, poses risks to patient safety. We introduce a real-world open-source surgical image restoration dataset covering laparoscopic environments, called SurgClean.
arXiv Detail & Related papers (2025-05-25T14:17:56Z)
- Surgical Foundation Model Leveraging Compression and Entropy Maximization for Image-Guided Surgical Assistance [50.486523249499115]
Real-time video understanding is critical to guide procedures in minimally invasive surgery (MIS). We propose Compress-to-Explore (C2E), a novel self-supervised framework to learn compact, informative representations from surgical videos. C2E uses entropy-maximizing decoders to compress images while preserving clinically relevant details, improving encoder performance without labeled data.
arXiv Detail & Related papers (2025-05-16T14:02:24Z)
- Landmark-Free Preoperative-to-Intraoperative Registration in Laparoscopic Liver Resection [50.388465935739376]
Liver registration by overlaying preoperative 3D models onto intraoperative 2D frames can assist surgeons in perceiving the spatial anatomy of the liver clearly for a higher surgical success rate. Existing registration methods rely heavily on anatomical landmark-based approaches, which encounter two major limitations. We propose a landmark-free preoperative-to-intraoperative registration framework utilizing effective self-supervised learning.
arXiv Detail & Related papers (2025-04-21T14:55:57Z)
- Surgical Temporal Action-aware Network with Sequence Regularization for Phase Recognition [28.52533700429284]
We propose a Surgical Temporal Action-aware Network with sequence Regularization, named STAR-Net, to recognize surgical phases more accurately from input videos.
The MS-STA module integrates visual features with spatial and temporal knowledge of surgical actions at the computational cost of 2D networks.
Our STAR-Net with MS-STA and DSR can exploit visual features of surgical actions with effective regularization, thereby leading to the superior performance of surgical phase recognition.
arXiv Detail & Related papers (2023-11-21T13:43:16Z)
- fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding [54.17776744076334]
We propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training.
Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving brain activity patterns.
Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach.
arXiv Detail & Related papers (2023-11-01T07:24:22Z)
- Robotic Navigation Autonomy for Subretinal Injection via Intelligent Real-Time Virtual iOCT Volume Slicing [88.99939660183881]
We propose a framework for autonomous robotic navigation for subretinal injection.
Our method consists of an instrument pose estimation method, an online registration between the robotic and the iOCT system, and trajectory planning tailored for navigation to an injection target.
Our experiments on ex-vivo porcine eyes demonstrate the precision and repeatability of the method.
arXiv Detail & Related papers (2023-01-17T21:41:21Z)
- Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Scene Graph (MSSG), which aims at providing a unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z)
- Spatiotemporal-Aware Augmented Reality: Redefining HCI in Image-Guided Therapy [39.370739217840594]
Augmented reality (AR) has been introduced into operating rooms over the last decade.
This paper shows how exemplary visualizations are redefined by taking full advantage of head-mounted displays.
The system's awareness of the geometric and physical characteristics of X-ray imaging allows different human-machine interfaces to be redefined.
arXiv Detail & Related papers (2020-03-04T18:59:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.