SPRMamba: Surgical Phase Recognition for Endoscopic Submucosal Dissection with Mamba
- URL: http://arxiv.org/abs/2409.12108v1
- Date: Wed, 18 Sep 2024 16:26:56 GMT
- Title: SPRMamba: Surgical Phase Recognition for Endoscopic Submucosal Dissection with Mamba
- Authors: Xiangning Zhang, Jinnan Chen, Qingwei Zhang, Chengfeng Zhou, Zhengjie Zhang, Xiaobo Li, Dahong Qian
- Abstract summary: We propose SPRMamba, a novel Mamba-based framework for ESD surgical phase recognition.
We show that SPRMamba surpasses existing state-of-the-art methods and exhibits greater robustness across various surgical phase recognition tasks.
- Score: 4.37495931705689
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Endoscopic Submucosal Dissection (ESD) is a minimally invasive procedure initially designed for the treatment of early gastric cancer but is now widely used for various gastrointestinal lesions. Computer-assisted surgery systems have played a crucial role in improving the precision and safety of ESD procedures; however, their effectiveness is limited by the accurate recognition of surgical phases. The intricate nature of ESD, with different lesion characteristics and tissue structures, presents challenges for real-time surgical phase recognition algorithms. Existing surgical phase recognition algorithms struggle to efficiently capture temporal contexts in video-based scenarios, leading to insufficient performance. To address these issues, we propose SPRMamba, a novel Mamba-based framework for ESD surgical phase recognition. SPRMamba leverages the strengths of Mamba for long-term temporal modeling while introducing the Scaled Residual TranMamba block to enhance the capture of fine-grained details, overcoming the limitations of traditional temporal models like Temporal Convolutional Networks and Transformers. Moreover, a Temporal Sample Strategy is introduced to accelerate the processing, which is essential for real-time phase recognition in clinical settings. Extensive testing on the ESD385 dataset and the cholecystectomy Cholec80 dataset demonstrates that SPRMamba surpasses existing state-of-the-art methods and exhibits greater robustness across various surgical phase recognition tasks.
Related papers
- Deep intra-operative illumination calibration of hyperspectral cameras [73.08443963791343]
Hyperspectral imaging (HSI) is emerging as a promising novel imaging modality with various potential surgical applications.
We show that dynamically changing lighting conditions in the operating room dramatically affect the performance of HSI applications.
We propose a novel learning-based approach to automatically recalibrating hyperspectral images during surgery.
arXiv Detail & Related papers (2024-09-11T08:30:03Z)
- SR-Mamba: Effective Surgical Phase Recognition with State Space Model [42.766718651973726]
SR-Mamba is a novel attention-free model specifically tailored to meet the challenges of surgical phase recognition.
In SR-Mamba, we leverage a bidirectional Mamba decoder to effectively model the temporal context in overlong sequences.
SR-Mamba establishes a new benchmark in surgical video analysis by demonstrating state-of-the-art performance on the Cholec80 and CATARACTS Challenge datasets.
arXiv Detail & Related papers (2024-07-11T09:34:31Z)
- Friends Across Time: Multi-Scale Action Segmentation Transformer for Surgical Phase Recognition [2.10407185597278]
We propose the Multi-Scale Action Transformer (MS-AST) for offline surgical phase recognition and the Multi-Scale Action Causal Transformer (MS-ASCT) for online surgical phase recognition.
Our method can achieve 95.26% and 96.15% accuracy on the Cholec80 dataset for online and offline surgical phase recognition, respectively.
arXiv Detail & Related papers (2024-01-22T01:34:03Z)
- Action Recognition in Video Recordings from Gynecologic Laparoscopy [4.002010889177872]
Action recognition is a prerequisite for many applications in laparoscopic video analysis.
In this study, we design and evaluate a CNN-RNN architecture as well as a customized training-inference framework.
arXiv Detail & Related papers (2023-11-30T16:15:46Z)
- Phase-Specific Augmented Reality Guidance for Microscopic Cataract Surgery Using Long-Short Spatiotemporal Aggregation Transformer [14.568834378003707]
Phacoemulsification cataract surgery (PCS) is a routine procedure using a surgical microscope.
PCS guidance systems extract valuable information from surgical microscopic videos to enhance proficiency.
Existing PCS guidance systems suffer from non-phase-specific guidance, leading to redundant visual information.
We propose a novel phase-specific augmented reality (AR) guidance system, which offers tailored AR information corresponding to the recognized surgical phase.
arXiv Detail & Related papers (2023-09-11T02:56:56Z)
- GLSFormer : Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z)
- Learning-Based Keypoint Registration for Fetoscopic Mosaicking [65.02392513942533]
In Twin-to-Twin Transfusion Syndrome (TTTS), abnormal vascular anastomoses in the monochorionic placenta can produce uneven blood flow between the two fetuses.
We propose a learning-based framework for in-vivo fetoscopy frame registration for field-of-view expansion.
arXiv Detail & Related papers (2022-07-26T21:21:12Z)
- A Long Short-term Memory Based Recurrent Neural Network for Interventional MRI Reconstruction [50.1787181309337]
We propose a convolutional long short-term memory (Conv-LSTM) based recurrent neural network (RNN), or ConvLR, to reconstruct interventional images with golden-angle radial sampling.
The proposed algorithm has the potential to achieve real-time i-MRI for DBS and can be used for general purpose MR-guided intervention.
arXiv Detail & Related papers (2022-03-28T14:03:45Z)
- Trans-SVNet: Accurate Phase Recognition from Surgical Videos via Hybrid Embedding Aggregation Transformer [57.18185972461453]
For the first time in surgical workflow analysis, we introduce the Transformer to reconsider the ignored complementary effects of spatial and temporal features for accurate phase recognition.
Our framework is lightweight and processes the hybrid embeddings in parallel to achieve a high inference speed.
arXiv Detail & Related papers (2021-03-17T15:12:55Z)
- TeCNO: Surgical Phase Recognition with Multi-Stage Temporal Convolutional Networks [43.95869213955351]
We propose a Multi-Stage Temporal Convolutional Network (MS-TCN) that performs hierarchical prediction refinement for surgical phase recognition.
Our method is thoroughly evaluated on two datasets of laparoscopic cholecystectomy videos with and without the use of additional surgical tool information.
arXiv Detail & Related papers (2020-03-24T10:12:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.