SR-Mamba: Effective Surgical Phase Recognition with State Space Model
- URL: http://arxiv.org/abs/2407.08333v1
- Date: Thu, 11 Jul 2024 09:34:31 GMT
- Title: SR-Mamba: Effective Surgical Phase Recognition with State Space Model
- Authors: Rui Cao, Jiangliu Wang, Yun-Hui Liu,
- Abstract summary: SR-Mamba is a novel attention-free model specifically tailored to meet the challenges of surgical phase recognition.
In SR-Mamba, we leverage a bidirectional Mamba decoder to effectively model the temporal context in overlong sequences.
SR-Mamba establishes a new benchmark in surgical video analysis by demonstrating state-of-the-art performance on the Cholec80 and CATARACTS Challenge datasets.
- Score: 42.766718651973726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Surgical phase recognition is crucial for enhancing the efficiency and safety of computer-assisted interventions. One of the fundamental challenges involves modeling the long-distance temporal relationships present in surgical videos. Inspired by the recent success of Mamba, a state space model with linear scalability in sequence length, this paper presents SR-Mamba, a novel attention-free model specifically tailored to meet the challenges of surgical phase recognition. In SR-Mamba, we leverage a bidirectional Mamba decoder to effectively model the temporal context in overlong sequences. Moreover, the efficient optimization of the proposed Mamba decoder facilitates single-step neural network training, eliminating the need for separate training steps as in previous works. This single-step training approach not only simplifies the training process but also ensures higher accuracy, even with a lighter spatial feature extractor. Our SR-Mamba establishes a new benchmark in surgical video analysis by demonstrating state-of-the-art performance on the Cholec80 and CATARACTS Challenge datasets. The code is accessible at https://github.com/rcao-hk/SR-Mamba.
Related papers
- Mamba-CL: Optimizing Selective State Space Model in Null Space for Continual Learning [54.19222454702032]
Continual Learning aims to equip AI models with the ability to learn a sequence of tasks over time, without forgetting previously learned knowledge.
State Space Models (SSMs) have achieved notable success in computer vision.
We introduce Mamba-CL, a framework that continuously fine-tunes the core SSMs of the large-scale Mamba foundation model.
arXiv Detail & Related papers (2024-11-23T06:36:16Z) - MLLA-UNet: Mamba-like Linear Attention in an Efficient U-Shape Model for Medical Image Segmentation [6.578088710294546]
Traditional segmentation methods struggle to address challenges such as high anatomical variability, blurred tissue boundaries, low organ contrast, and noise.
We propose MLLA-UNet (Mamba-Like Linear Attention UNet), a novel architecture that achieves linear computational complexity while maintaining high segmentation accuracy.
Experiments demonstrate that MLLA-UNet achieves state-of-the-art performance on six challenging datasets with 24 different segmentation tasks, including but not limited to FLARE22, AMOS CT, and ACDC, with an average DSC of 88.32%.
arXiv Detail & Related papers (2024-10-31T08:54:23Z) - EM-Net: Efficient Channel and Frequency Learning with Mamba for 3D Medical Image Segmentation [3.6813810514531085]
We introduce a novel 3D medical image segmentation model called EM-Net. Inspired by its success, we introduce a novel Mamba-based 3D medical image segmentation model called EM-Net.
Comprehensive experiments on two challenging multi-organ datasets with other state-of-the-art (SOTA) algorithms show that our method exhibits better segmentation accuracy while requiring nearly half the parameter size of SOTA models and 2x faster training speed.
arXiv Detail & Related papers (2024-09-26T09:34:33Z) - SPRMamba: Surgical Phase Recognition for Endoscopic Submucosal Dissection with Mamba [4.37495931705689]
We propose SPRMamba, a novel Mamba-based framework for ESD surgical phase recognition.
We show that SPRMamba surpasses existing state-of-the-art methods and exhibits greater robustness across various surgical phase recognition tasks.
arXiv Detail & Related papers (2024-09-18T16:26:56Z) - Bidirectional Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction.
We introduce a new framework named Selective Gated Mamba ( SIGMA) for Sequential Recommendation.
Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z) - MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking [51.28485682954006]
We propose a pure Mamba-based framework (MambaVT) to fully exploit intrinsic-temporal contextual modeling for robust visible-thermal tracking.
Specifically, we devise the long-range cross-frame integration component to globally adapt to target appearance variations.
Experiments show the significant potential of vision Mamba for RGB-T tracking, with MambaVT achieving state-of-the-art performance on four mainstream benchmarks.
arXiv Detail & Related papers (2024-08-15T02:29:00Z) - DeciMamba: Exploring the Length Extrapolation Potential of Mamba [89.07242846058023]
We introduce DeciMamba, a context-extension method specifically designed for Mamba.
We show that DeciMamba can extrapolate context lengths 25x longer than the ones seen during training, and does so without utilizing additional computational resources.
arXiv Detail & Related papers (2024-06-20T17:40:18Z) - HARMamba: Efficient and Lightweight Wearable Sensor Human Activity Recognition Based on Bidirectional Mamba [7.412537185607976]
Wearable sensor-based human activity recognition (HAR) is a critical research domain in activity perception.
This study introduces HARMamba, an innovative light-weight and versatile HAR architecture that combines selective bidirectional State Spaces Model and hardware-aware design.
HarMamba outperforms contemporary state-of-the-art frameworks, delivering comparable or better accuracy with significantly reducing computational and memory demands.
arXiv Detail & Related papers (2024-03-29T13:57:46Z) - Rotate to Scan: UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation [8.686237221268584]
We propose Triplet Mamba-UNet as a new type of image segmentation network.
Our model achieves a one-third reduction in parameters compared to the previous VM-UNet.
arXiv Detail & Related papers (2024-03-26T13:40:18Z) - GLSFormer : Gated - Long, Short Sequence Transformer for Step
Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.