MSF-Mamba: Motion-aware State Fusion Mamba for Efficient Micro-Gesture Recognition
- URL: http://arxiv.org/abs/2510.10478v2
- Date: Thu, 16 Oct 2025 04:48:32 GMT
- Title: MSF-Mamba: Motion-aware State Fusion Mamba for Efficient Micro-Gesture Recognition
- Authors: Deng Li, Jun Shao, Bohao Xing, Rong Gao, Bihan Wen, Heikki Kälviäinen, Xin Liu,
- Abstract summary: We propose motion-aware state fusion mamba (MSF-Mamba) for micro-gesture recognition. MSF-Mamba enhances Mamba with local spatiotemporal modeling by fusing local contextual neighboring states. Our design introduces a motion-aware state fusion module based on central frame difference (CFD). Specifically, MSF-Mamba+ supports multiscale motion-aware state fusion, as well as an adaptive scale weighting module.
- Score: 42.21383693511854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Micro-gesture recognition (MGR) targets the identification of subtle and fine-grained human motions and requires accurate modeling of both long-range and local spatiotemporal dependencies. While CNNs are effective at capturing local patterns, they struggle with long-range dependencies due to their limited receptive fields. Transformer-based models address this limitation through self-attention mechanisms but suffer from high computational costs. Recently, Mamba has shown promise as an efficient model, leveraging state space models (SSMs) to enable linear-time processing. However, directly applying the vanilla Mamba to MGR may not be optimal. This is because Mamba processes inputs as 1D sequences, with state updates relying solely on the previous state, and thus lacks the ability to model local spatiotemporal dependencies. In addition, previous methods lack a motion-aware design, which is crucial in MGR. To overcome these limitations, we propose motion-aware state fusion mamba (MSF-Mamba), which enhances Mamba with local spatiotemporal modeling by fusing local contextual neighboring states. Our design introduces a motion-aware state fusion module based on central frame difference (CFD). Furthermore, a multiscale version named MSF-Mamba+ is proposed. Specifically, MSF-Mamba+ supports multiscale motion-aware state fusion, as well as an adaptive scale weighting module that dynamically weighs the fused states across different scales. These enhancements explicitly address the limitations of vanilla Mamba by enabling motion-aware local spatiotemporal modeling, allowing MSF-Mamba and MSF-Mamba+ to effectively capture subtle motion cues for MGR. Experiments on two public MGR datasets demonstrate that even the lightweight version, MSF-Mamba, achieves SoTA performance, outperforming existing CNN-, Transformer-, and SSM-based models while maintaining high efficiency.
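The abstract contrasts vanilla Mamba, whose state update in a 1D scan depends only on the previous state, with MSF-Mamba, which fuses each state with its local spatiotemporal neighbours using a central frame difference (CFD) motion cue, and MSF-Mamba+, which does this at several scales with adaptive weighting. The NumPy sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. Assumptions: a diagonal recurrence with one state per channel (real Mamba uses input-dependent discretization and an N-dimensional state per channel), tokens flattened in (frame, row, column) order, a box-mean spatial neighbourhood, CFD taken as (h[t+1] - h[t-1]) / 2 on that neighbourhood mean, and a per-token softmax over scales driven by a hypothetical query vector `q`. All function names are hypothetical.

```python
import numpy as np


def selective_scan(x, a, b, c):
    """Toy diagonal SSM scan over a flattened 1D token sequence (all inputs (L, D)).

    Each step depends only on the previous state:
        h_t = a_t * h_{t-1} + b_t * x_t,    y_t = c_t * h_t
    Returns the outputs y and the stacked hidden states, both of shape (L, D).
    """
    h = np.zeros(x.shape[1])
    hs = np.empty_like(x, dtype=float)
    for t in range(x.shape[0]):
        h = a[t] * h + b[t] * x[t]
        hs[t] = h
    return c * hs, hs


def box_mean_hw(s, r):
    """Mean of each state over a (2r+1) x (2r+1) spatial window in its own frame.

    s has shape (T, H, W, D); borders are handled by edge padding.
    """
    _, H, W, _ = s.shape
    p = np.pad(s, ((0, 0), (r, r), (r, r), (0, 0)), mode="edge")
    out = np.zeros_like(s, dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[:, dy:dy + H, dx:dx + W]
    return out / (2 * r + 1) ** 2


def central_frame_difference(s):
    """Central difference along time: (s[t+1] - s[t-1]) / 2, edge-padded at both ends."""
    p = np.pad(s, ((1, 1), (0, 0), (0, 0), (0, 0)), mode="edge")
    return (p[2:] - p[:-2]) / 2.0


def fuse_states(hs, r, motion_weight=1.0):
    """Hypothetical single-scale fusion: local spatial context plus a CFD motion term."""
    ctx = box_mean_hw(hs, r)
    return ctx + motion_weight * central_frame_difference(ctx)


def adaptive_scale_weighting(fused, q):
    """Per-token softmax over scales; fused is (S, T, H, W, D), q is a hypothetical (D,) query."""
    logits = fused @ q                                    # (S, T, H, W)
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w = w / w.sum(axis=0, keepdims=True)                 # weights sum to 1 over scales
    return (w[..., None] * fused).sum(axis=0), w


rng = np.random.default_rng(0)
T, H, W, D = 4, 3, 3, 8
L = T * H * W
x = rng.standard_normal((L, D))
a = np.full((L, D), 0.9)          # fixed decay; in Mamba the effective decay is input-dependent
b = rng.standard_normal((L, D))
c = rng.standard_normal((L, D))

y, hs = selective_scan(x, a, b, c)
hs = hs.reshape(T, H, W, D)       # assumes tokens were flattened in (t, row, col) order
fused = np.stack([fuse_states(hs, r) for r in (1, 2)])  # two hypothetical radii
out, w = adaptive_scale_weighting(fused, rng.standard_normal(D))
print(out.shape)  # (4, 3, 3, 8)
```

In a causal 1D scan, the state at a token cannot see tokens that come later in raster order, such as the pixel directly below it or the same pixel in the next frame. The fusion step reads such neighbours directly, and the CFD term turns the temporal ones into an explicit motion signal.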
Related papers
- TSkel-Mamba: Temporal Dynamic Modeling via State Space Model for Human Skeleton-based Action Recognition [59.99922360648663]
TSkel-Mamba is a hybrid Transformer-Mamba framework that effectively captures both spatial and temporal dynamics. The MTI module employs multi-scale Cycle operators to capture cross-channel temporal interactions, a critical factor in action recognition.
Date: 2025-12-12T11:55:16Z
- Gather-Scatter Mamba: Accelerating Propagation with Efficient State Space Model [15.551773379039675]
State Space Models (SSMs) have historically played a central role in sequential modeling. Recent advances in selective SSMs like Mamba offer a compelling alternative. We propose a hybrid architecture that combines shifted window self-attention for spatial context aggregation with Mamba-based selective scanning for efficient temporal propagation.
Date: 2025-10-01T13:11:13Z
- CSFMamba: Cross State Fusion Mamba Operator for Multimodal Remote Sensing Image Classification [12.959829835589453]
We propose the Cross State Fusion Mamba (Camba) Network. Specifically, we first design a preprocessing module that adapts remote sensing image information to the needs of the Mamba structure. Secondly, a cross-state module based on the Mamba operator is creatively designed to fully fuse the features of the two modalities.
Date: 2025-08-31T03:08:34Z
- Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection [88.47928738482719]
Linear State Space Models (SSMs) offer remarkable performance gains in sequence modeling. Recent advances, such as Mamba, further enhance SSMs with input-dependent gating and hardware-aware implementations. We introduce Routing Mamba (RoM), a novel approach that scales SSM parameters using sparse mixtures of linear projection experts.
Date: 2025-06-22T19:26:55Z
- MambaMoE: Mixture-of-Spectral-Spatial-Experts State Space Model for Hyperspectral Image Classification [46.67137351665963]
Mamba-based models have recently demonstrated significant potential in hyperspectral image (HSI) classification. We propose MambaMoE, a novel spectral-spatial Mixture-of-Experts (MoE) framework, which represents the first MoE-based approach in the HSI classification domain. MambaMoE achieves state-of-the-art performance in both classification accuracy and computational efficiency compared to existing advanced methods.
Date: 2025-04-29T07:50:36Z
- MobileMamba: Lightweight Multi-Receptive Visual Mamba Network [51.33486891724516]
Previous research on lightweight models has primarily focused on CNNs and Transformer-based designs.
We propose the MobileMamba framework, which balances efficiency and performance.
MobileMamba achieves up to 83.6% on Top-1, surpassing existing state-of-the-art methods.
Date: 2024-11-24T18:01:05Z
- Mamba-CL: Optimizing Selective State Space Model in Null Space for Continual Learning [54.19222454702032]
Continual Learning aims to equip AI models with the ability to learn a sequence of tasks over time, without forgetting previously learned knowledge. State Space Models (SSMs) have achieved notable success in computer vision. We introduce Mamba-CL, a framework that continuously fine-tunes the core SSMs of the large-scale Mamba foundation model.
Date: 2024-11-23T06:36:16Z
- MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking [51.28485682954006]
We propose a pure Mamba-based framework (MambaVT) to fully exploit intrinsic spatio-temporal contextual modeling for robust visible-thermal tracking.
Specifically, we devise the long-range cross-frame integration component to globally adapt to target appearance variations.
Experiments show the significant potential of vision Mamba for RGB-T tracking, with MambaVT achieving state-of-the-art performance on four mainstream benchmarks.
Date: 2024-08-15T02:29:00Z
- MambaUIE&SR: Unraveling the Ocean's Secrets with Only 2.8 GFLOPs [1.7648680700685022]
Underwater Image Enhancement (UIE) techniques aim to address the problem of underwater image degradation due to light absorption and scattering.
In recent years, both Convolutional Neural Network (CNN)-based and Transformer-based methods have been widely explored.
MambaUIE is able to efficiently synthesize global and local information and maintains a very small number of parameters with high accuracy.
Date: 2024-04-22T05:12:11Z