SasMamba: A Lightweight Structure-Aware Stride State Space Model for 3D Human Pose Estimation
- URL: http://arxiv.org/abs/2511.08872v1
- Date: Thu, 13 Nov 2025 01:13:22 GMT
- Title: SasMamba: A Lightweight Structure-Aware Stride State Space Model for 3D Human Pose Estimation
- Authors: Hu Cui, Wenqiang Hua, Renjing Huang, Shurui Jia, Tessai Hayama
- Abstract summary: We propose a structure-aware spatiotemporal convolution to dynamically capture essential local interactions between joints. We then apply a stride-based scan strategy to construct multi-scale global structural representations. Our model SasMamba achieves competitive 3D pose estimation performance with significantly fewer parameters compared to existing hybrid models.
- Score: 0.8427427828815586
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the Mamba architecture based on State Space Models (SSMs) has gained attention in 3D human pose estimation due to its linear complexity and strong global modeling capability. However, existing SSM-based methods typically apply manually designed scan operations to flatten detected 2D pose sequences into purely temporal sequences, either locally or globally. This approach disrupts the inherent spatial structure of human poses and entangles spatial and temporal features, making it difficult to capture complex pose dependencies. To address these limitations, we propose the Skeleton Structure-Aware Stride SSM (SAS-SSM), which first employs a structure-aware spatiotemporal convolution to dynamically capture essential local interactions between joints, and then applies a stride-based scan strategy to construct multi-scale global structural representations. This enables flexible modeling of both local and global pose information while maintaining linear computational complexity. Built upon SAS-SSM, our model SasMamba achieves competitive 3D pose estimation performance with significantly fewer parameters compared to existing hybrid models. The source code is available at https://hucui2022.github.io/sasmamba_proj/.
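The stride-based scan described in the abstract can be pictured with a small sketch. This is a hedged illustration, not the authors' code: the function name `stride_scans`, the stride values, and the one-token-per-frame layout are assumptions made for the example. It only shows the core idea that one pose sequence yields several subsampled sequences at different temporal scales while the joints within each frame stay contiguous, preserving the skeleton's spatial structure inside every scan step.

```python
import numpy as np

def stride_scans(pose, strides=(1, 2, 4)):
    """pose: array of shape (T, J, C) - frames, joints, channels.
    Returns one token sequence per stride; joints within a frame are
    kept contiguous, so the spatial layout survives the flattening."""
    T, J, C = pose.shape
    tokens = pose.reshape(T, J * C)        # one token row per frame
    return [tokens[::s] for s in strides]  # multi-scale subsequences

clip = np.random.randn(16, 17, 2)          # 16 frames, 17 joints, 2D input
scans = stride_scans(clip)
print([s.shape for s in scans])            # [(16, 34), (8, 34), (4, 34)]
```

In a full model, each subsequence would presumably be fed to its own SSM scan and the multi-scale outputs fused; that fusion step is outside what this sketch covers.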
Related papers
- SAM 3D Body: Robust Full-Body Human Mesh Recovery [65.0108906331903]
We introduce SAM 3D Body (3DB), a promptable model for single-image full-body 3D human mesh recovery (HMR). 3DB estimates the human pose of the body, feet, and hands. It is the first model to use a new parametric mesh representation, Momentum Human Rig (MHR), which decouples skeletal structure and surface shape.
arXiv Detail & Related papers (2026-02-17T20:26:37Z) - Towards Geometry-Aware and Motion-Guided Video Human Mesh Recovery [60.51998732898099]
We introduce HMRMamba, a new paradigm for 3D Human Mesh Recovery. It pioneers the use of Structured State Space Models for their efficiency and long-range modeling prowess. Our framework is distinguished by two core contributions, the first of which is the Geometry-Aware Lifting Module, featuring a novel dual-scan Mamba architecture.
arXiv Detail & Related papers (2026-01-29T08:05:02Z) - PRGCN: A Graph Memory Network for Cross-Sequence Pattern Reuse in 3D Human Pose Estimation [18.771349697842947]
This work introduces the Pattern Reuse Graph Convolutional Network (PRGCN), a novel framework that formalizes pose estimation as a problem of pattern retrieval and adaptation. At its core, PRGCN features a graph memory bank that learns and stores a compact set of pose prototypes, encoded as relational graphs, which are dynamically retrieved via an attention mechanism to provide structured priors. PRGCN establishes a new state of the art, achieving MPJPEs of 37.1mm and 13.4mm, respectively, while exhibiting enhanced cross-domain generalization capability.
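Attention-based retrieval from a prototype memory bank, as the PRGCN summary describes, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not PRGCN's actual module: the name `retrieve_prior` and the plain scaled dot-product softmax are stand-ins for whatever attention formulation the paper uses, and the prototypes here are flat vectors rather than relational graphs.

```python
import numpy as np

def retrieve_prior(query, memory):
    """query: (D,) pose feature; memory: (K, D) bank of learned prototypes.
    Scaled dot-product attention returns a weighted mix of prototypes,
    which would serve as a structured prior for the estimator."""
    logits = memory @ query / np.sqrt(query.shape[0])
    weights = np.exp(logits - logits.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ memory                   # (D,) retrieved prior

bank = np.eye(3)                              # 3 toy one-hot "prototypes"
prior = retrieve_prior(np.array([9.0, 0.0, 0.0]), bank)
print(prior.argmax())                         # 0: nearest prototype dominates
```

The softmax makes retrieval differentiable, so the bank can be learned end-to-end with the rest of the network, which is presumably why an attention mechanism is used rather than a hard nearest-neighbor lookup.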
arXiv Detail & Related papers (2025-10-22T11:12:07Z) - Mamba-Driven Topology Fusion for Monocular 3D Human Pose Estimation [41.14182025718559]
We propose the Mamba-Driven Topology Fusion framework for 3D human pose estimation. Specifically, the proposed Bone Aware Module infers the direction and length of bone vectors in the spherical coordinate system. We also design a Spatiotemporal Refinement Module to model both temporal and spatial relationships within the sequence.
arXiv Detail & Related papers (2025-05-27T01:21:57Z) - HGMamba: Enhancing 3D Human Pose Estimation with a HyperGCN-Mamba Network [0.0]
3D human pose estimation is a promising research area that leverages estimated and ground-truth 2D human pose data for training. Existing approaches aim to enhance the performance of estimated 2D poses, but struggle when applied to ground-truth 2D pose data. We propose a novel Hyper-GCN and Shuffle Mamba block, which processes input data through two parallel streams.
arXiv Detail & Related papers (2025-04-09T07:28:19Z) - UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection [53.785766442201094]
Recent advances in LiDAR 3D detection have demonstrated the effectiveness of Transformer-based frameworks in capturing global dependencies from point cloud spaces. However, due to the considerable number of 3D voxels and the quadratic complexity of Transformers, multiple sequences are grouped before being fed to Transformers, leading to a limited receptive field. Inspired by the impressive performance of State Space Models (SSMs) in the field of 2D vision tasks, we propose a novel Unified Mamba (UniMamba). Specifically, a UniMamba block is designed which mainly consists of locality modeling, Z-order serialization, and a local-global sequential aggregator.
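Z-order (Morton) serialization, which the UniMamba summary mentions, linearizes a 2D grid of cells so that spatially adjacent cells tend to stay adjacent in the resulting sequence, which matters when a sequential model like an SSM consumes voxel tokens. A minimal bit-interleaving sketch follows; the helper name and bit width are assumptions for illustration, and UniMamba itself operates on 3D voxels rather than this 2D toy grid.

```python
def morton2d(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of (x, y) into a single Z-order index:
    bit i of x lands at position 2*i, bit i of y at 2*i + 1."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

# Sorting grid cells by their Morton code yields the Z-shaped traversal.
cells = [(x, y) for y in range(4) for x in range(4)]
cells.sort(key=lambda c: morton2d(*c))
print(cells[:4])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

The first four cells in the sorted order form a 2x2 block, which is exactly the locality property that makes Z-order a popular serialization for voxel tokens.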
arXiv Detail & Related papers (2025-03-15T06:22:31Z) - PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model [7.286873011001679]
We propose a purely SSM-based approach with linear complexity for 3D human pose estimation in monocular video. Specifically, we propose a bidirectional global-local spatio-temporal block that comprehensively models human joint relations within individual frames as well as across frames. A reordering strategy provides a more logical geometric ordering, resulting in a combined local-global spatial scan.
arXiv Detail & Related papers (2024-08-07T04:38:03Z) - Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation [36.93661496405653]
We take a global approach to exploit spatio-temporal information with a concise Graph and Skipped Transformer architecture.
Specifically, in the 3D pose stage, coarse-grained body parts are deployed to construct a fully data-driven adaptive model.
Experiments are conducted on the Human3.6M, MPI-INF-3DHP and HumanEva benchmarks.
arXiv Detail & Related papers (2024-07-03T10:42:09Z) - Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation [53.04781510348416]
Video-based 3D human pose and shape estimations are evaluated by intra-frame accuracy and inter-frame smoothness.
We propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, the Global-to-Local Transformer (GLoT).
Our GLoT surpasses previous state-of-the-art methods with the lowest model parameters on popular benchmarks, i.e., 3DPW, MPI-INF-3DHP, and Human3.6M.
arXiv Detail & Related papers (2023-03-26T14:57:49Z) - S2RMs: Spatially Structured Recurrent Modules [105.0377129434636]
We take a step towards exploiting dynamic structures that are capable of simultaneously exploiting both modular and spatio-temporal structures.
We find our models to be robust to the number of available views and better capable of generalization to novel tasks without additional training.
arXiv Detail & Related papers (2020-07-13T17:44:30Z) - Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition [79.33539539956186]
We propose a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D.
By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets.
arXiv Detail & Related papers (2020-03-31T11:28:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.