HGMamba: Enhancing 3D Human Pose Estimation with a HyperGCN-Mamba Network
- URL: http://arxiv.org/abs/2504.06638v1
- Date: Wed, 09 Apr 2025 07:28:19 GMT
- Title: HGMamba: Enhancing 3D Human Pose Estimation with a HyperGCN-Mamba Network
- Authors: Hu Cui, Tessai Hayama
- Abstract summary: 3D human pose lifting is a promising research area that leverages estimated and ground-truth 2D human pose data for training. Existing approaches aim to enhance the performance of estimated 2D poses, but struggle when applied to ground-truth 2D pose data. We propose a novel Hyper-GCN and Shuffle Mamba block, which processes input data through two parallel streams.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D human pose lifting is a promising research area that leverages estimated and ground-truth 2D human pose data for training. While existing approaches primarily aim to enhance the performance of estimated 2D poses, they often struggle when applied to ground-truth 2D pose data. We observe that achieving accurate 3D pose reconstruction from ground-truth 2D poses requires precise modeling of local pose structures, alongside the ability to extract robust global spatio-temporal features. To address these challenges, we propose a novel Hyper-GCN and Shuffle Mamba (HGMamba) block, which processes input data through two parallel streams: Hyper-GCN and Shuffle-Mamba. The Hyper-GCN stream models the human body structure as hypergraphs with varying levels of granularity to effectively capture local joint dependencies. Meanwhile, the Shuffle Mamba stream leverages a state space model to perform spatio-temporal scanning across all joints, enabling the establishment of global dependencies. By adaptively fusing these two representations, HGMamba achieves strong global feature modeling while excelling at local structure modeling. We stack multiple HGMamba blocks to create three variants of our model, allowing users to select the most suitable configuration based on the desired speed-accuracy trade-off. Extensive evaluations on the Human3.6M and MPI-INF-3DHP benchmark datasets demonstrate the effectiveness of our approach. HGMamba-B achieves state-of-the-art results, with P1 errors of 38.65 mm and 14.33 mm on the respective datasets. Code and models are available: https://github.com/HuCui2022/HGMamba
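To make the dual-stream design concrete, the following is a minimal PyTorch sketch of a two-stream block with gated adaptive fusion. It is not the authors' implementation (see the linked repository for that): a learned dense joint-adjacency layer stands in for the Hyper-GCN stream, and a bidirectional GRU stands in for the Shuffle-Mamba spatio-temporal scan.

```python
# Minimal sketch of a two-stream block with adaptive fusion, in the spirit of
# HGMamba. The real model uses hypergraph convolutions and Mamba SSM scans;
# a dense graph layer and a bidirectional GRU stand in here for illustration.
import torch
import torch.nn as nn

class TwoStreamBlock(nn.Module):
    def __init__(self, num_joints: int, dim: int):
        super().__init__()
        # "Local" stream: learned joint-adjacency mixing (stand-in for Hyper-GCN).
        self.adj = nn.Parameter(torch.eye(num_joints))
        self.local_proj = nn.Linear(dim, dim)
        # "Global" stream: bidirectional recurrence over the joint sequence
        # (stand-in for the Shuffle-Mamba spatio-temporal scan).
        self.global_rnn = nn.GRU(dim, dim // 2, bidirectional=True, batch_first=True)
        # Adaptive fusion: per-channel gate computed from both streams.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_joints, dim) features for one frame.
        local = self.local_proj(torch.einsum("jk,bkd->bjd", self.adj, x))
        global_, _ = self.global_rnn(x)
        g = self.gate(torch.cat([local, global_], dim=-1))
        return self.norm(x + g * local + (1.0 - g) * global_)

block = TwoStreamBlock(num_joints=17, dim=64)
print(block(torch.randn(2, 17, 64)).shape)  # torch.Size([2, 17, 64])
```

The learned gate lets the block lean on the local stream for fine pose structure and on the global stream for long-range context, mirroring the adaptive fusion described in the abstract.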
Related papers
- Mocap-2-to-3: Lifting 2D Diffusion-Based Pretrained Models for 3D Motion Capture [31.82852393452607]
Mocap-2-to-3 is a novel framework that decomposes intricate 3D motions into 2D poses. We leverage 2D data to enhance 3D motion reconstruction in diverse scenarios. We evaluate our model's performance on real-world datasets.
arXiv Detail & Related papers (2025-03-05T06:32:49Z)
- GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency [50.11520458252128]
Existing 3D affordance learning methods struggle with generalization and robustness due to limited annotated data. We propose GEAL, a novel framework designed to enhance the generalization and robustness of 3D affordance learning by leveraging large-scale pre-trained 2D models. GEAL consistently outperforms existing methods across seen and novel object categories, as well as corrupted data.
arXiv Detail & Related papers (2024-12-12T17:59:03Z)
- CameraHMR: Aligning People with Perspective [54.05758012879385]
We address the challenge of accurate 3D human pose and shape estimation from monocular images.
Existing training datasets containing real images with pseudo ground truth (pGT) use SMPLify to fit SMPL to sparse 2D joint locations.
We make two contributions that improve pGT accuracy.
arXiv Detail & Related papers (2024-11-12T19:12:12Z)
- PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model [7.286873011001679]
We propose a purely SSM-based approach with linear complexity for 3D human pose estimation in monocular video. Specifically, we propose a bidirectional global-local spatio-temporal block that comprehensively models human joint relations within individual frames as well as across frames. This strategy provides a more logical geometric ordering, resulting in a combined global-local spatial scan.
arXiv Detail & Related papers (2024-08-07T04:38:03Z)
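A toy sketch of the bidirectional, geometry-ordered scan idea from the PoseMamba entry above. The joint ordering and the GRU stand-in for the SSM scan are illustrative assumptions, not details from the paper:

```python
# Toy illustration of a bidirectional scan over a geometry-aware joint
# ordering. Both the ordering and the recurrent scan module are hypothetical.
import torch
import torch.nn as nn

# Hypothetical chain-walk ordering for a 17-joint Human3.6M-style skeleton
# (pelvis -> right leg -> left leg -> spine -> right arm -> left arm -> head),
# so that spatially adjacent joints are also adjacent in the scan sequence.
CHAIN_ORDER = [0, 1, 2, 3, 4, 5, 6, 7, 8, 14, 15, 16, 11, 12, 13, 9, 10]

class BiScan(nn.Module):
    """Run a sequence model forward and backward over reordered joints."""
    def __init__(self, dim: int):
        super().__init__()
        self.fwd = nn.GRU(dim, dim, batch_first=True)  # stand-in for an SSM scan
        self.bwd = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x[:, CHAIN_ORDER]                     # apply geometric ordering
        f, _ = self.fwd(x)
        b, _ = self.bwd(torch.flip(x, dims=[1]))  # reverse-direction scan
        b = torch.flip(b, dims=[1])
        y = self.out(torch.cat([f, b], dim=-1))
        inv = torch.argsort(torch.tensor(CHAIN_ORDER))
        return y[:, inv]                          # restore original joint order

print(BiScan(32)(torch.randn(2, 17, 32)).shape)  # torch.Size([2, 17, 32])
```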
- Pose Magic: Efficient and Temporally Consistent Human Pose Estimation with a Hybrid Mamba-GCN Network [40.123744788977525]
We propose a new attention-free hybrid architecture named Hybrid Mamba-GCN (Pose Magic).
By adaptively fusing representations from Mamba and GCN, Pose Magic demonstrates superior capability in learning the underlying 3D structure.
Experiments show that Pose Magic achieves new SOTA results while saving 74.1% of FLOPs.
arXiv Detail & Related papers (2024-08-06T03:15:18Z)
- Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation [36.93661496405653]
We take a global approach to exploit spatio-temporal information with a concise Graph and Skipped Transformer architecture.
Specifically, in the 3D pose stage, coarse-grained body parts are deployed to construct a fully data-driven adaptive model.
Experiments are conducted on the Human3.6M, MPI-INF-3DHP and HumanEva benchmarks.
arXiv Detail & Related papers (2024-07-03T10:42:09Z)
- ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation [71.2556016049579]
ManiPose is a manifold-constrained multi-hypothesis model for human pose 2D-to-3D lifting. By constraining the outputs to lie on the human pose manifold, ManiPose guarantees the consistency of all hypothetical poses. We showcase the performance of ManiPose on real-world datasets, where it outperforms state-of-the-art models in pose consistency.
arXiv Detail & Related papers (2023-12-11T13:50:10Z)
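The manifold constraint in the ManiPose entry above can be illustrated with a small decoding sketch: if every hypothesis is assembled from unit bone directions scaled by shared bone lengths, all hypotheses have consistent segment lengths by construction. The toy skeleton and shapes below are assumptions, not the paper's actual parameterization:

```python
# Sketch of manifold-constrained pose decoding: each hypothesis is built from
# unit bone directions scaled by shared bone lengths, so every hypothesis has
# identical, consistent segment lengths. The 4-joint chain is illustrative.
import torch

def decode_hypotheses(directions, bone_lengths, parents):
    """directions: (H, J, 3) raw vectors per hypothesis; bone_lengths: (J,)."""
    directions = torch.nn.functional.normalize(directions, dim=-1)  # unit sphere
    H, J, _ = directions.shape
    joints = torch.zeros(H, J, 3)
    for j in range(1, J):  # root (joint 0) stays at the origin
        joints[:, j] = joints[:, parents[j]] + bone_lengths[j] * directions[:, j]
    return joints

parents = [0, 0, 1, 2]                       # toy chain: root -> 1 -> 2 -> 3
lengths = torch.tensor([0.0, 0.4, 0.4, 0.3])
hyps = decode_hypotheses(torch.randn(5, 4, 3), lengths, parents)
# All 5 hypotheses share the same bone lengths by construction:
print((hyps[:, 1:] - hyps[:, [0, 1, 2]]).norm(dim=-1))  # each row ~ [0.4, 0.4, 0.3]
```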
- Double-chain Constraints for 3D Human Pose Estimation in Images and Videos [21.42410292863492]
Reconstructing 3D poses from 2D poses lacking depth information is challenging due to the complexity and diversity of human motion.
We propose a novel model, called Double-chain Graph Convolutional Transformer (DC-GCT), to constrain the pose.
We show that DC-GCT achieves state-of-the-art performance on two challenging datasets.
arXiv Detail & Related papers (2023-08-10T02:41:18Z)
- Deep Generative Models on 3D Representations: A Survey [81.73385191402419]
Generative models aim to learn the distribution of observed data by generating new instances.
Recently, researchers started to shift focus from 2D to 3D space.
However, representing 3D data poses significantly greater challenges.
arXiv Detail & Related papers (2022-10-27T17:59:50Z)
- P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation [78.83305967085413]
This paper introduces a novel Pre-trained Spatial Temporal Many-to-One (P-STMO) model for the 2D-to-3D human pose estimation task.
Our method outperforms state-of-the-art methods with fewer parameters and less computational overhead.
arXiv Detail & Related papers (2022-03-15T04:00:59Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate the 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is the lack of training data with complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
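The depth-to-scale (D2S) projection from the entry above can be sketched as follows: rather than one global scale, each joint receives a scale derived from its depth offset relative to the root, approximating true perspective projection per joint. Variable names and the camera setup here are illustrative assumptions:

```python
# Sketch of a depth-to-scale (D2S) style projection: instead of a single
# global scale, each joint gets its own scale from its depth offset relative
# to the root, approximating perspective projection per joint.
import torch

def d2s_project(joints_3d, focal, root_depth):
    """joints_3d: (J, 3) root-relative 3D joints (root at index 0)."""
    depth_offset = joints_3d[:, 2]                # per-joint depth difference dZ_j
    scale = focal / (root_depth + depth_offset)   # per-joint scale s_j = f / (Z_root + dZ_j)
    return scale[:, None] * joints_3d[:, :2]      # u_j = s_j * X_j, v_j = s_j * Y_j

pts = torch.tensor([[0.0, 0.0, 0.0], [0.2, -0.5, 0.1], [-0.2, -0.5, -0.1]])
print(d2s_project(pts, focal=1000.0, root_depth=4.0))
```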
- Learning 3D Human Shape and Pose from Dense Body Parts [117.46290013548533]
We propose a Decompose-and-aggregate Network (DaNet) to learn 3D human shape and pose from dense correspondences of body parts.
Messages from local streams are aggregated to enhance robust prediction of rotation-based poses.
Our method is validated on both indoor and real-world datasets including Human3.6M, UP3D, COCO, and 3DPW.
arXiv Detail & Related papers (2019-12-31T15:09:51Z)