3D Human Pose Estimation via Spatial Graph Order Attention and Temporal Body Aware Transformer
- URL: http://arxiv.org/abs/2505.01003v1
- Date: Fri, 02 May 2025 04:58:04 GMT
- Title: 3D Human Pose Estimation via Spatial Graph Order Attention and Temporal Body Aware Transformer
- Authors: Kamel Aouaidjia, Aofan Li, Wenhao Zhang, Chongsheng Zhang,
- Abstract summary: We propose a new method that exploits the graph modeling capability of GCN to represent each skeleton with multiple graphs of different orders.<n>The resulting spatial features of the sequence are processed using a proposed temporal Body Aware Transformer.<n>Experiments on Human3.6m, MPIINF-3DHP, and HumanEva-I datasets demonstrate the effectiveness of the proposed method.
- Score: 5.303583360581161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nowadays, Transformers and Graph Convolutional Networks (GCNs) are the prevailing techniques for 3D human pose estimation. However, Transformer-based methods either ignore the spatial neighborhood relationships between the joints when used for skeleton representations or disregard the local temporal patterns of the local joint movements in skeleton sequence modeling, while GCN-based methods often neglect the need for pose-specific representations. To address these problems, we propose a new method that exploits the graph modeling capability of GCN to represent each skeleton with multiple graphs of different orders, incorporated with a newly introduced Graph Order Attention module that dynamically emphasizes the most representative orders for each joint. The resulting spatial features of the sequence are further processed using a proposed temporal Body Aware Transformer that models the global body feature dependencies in the sequence with awareness of the local inter-skeleton feature dependencies of joints. Given that our 3D pose output aligns with the central 2D pose in the sequence, we improve the self-attention mechanism to be aware of the central pose while diminishing its focus gradually towards the first and the last poses. Extensive experiments on Human3.6m, MPIINF-3DHP, and HumanEva-I datasets demonstrate the effectiveness of the proposed method. Code and models are made available on Github.
Related papers
- PoseGRAF: Geometric-Reinforced Adaptive Fusion for Monocular 3D Human Pose Estimation [5.223657684081615]
Existing monocular 3D pose estimation methods rely on joint positional features, while overlooking intrinsic directional and angular correlations within the skeleton.<n>We propose the PoseGRAF framework to address these challenges.<n> Experimental results on the Human3.6M and MPI-INF-3DHP datasets show that our method exceeds state-of-the-art approaches.
arXiv Detail & Related papers (2025-06-17T14:59:56Z) - STGFormer: Spatio-Temporal GraphFormer for 3D Human Pose Estimation in Video [7.345621536750547]
This paper presents the S-Temporal GraphFormer framework (STGFormer) for 3D human pose estimation in videos.<n>First, we introduce a STG attention mechanism, designed to more effectively leverage the inherent graph distributions of human body.<n>Next, we present a Modulated Hop-wise Regular GCN to independently process temporal and spatial dimensions in parallel.<n>Finally, we demonstrate our method state-of-the-art performance on the Human3.6M and MPIINF-3DHP datasets.
arXiv Detail & Related papers (2024-07-14T06:45:27Z) - Spatio-temporal MLP-graph network for 3D human pose estimation [8.267311047244881]
Graph convolutional networks and their variants have shown significant promise in 3D human pose estimation.
We introduce a new weighted Jacobi feature rule obtained through graph filtering with implicit propagation fairing.
We also employ adjacency modulation with the aim of learning meaningful correlations beyond defined between body joints.
arXiv Detail & Related papers (2023-08-29T14:00:55Z) - Unsupervised 3D Pose Estimation with Non-Rigid Structure-from-Motion
Modeling [83.76377808476039]
We propose a new modeling method for human pose deformations and design an accompanying diffusion-based motion prior.
Inspired by the field of non-rigid structure-from-motion, we divide the task of reconstructing 3D human skeletons in motion into the estimation of a 3D reference skeleton.
A mixed spatial-temporal NRSfMformer is used to simultaneously estimate the 3D reference skeleton and the skeleton deformation of each frame from 2D observations sequence.
arXiv Detail & Related papers (2023-08-18T16:41:57Z) - Iterative Graph Filtering Network for 3D Human Pose Estimation [5.177947445379688]
Graph convolutional networks (GCNs) have proven to be an effective approach for 3D human pose estimation.
In this paper, we introduce an iterative graph filtering framework for 3D human pose estimation.
Our approach builds upon the idea of iteratively solving graph filtering with Laplacian regularization.
arXiv Detail & Related papers (2023-07-29T20:46:44Z) - Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action
Recognition through Redefined Skeletal Topology Awareness [24.83836008577395]
Graph Convolutional Networks (GCNs) have long defined the state-of-the-art in skeleton-based action recognition.
They tend to optimize the adjacency matrix jointly with the model weights.
This process causes a gradual decay of bone connectivity data, culminating in a model indifferent to the very topology it sought to map.
We propose an innovative pathway that encodes bone connectivity by harnessing the power of graph distances.
arXiv Detail & Related papers (2023-05-19T06:40:12Z) - Pose-Oriented Transformer with Uncertainty-Guided Refinement for
2D-to-3D Human Pose Estimation [51.00725889172323]
We propose a Pose-Oriented Transformer (POT) with uncertainty guided refinement for 3D human pose estimation.
We first develop novel pose-oriented self-attention mechanism and distance-related position embedding for POT to explicitly exploit the human skeleton topology.
We present an Uncertainty-Guided Refinement Network (UGRN) to refine pose predictions from POT, especially for the difficult joints.
arXiv Detail & Related papers (2023-02-15T00:22:02Z) - Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images [79.70127290464514]
We decompose the task into two stages, i.e. person localization and pose estimation.
And we propose three task-specific graph neural networks for effective message passing.
Our approach achieves state-of-the-art performance on CMU Panoptic and Shelf datasets.
arXiv Detail & Related papers (2021-09-13T11:44:07Z) - Revisiting Skeleton-based Action Recognition [107.08112310075114]
PoseC3D is a new approach to skeleton-based action recognition, which relies on a 3D heatmap instead stack a graph sequence as the base representation of human skeletons.
On four challenging datasets, PoseC3D consistently obtains superior performance, when used alone on skeletons and in combination with the RGB modality.
arXiv Detail & Related papers (2021-04-28T06:32:17Z) - On the spatial attention in Spatio-Temporal Graph Convolutional Networks
for skeleton-based human action recognition [97.14064057840089]
Graphal networks (GCNs) promising performance in skeleton-based human action recognition by modeling a sequence of skeletons as a graph.
Most of the recently proposed G-temporal-based methods improve the performance by learning the graph structure at each layer of the network.
arXiv Detail & Related papers (2020-11-07T19:03:04Z) - Disentangling and Unifying Graph Convolutions for Skeleton-Based Action
Recognition [79.33539539956186]
We propose a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D.
By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets.
arXiv Detail & Related papers (2020-03-31T11:28:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.