(Fusionformer): Exploiting the Joint Motion Synergy with Fusion Network
Based On Transformer for 3D Human Pose Estimation
- URL: http://arxiv.org/abs/2210.04006v1
- Date: Sat, 8 Oct 2022 12:22:10 GMT
- Title: (Fusionformer): Exploiting the Joint Motion Synergy with Fusion Network
Based On Transformer for 3D Human Pose Estimation
- Authors: Xinwei Yu
- Abstract summary: Many previous methods lack an understanding of local joint information; prior work considers only the temporal relationship of a single joint.
Our proposed Fusionformer method introduces a self-trajectory module and a cross-trajectory module on top of a spatio-temporal module.
The results show improvements of 2.4% MPJPE and 4.3% P-MPJPE on the Human3.6M dataset.
- Score: 1.52292571922932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For the current 3D human pose estimation task, in order to improve the
efficiency of pose sequence output, we try to further improve prediction
stability in scenarios with few input video frames. Many previous methods lack an
understanding of local joint information; \cite{9878888} considers the temporal
relationship of a single joint. However, we found that there is a certain
predictive correlation between the trajectories of different joints over
time. Therefore, our proposed \textbf{Fusionformer} method introduces a
self-trajectory module and a cross-trajectory module on top of the
spatio-temporal module. The global spatio-temporal features and local joint
trajectory features are then fused through a linear network in a parallel
manner. Finally, to eliminate the influence of bad 2D poses on 3D projections,
we also introduce a pose refinement network to balance the consistency of the
3D projections. We evaluate the proposed method on two benchmark datasets
(Human3.6M, MPI-INF-3DHP). Compared with the baseline method PoseFormer, our
method improves MPJPE by 2.4\% and P-MPJPE by 4.3\% on the Human3.6M dataset.
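The reported numbers refer to MPJPE (mean per-joint position error) and P-MPJPE (MPJPE after rigid Procrustes alignment of the prediction to the ground truth), the standard Human3.6M metrics. As context only, here is a minimal NumPy sketch of how these metrics are typically computed, assuming joint arrays of shape (J, 3) in millimetres; this is not code from the paper:

```python
import numpy as np

def mpjpe(pred, gt):
    # Mean per-joint position error: mean Euclidean distance between
    # predicted and ground-truth joints, each of shape (J, 3).
    return np.linalg.norm(pred - gt, axis=-1).mean()

def p_mpjpe(pred, gt):
    # MPJPE after rigid alignment (Procrustes): remove translation,
    # then recover the optimal scale and rotation via SVD.
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    U, S, Vt = np.linalg.svd(p.T @ g)     # SVD of the cross-covariance
    if np.linalg.det(Vt.T @ U.T) < 0:     # avoid reflections
        Vt[-1] *= -1
        S[-1] *= -1
    R = Vt.T @ U.T                        # optimal rotation
    scale = S.sum() / (p ** 2).sum()      # optimal scale
    return mpjpe(scale * p @ R.T + mu_g, gt)
```

A prediction that differs from the ground truth only by a similarity transform (rotation, scale, translation) has near-zero P-MPJPE but a nonzero MPJPE, which is why both metrics are usually reported together.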
Related papers
- Enhancing 3D Human Pose Estimation Amidst Severe Occlusion with Dual Transformer Fusion [13.938406073551844]
This paper introduces a Dual Transformer Fusion (DTF) algorithm, a novel approach to obtain a holistic 3D pose estimation.
To enable precise 3D Human Pose Estimation, our approach leverages the innovative DTF architecture, which first generates a pair of intermediate views.
Our approach outperforms existing state-of-the-art methods on both datasets, yielding substantial improvements.
arXiv Detail & Related papers (2024-10-06T18:15:27Z) - UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z) - Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose
Reconstruction in a Diffusion Framework [6.669850111205944]
Monocular 3D human pose estimation poses significant challenges due to inherent depth ambiguities that arise during the reprojection process from 2D to 3D.
Recent advancements in diffusion models have shown promise in incorporating structural priors to address reprojection ambiguities.
We propose a novel cross-channel embedding framework that aims to fully explore the correlation between joint-level features of 3D coordinates and their 2D projections.
arXiv Detail & Related papers (2024-01-18T09:53:03Z) - Spatio-temporal MLP-graph network for 3D human pose estimation [8.267311047244881]
Graph convolutional networks and their variants have shown significant promise in 3D human pose estimation.
We introduce a new weighted Jacobi feature rule obtained through graph filtering with implicit propagation fairing.
We also employ adjacency modulation with the aim of learning meaningful correlations beyond those defined between body joints.
arXiv Detail & Related papers (2023-08-29T14:00:55Z) - Unsupervised 3D Pose Estimation with Non-Rigid Structure-from-Motion
Modeling [83.76377808476039]
We propose a new modeling method for human pose deformations and design an accompanying diffusion-based motion prior.
Inspired by the field of non-rigid structure-from-motion, we divide the task of reconstructing 3D human skeletons in motion into the estimation of a 3D reference skeleton.
A mixed spatial-temporal NRSfMformer is used to simultaneously estimate the 3D reference skeleton and the skeleton deformation of each frame from 2D observations sequence.
arXiv Detail & Related papers (2023-08-18T16:41:57Z) - Pose-Oriented Transformer with Uncertainty-Guided Refinement for
2D-to-3D Human Pose Estimation [51.00725889172323]
We propose a Pose-Oriented Transformer (POT) with uncertainty guided refinement for 3D human pose estimation.
We first develop a novel pose-oriented self-attention mechanism and a distance-related position embedding for POT to explicitly exploit the human skeleton topology.
We present an Uncertainty-Guided Refinement Network (UGRN) to refine pose predictions from POT, especially for the difficult joints.
arXiv Detail & Related papers (2023-02-15T00:22:02Z) - Learnable human mesh triangulation for 3D human pose and shape
estimation [6.699132260402631]
The accuracy of joint rotation and shape estimation has received relatively little attention in the skinned multi-person linear model (SMPL)-based human mesh reconstruction from multi-view images.
We propose a two-stage method to resolve the ambiguity of joint rotation and shape reconstruction and the difficulty of network learning.
The proposed method significantly outperforms the previous works in terms of joint rotation and shape estimation, and achieves competitive performance in terms of joint location estimation.
arXiv Detail & Related papers (2022-08-24T01:11:57Z) - Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose
Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z) - MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose
Estimation in Video [75.23812405203778]
Recent solutions have been introduced to estimate 3D human pose from 2D keypoint sequences by considering body joints among all frames globally to learn spatio-temporal correlation.
We propose MixSTE, which has a temporal transformer block to separately model the temporal motion of each joint and a spatial transformer block to model inter-joint spatial correlation.
In addition, the network output is extended from the central frame to the entire input video, improving the coherence between the input and output sequences.
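The alternating design described for MixSTE (attention over frames per joint, then attention over joints per frame) can be illustrated at the shape level. The following is a toy NumPy sketch with a single-head attention stand-in and illustrative sizes; it is not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Toy single-head self-attention over the first axis: (N, C) -> (N, C).
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

T, J, C = 9, 17, 32                    # frames, joints, channels (illustrative)
feats = np.random.randn(T, J, C)

# Temporal block: attend over frames separately for each joint's trajectory.
temporal = np.stack([self_attention(feats[:, j]) for j in range(J)], axis=1)

# Spatial block: attend over joints separately within each frame.
spatial = np.stack([self_attention(temporal[t]) for t in range(T)], axis=0)
```

The key point is the axis over which attention runs: the temporal block treats each joint's (T, C) trajectory independently, while the spatial block treats each frame's (J, C) skeleton independently.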
arXiv Detail & Related papers (2022-03-02T04:20:59Z) - Learning 3D Human Shape and Pose from Dense Body Parts [117.46290013548533]
We propose a Decompose-and-aggregate Network (DaNet) to learn 3D human shape and pose from dense correspondences of body parts.
Messages from local streams are aggregated to enhance the robust prediction of the rotation-based poses.
Our method is validated on both indoor and real-world datasets including Human3.6M, UP3D, COCO, and 3DPW.
arXiv Detail & Related papers (2019-12-31T15:09:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.