Improving Robustness and Accuracy via Relative Information Encoding in
3D Human Pose Estimation
- URL: http://arxiv.org/abs/2107.13994v1
- Date: Thu, 29 Jul 2021 14:12:19 GMT
- Title: Improving Robustness and Accuracy via Relative Information Encoding in
3D Human Pose Estimation
- Authors: Wenkang Shan, Haopeng Lu, Shanshe Wang, Xinfeng Zhang, Wen Gao
- Abstract summary: We propose a relative information encoding method that yields positional and temporal enhanced representations.
Our method outperforms state-of-the-art methods on two public datasets.
- Score: 59.94032196768748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most of the existing 3D human pose estimation approaches mainly focus on
predicting 3D positional relationships between the root joint and other human
joints (local motion) instead of the overall trajectory of the human body
(global motion). Despite the great progress achieved by these approaches, they
are not robust to global motion, and lack the ability to accurately predict
local motion with a small movement range. To alleviate these two problems, we
propose a relative information encoding method that yields positional and
temporal enhanced representations. First, we encode positional information
using the relative coordinates of 2D poses to enhance the consistency between
the input and output distributions. The same posture at different absolute 2D
positions is thus mapped to a common representation, which helps resist the
interference of global motion on the prediction results. Second, we encode
temporal information by establishing the connection between the current pose
and other poses of the same person within a period of time. More attention will
be paid to the movement changes before and after the current pose, resulting in
better prediction performance on local motion with a small movement range. The
ablation studies validate the effectiveness of the proposed relative
information encoding method. Besides, we introduce a multi-stage optimization
method to the whole framework to further exploit the positional and temporal
enhanced representations. Our method outperforms state-of-the-art methods on
two public datasets. Code is available at
https://github.com/paTRICK-swk/Pose3D-RIE.
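The positional part of the encoding (relative 2D coordinates with respect to the root joint) can be sketched as follows. This is a minimal illustration of the idea described in the abstract, not the authors' released code; the function name `encode_relative`, the joint count, and the root index are illustrative assumptions.

```python
import numpy as np

def encode_relative(poses_2d, root_idx=0):
    """Express 2D poses relative to the root joint.

    poses_2d: (T, J, 2) array of absolute 2D joint coordinates
              over T frames and J joints.
    Returns the (T, J, 2) relative coordinates and the (T, 2)
    root trajectory (the discarded global-motion component).
    """
    root = poses_2d[:, root_idx:root_idx + 1, :]  # (T, 1, 2), kept for broadcasting
    relative = poses_2d - root                    # same posture -> same representation
    return relative, root.squeeze(1)

# Two copies of the same posture at different absolute image positions
pose = np.random.rand(5, 17, 2)                   # 5 frames, 17 joints
shifted = pose + np.array([100.0, 50.0])          # global translation only
rel_a, _ = encode_relative(pose)
rel_b, _ = encode_relative(shifted)
assert np.allclose(rel_a, rel_b)                  # encoding is translation-invariant
```

Because a global translation of the 2D input cancels out in the subtraction, the network sees an identical representation for the same posture regardless of where it appears in the image, which is the stated mechanism for robustness to global motion.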
Related papers
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z) - A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose
Estimation [18.72362803593654]
The dominant paradigm in 3D human pose estimation that lifts a 2D pose sequence to 3D heavily relies on long-term temporal clues.
This can be attributed to their inherent inability to perceive spatial context as plain 2D joint coordinates carry no visual cues.
We propose a straightforward yet powerful solution: leveraging the readily available intermediate visual representations produced by off-the-shelf (pre-trained) 2D pose detectors.
arXiv Detail & Related papers (2023-11-06T18:04:13Z) - Best Practices for 2-Body Pose Forecasting [58.661899246497896]
We review the progress in human pose forecasting and provide an in-depth assessment of the single-person practices that perform best.
Best single-person practices do not all transfer to the 2-body case, so the proposed best practices exclude hierarchical body modeling and attention-based interaction encoding.
Our proposed 2-body pose forecasting best practices yield a performance improvement of 21.9% over the state-of-the-art on the most recent ExPI dataset.
arXiv Detail & Related papers (2023-04-12T10:46:23Z) - Mutual Information-Based Temporal Difference Learning for Human Pose
Estimation in Video [16.32910684198013]
We present a novel multi-frame human pose estimation framework, which employs temporal differences across frames to model dynamic contexts.
Specifically, we design multi-stage entangled learning sequences conditioned on multi-stage temporal differences to derive informative motion representations.
These advances placed us No.1 in the Crowd Pose Estimation in Complex Events Challenge on the HiEve benchmark.
arXiv Detail & Related papers (2023-03-15T09:29:03Z) - Kinematic-aware Hierarchical Attention Network for Human Pose Estimation
in Videos [17.831839654593452]
Previous video-based human pose estimation methods have shown promising results by leveraging features of consecutive frames.
However, most approaches sacrifice accuracy to reduce jitter and do not comprehend the temporal aspects of human motion.
We design an architecture that exploits kinematic keypoint features.
arXiv Detail & Related papers (2022-11-29T01:46:11Z) - LocATe: End-to-end Localization of Actions in 3D with Transformers [91.28982770522329]
LocATe is an end-to-end approach that jointly localizes and recognizes actions in a 3D sequence.
Unlike transformer-based object-detection and classification models which consider image or patch features as input, LocATe's transformer model is capable of capturing long-term correlations between actions in a sequence.
We introduce a new, challenging, and more realistic benchmark dataset, BABEL-TAL-20 (BT20), where the performance of state-of-the-art methods is significantly worse.
arXiv Detail & Related papers (2022-03-21T03:35:32Z) - Camera Motion Agnostic 3D Human Pose Estimation [8.090223360924004]
This paper presents a camera motion agnostic approach for predicting 3D human pose and mesh defined in the world coordinate system.
We propose a network based on bidirectional gated recurrent units (GRUs) that predicts the global motion sequence from the local pose sequence.
We use 3DPW and synthetic datasets, which are constructed in a moving-camera environment, for evaluation.
arXiv Detail & Related papers (2021-12-01T08:22:50Z) - Self-Attentive 3D Human Pose and Shape Estimation from Videos [82.63503361008607]
We present a video-based learning algorithm for 3D human pose and shape estimation.
We exploit temporal information in videos and propose a self-attention module.
We evaluate our method on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets.
arXiv Detail & Related papers (2021-03-26T00:02:19Z) - A Graph Attention Spatio-temporal Convolutional Network for 3D Human
Pose Estimation in Video [7.647599484103065]
We improve the learning of constraints in the human skeleton by modeling local and global spatial information via attention mechanisms.
Our approach effectively mitigates depth ambiguity and self-occlusion, generalizes to half upper body estimation, and achieves competitive performance on 2D-to-3D video pose estimation.
arXiv Detail & Related papers (2020-03-11T14:54:40Z) - Learning 3D Human Shape and Pose from Dense Body Parts [117.46290013548533]
We propose a Decompose-and-aggregate Network (DaNet) to learn 3D human shape and pose from dense correspondences of body parts.
Messages from local streams are aggregated to enhance the robust prediction of rotation-based poses.
Our method is validated on both indoor and real-world datasets including Human3.6M, UP3D, COCO, and 3DPW.
arXiv Detail & Related papers (2019-12-31T15:09:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.