LiftFormer: 3D Human Pose Estimation using attention models
- URL: http://arxiv.org/abs/2009.00348v1
- Date: Tue, 1 Sep 2020 11:05:45 GMT
- Title: LiftFormer: 3D Human Pose Estimation using attention models
- Authors: Adrian Llopart
- Abstract summary: We propose using attention-based models to obtain more accurate 3D predictions by leveraging attention mechanisms on ordered sequences of human poses in videos.
Our method consistently outperforms the previous best results in the literature when using both 2D keypoint predictors, by 0.3 mm (44.8 mm MPJPE, 0.7% improvement), and ground-truth inputs, by 2 mm (31.9 mm MPJPE, 8.4% improvement), on Human3.6M.
Our 3D lifting model's accuracy exceeds that of other end-to-end or SMPL approaches and is comparable to many multi-view methods.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating the 3D position of human joints has become a widely researched
topic in recent years. Special emphasis has gone into defining novel methods
that extrapolate 2-dimensional data (keypoints) into 3D, namely predicting the
root-relative coordinates of joints associated with human skeletons. The latest
research trends have shown that Transformer Encoder blocks aggregate temporal
information significantly better than previous approaches. Thus, we propose
using these models to obtain more accurate 3D predictions by leveraging
temporal information through attention mechanisms on ordered sequences of
human poses in videos.
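As a rough illustration of the lifting idea described above (not the paper's exact architecture), a single attention layer can weight every frame in an ordered window of 2D poses when regressing the centre frame's 3D pose. The dimensions, the single-head/single-layer structure, and the random weights below are all hypothetical simplifications:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over the time axis.

    x: (frames, d_model) sequence of per-frame pose embeddings.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (frames, frames) temporal weights
    return softmax(scores) @ v

rng = np.random.default_rng(0)
frames, joints, d_model = 9, 17, 32               # hypothetical sizes

poses_2d = rng.standard_normal((frames, joints, 2))  # ordered 2D keypoints
embed = rng.standard_normal((joints * 2, d_model))   # per-frame linear embedding
wq, wk, wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
head = rng.standard_normal((d_model, joints * 3))    # regression head to 3D

tokens = poses_2d.reshape(frames, -1) @ embed        # one token per frame
attended = self_attention(tokens, wq, wk, wv)        # aggregate temporal context
pose_3d = (attended[frames // 2] @ head).reshape(joints, 3)  # centre-frame 3D pose
print(pose_3d.shape)  # (17, 3)
```

A real Transformer encoder would stack several such layers with multiple heads, residual connections, layer normalisation, and learned weights; the point here is only that the attention matrix lets every frame in the window contribute to the centre frame's prediction.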
Our method consistently outperforms the previous best results in the
literature when using both 2D keypoint predictors, by 0.3 mm (44.8 mm MPJPE,
0.7% improvement), and ground-truth inputs, by 2 mm (31.9 mm MPJPE, 8.4%
improvement), on Human3.6M. It also achieves state-of-the-art performance on
the HumanEva-I dataset with 10.5 mm P-MPJPE (22.2% reduction). The number of
parameters in our model is easily tunable and smaller (9.5M) than that of
current methodologies (16.95M and 11.25M), whilst still achieving better
performance. Thus, our 3D lifting model's accuracy exceeds that of other
end-to-end or SMPL approaches and is comparable to that of many multi-view
methods.
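For reference, the two metrics quoted above can be sketched as follows: MPJPE is the mean Euclidean distance between predicted and ground-truth joints (in mm), and P-MPJPE computes the same error after aligning the prediction to the ground truth with a similarity (Procrustes) transform. This is an illustrative implementation, not the evaluation code used by the paper:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: mean Euclidean distance per joint.

    pred, gt: (joints, 3) root-relative joint coordinates.
    """
    return np.linalg.norm(pred - gt, axis=-1).mean()

def p_mpjpe(pred, gt):
    """MPJPE after rigid alignment with scaling (Procrustes analysis)."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    p, g = pred - mu_p, gt - mu_g                # centre both poses
    u, s, vt = np.linalg.svd(p.T @ g)            # cross-covariance SVD
    if np.linalg.det((u @ vt).T) < 0:            # avoid improper rotation
        vt[-1] *= -1
        s[-1] *= -1
    r = (u @ vt).T                               # optimal rotation
    scale = s.sum() / (p ** 2).sum()             # optimal isotropic scale
    aligned = scale * p @ r.T + mu_g             # map prediction onto ground truth
    return mpjpe(aligned, gt)
```

Because P-MPJPE discounts global rotation, translation, and scale errors, it is always at most the MPJPE for the same prediction, which is why HumanEva-I results are typically reported with the aligned metric.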
Related papers
- Enhancing 3D Human Pose Estimation Amidst Severe Occlusion with Dual Transformer Fusion [13.938406073551844]
This paper introduces a Dual Transformer Fusion (DTF) algorithm, a novel approach to obtain a holistic 3D pose estimation.
To enable precise 3D Human Pose Estimation, our approach leverages the innovative DTF architecture, which first generates a pair of intermediate views.
Our approach outperforms existing state-of-the-art methods on both datasets, yielding substantial improvements.
arXiv Detail & Related papers (2024-10-06T18:15:27Z)
- HOIMotion: Forecasting Human Motion During Human-Object Interactions Using Egocentric 3D Object Bounding Boxes [10.237077867790612]
We present HOIMotion, a novel approach for human motion forecasting during human-object interactions.
Our method integrates information about past body poses and egocentric 3D object bounding boxes.
We show that HOIMotion consistently outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2024-07-02T19:58:35Z)
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation [68.75387874066647]
We propose an Uncertainty-Aware testing-time optimization framework for 3D human pose estimation.
Our approach outperforms the previous best result by a large margin of 4.5% on Human3.6M.
arXiv Detail & Related papers (2024-02-04T04:28:02Z)
- Efficient Domain Adaptation via Generative Prior for 3D Infant Pose Estimation [29.037799937729687]
3D human pose estimation has seen impressive progress in recent years, but only a few works focus on infants, who have different bone lengths and for whom only limited data is available.
Here, we show that our model attains state-of-the-art MPJPE performance of 43.6 mm on the SyRIP dataset and 21.2 mm on the MINI-RGBD dataset.
We also show that our method, ZeDO-i, attains efficient domain adaptation even when only a small amount of data is available.
arXiv Detail & Related papers (2023-11-17T20:49:37Z)
- TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting [27.3359362364858]
We present an efficient multi-view pose estimation model that learns a robust temporal representation.
Our model is able to generalize across datasets without fine-tuning.
arXiv Detail & Related papers (2023-09-14T17:56:30Z)
- DiffHPE: Robust, Coherent 3D Human Pose Lifting with Diffusion [54.0238087499699]
We show that diffusion models enhance the accuracy, robustness, and coherence of human pose estimations.
We introduce DiffHPE, a novel strategy for harnessing diffusion models in 3D-HPE.
Our findings indicate that while standalone diffusion models provide commendable performance, their accuracy is even better in combination with supervised models.
arXiv Detail & Related papers (2023-09-04T12:54:10Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- Cascaded deep monocular 3D human pose estimation with evolutionary training data [76.3478675752847]
Deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation.
This paper proposes a novel data augmentation method that is scalable for massive amount of training data.
Our method synthesizes unseen 3D human skeletons based on a hierarchical human representation and synthesis rules inspired by prior knowledge.
arXiv Detail & Related papers (2020-06-14T03:09:52Z)
- Anatomy-aware 3D Human Pose Estimation with Bone-based Pose Decomposition [92.99291528676021]
Instead of directly regressing the 3D joint locations, we decompose the task into bone direction prediction and bone length prediction.
Our motivation is the fact that the bone lengths of a human skeleton remain consistent across time.
Our full model outperforms the previous best results on Human3.6M and MPI-INF-3DHP datasets.
arXiv Detail & Related papers (2020-02-24T15:49:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.