Lifting by Image -- Leveraging Image Cues for Accurate 3D Human Pose
Estimation
- URL: http://arxiv.org/abs/2312.15636v1
- Date: Mon, 25 Dec 2023 07:50:58 GMT
- Title: Lifting by Image -- Leveraging Image Cues for Accurate 3D Human Pose
Estimation
- Authors: Feng Zhou, Jianqin Yin, Peiyang Li
- Abstract summary: The "lifting from 2D pose" method has been the dominant approach to 3D Human Pose Estimation (3DHPE).
Rich semantic and texture information in images can contribute to a more accurate "lifting" procedure.
In this paper, we give new insight into the cause of poor generalization problems and the effectiveness of image features.
- Score: 10.374944534302234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The "lifting from 2D pose" method has been the dominant approach to 3D Human
Pose Estimation (3DHPE) due to the powerful visual analysis ability of 2D pose
estimators. As is widely known, estimating solely from 2D pose suffers from a
depth ambiguity problem, where one 2D pose can be mapped to multiple 3D
poses. Intuitively, the rich semantic and texture information in images can
contribute to a more accurate "lifting" procedure. Yet, existing research
encounters two primary challenges. Firstly, the distribution of image data in
3D motion capture datasets is too narrow because of the controlled laboratory
environment, which leads to poor generalization in methods trained with
image information. Secondly, effective strategies for leveraging image
information are lacking. In this paper, we give new insight into the cause of
poor generalization problems and the effectiveness of image features. Based on
that, we propose an advanced framework. Specifically, the framework consists of
two stages. First, we enable the keypoints to query and select the beneficial
features from all image patches. To reduce the keypoints' attention to
inconsequential background features, we design a novel Pose-guided Transformer
Layer, which adaptively limits the updates to unimportant image patches. Then,
through a designed Adaptive Feature Selection Module, we prune less significant
image patches from the feature map. In the second stage, we allow the keypoints
to further emphasize the retained critical image features. This progressive
learning approach avoids spending further training on insignificant image features.
Experimental results show that our model achieves state-of-the-art performance
on both the Human3.6M dataset and the MPI-INF-3DHP dataset.
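The abstract describes two mechanisms: cross-attention in which keypoints query image patches with background attention limited by a pose prior, and an attention-based pruning step that discards less significant patches. The sketch below is a rough, assumption-laden illustration of that idea in NumPy, not the authors' implementation: the function names, the damping scheme, and the binary pose-relevance mask are all hypothetical stand-ins for the paper's Pose-guided Transformer Layer and Adaptive Feature Selection Module.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pose_guided_attention(keypoint_queries, patch_features, pose_mask, damping=0.1):
    """One cross-attention step where keypoints query image patches.

    pose_mask[j] in [0, 1] marks how relevant patch j is to the 2D pose
    (e.g. proximity to projected joints) -- a hypothetical prior here.
    Attention toward low-relevance background patches is damped, limiting
    how much they contribute to the keypoint update.
    """
    d = keypoint_queries.shape[-1]
    scores = keypoint_queries @ patch_features.T / np.sqrt(d)   # (K, P)
    attn = softmax(scores, axis=-1)
    # Damp attention on patches the pose prior deems unimportant, then renormalize.
    attn = attn * (pose_mask + damping * (1.0 - pose_mask))
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ patch_features, attn                          # (K, d), (K, P)

def adaptive_feature_selection(patch_features, attn, keep_ratio=0.5):
    """Prune patches that receive little total attention from all keypoints."""
    importance = attn.sum(axis=0)                               # (P,)
    k = max(1, int(keep_ratio * patch_features.shape[0]))
    keep = np.argsort(importance)[-k:]                          # top-k patches
    return patch_features[keep], keep

rng = np.random.default_rng(0)
K, P, d = 17, 64, 32                              # 17 keypoints, 64 image patches
queries = rng.normal(size=(K, d))
patches = rng.normal(size=(P, d))
pose_mask = (rng.random(P) > 0.6).astype(float)   # toy pose-relevance mask

updated, attn = pose_guided_attention(queries, patches, pose_mask)
kept, idx = adaptive_feature_selection(patches, attn, keep_ratio=0.5)
print(updated.shape, kept.shape)                  # (17, 32) (32, 32)
```

In the paper's second stage, attention would then be recomputed over only the retained patches, so later training focuses on the critical image features rather than background.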
Related papers
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- Weakly-supervised Pre-training for 3D Human Pose Estimation via Perspective Knowledge [36.65402869749077]
We propose a novel method to extract weak 3D information directly from 2D images without 3D pose supervision.
We propose a weakly-supervised pre-training (WSP) strategy to distinguish the depth relationship between two points in an image.
WSP achieves state-of-the-art results on two widely-used benchmarks.
arXiv Detail & Related papers (2022-11-22T03:35:15Z)
- PONet: Robust 3D Human Pose Estimation via Learning Orientations Only [116.1502793612437]
We propose a novel Pose Orientation Net (PONet) that is able to robustly estimate 3D pose by learning orientations only.
PONet estimates the 3D orientation of body limbs by taking advantage of local image evidence to recover the 3D pose.
We evaluate our method on multiple datasets, including Human3.6M, MPII, MPI-INF-3DHP, and 3DPW.
arXiv Detail & Related papers (2021-12-21T12:48:48Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- View-Invariant, Occlusion-Robust Probabilistic Embedding for Human Pose [36.384824115033304]
We propose an approach to learning a compact view-invariant embedding space from 2D body joint keypoints, without explicitly predicting 3D poses.
Experimental results show that our embedding model achieves higher accuracy when retrieving similar poses across different camera views.
arXiv Detail & Related papers (2020-10-23T17:58:35Z)
- Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement [63.853412753242615]
Learning a good 3D human pose representation is important for human pose related tasks.
We propose a novel Siamese denoising autoencoder to learn a 3D pose representation.
Our approach achieves state-of-the-art performance on two inherently different tasks.
arXiv Detail & Related papers (2020-07-14T14:25:22Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z) - Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the
Wild [101.70320427145388]
We propose a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data.
We evaluate our proposed approach on two large scale datasets.
arXiv Detail & Related papers (2020-03-17T08:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.