A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose Estimation
- URL: http://arxiv.org/abs/2311.03312v2
- Date: Thu, 9 Nov 2023 04:51:34 GMT
- Title: A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose Estimation
- Authors: Qitao Zhao, Ce Zheng, Mengyuan Liu, Chen Chen
- Abstract summary: The dominant paradigm in 3D human pose estimation that lifts a 2D pose sequence to 3D heavily relies on long-term temporal clues.
This can be attributed to their inherent inability to perceive spatial context as plain 2D joint coordinates carry no visual cues.
We propose a straightforward yet powerful solution: leveraging the readily available intermediate visual representations produced by off-the-shelf (pre-trained) 2D pose detectors.
- Score: 18.72362803593654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The dominant paradigm in 3D human pose estimation that lifts a 2D pose
sequence to 3D heavily relies on long-term temporal clues (i.e., using a
daunting number of video frames) for improved accuracy, which incurs
performance saturation, intractable computation and the non-causal problem.
This can be attributed to their inherent inability to perceive spatial context
as plain 2D joint coordinates carry no visual cues. To address this issue, we
propose a straightforward yet powerful solution: leveraging the readily
available intermediate visual representations produced by off-the-shelf
(pre-trained) 2D pose detectors -- no finetuning on the 3D task is even needed.
The key observation is that, while the pose detector learns to localize 2D
joints, such representations (e.g., feature maps) implicitly encode the
joint-centric spatial context thanks to the regional operations in backbone
networks. We design a simple baseline named Context-Aware PoseFormer to
showcase its effectiveness. Without access to any temporal information, the
proposed method significantly outperforms its context-agnostic counterpart,
PoseFormer, and other state-of-the-art methods using up to hundreds of video
frames regarding both speed and precision. Project page:
https://qitaozhao.github.io/ContextAware-PoseFormer
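The core idea above — that a detector's intermediate feature maps implicitly encode joint-centric spatial context — can be illustrated with a minimal sketch: sample the feature map at each predicted 2D joint location and use the result as a per-joint context feature. This is a simplified illustration, not the paper's actual Context-Aware PoseFormer architecture; the function name, bilinear sampling scheme, and toy dimensions are assumptions for demonstration.

```python
import numpy as np

def sample_joint_context(feature_map, joints_2d):
    """Gather a context feature for each 2D joint by bilinearly sampling
    the detector's intermediate feature map at the joint's location.

    feature_map: (H, W, C) array from a pre-trained 2D pose detector.
    joints_2d:   (J, 2) array of (x, y) coordinates in feature-map space.
    Returns:     (J, C) per-joint context features.
    """
    H, W, C = feature_map.shape
    out = np.zeros((len(joints_2d), C))
    for j, (x, y) in enumerate(joints_2d):
        # Clamp to valid range, then bilinearly interpolate the 4 neighbors.
        x = np.clip(x, 0, W - 1)
        y = np.clip(y, 0, H - 1)
        x0, y0 = int(np.floor(x)), int(np.floor(y))
        x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
        dx, dy = x - x0, y - y0
        out[j] = ((1 - dx) * (1 - dy) * feature_map[y0, x0]
                  + dx * (1 - dy) * feature_map[y0, x1]
                  + (1 - dx) * dy * feature_map[y1, x0]
                  + dx * dy * feature_map[y1, x1])
    return out

# Toy example: a 64x48 feature map with 32 channels, 17 COCO-style joints.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((64, 48, 32))
joints = rng.uniform(0, [48, 64], size=(17, 2))
ctx = sample_joint_context(fmap, joints)
print(ctx.shape)  # (17, 32)
```

The (J, C) context features could then be concatenated with the plain 2D coordinates before lifting to 3D, which is what distinguishes a context-aware lifter from a coordinates-only one.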
Related papers
- 2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation? [5.408549711581793]
We study the effect of using either 2D or 3D joint coordinates as training data on the performance of speech-to-gesture deep generative models.
We employ a lifting model for converting generated 2D pose sequences into 3D and assess how gestures created directly in 3D stack up against those initially generated in 2D and then converted to 3D.
arXiv Detail & Related papers (2024-09-16T15:06:12Z)
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization [44.05930316729542]
We propose EP2P-Loc, a novel large-scale visual localization method for 3D point clouds.
To increase the number of inliers, we propose a simple algorithm to remove invisible 3D points in the image.
For the first time in this task, we employ a differentiable PnP (Perspective-n-Point) solver for end-to-end training.
arXiv Detail & Related papers (2023-09-14T07:06:36Z)
- Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video [23.93644678238666]
We propose a Pose and Mesh Co-Evolution network (PMCE) to recover 3D human motion from a video.
The proposed PMCE outperforms previous state-of-the-art methods in terms of both per-frame accuracy and temporal consistency.
arXiv Detail & Related papers (2023-08-20T16:03:21Z)
- CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network [66.24726878647543]
Estimating the 6-DoF pose of a rigid object from a single RGB image is a crucial yet challenging task.
Recent studies have shown the great potential of dense correspondence-based solutions.
We propose a novel pose estimation algorithm named CheckerPose, which improves on three main aspects.
arXiv Detail & Related papers (2023-03-29T17:30:53Z)
- Improving Robustness and Accuracy via Relative Information Encoding in 3D Human Pose Estimation [59.94032196768748]
We propose a relative information encoding method that yields positional and temporal enhanced representations.
Our method outperforms state-of-the-art methods on two public datasets.
arXiv Detail & Related papers (2021-07-29T14:12:19Z)
- Weakly-supervised Cross-view 3D Human Pose Estimation [16.045255544594625]
We propose a simple yet effective pipeline for weakly-supervised cross-view 3D human pose estimation.
Our method can achieve state-of-the-art performance in a weakly-supervised manner.
We evaluate our method on the standard benchmark dataset, Human3.6M.
arXiv Detail & Related papers (2021-05-23T08:16:25Z)
- Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose [70.23652933572647]
We propose a novel graph convolutional neural network (GraphCNN)-based system that estimates the 3D coordinates of human mesh vertices directly from the 2D human pose.
We show that our Pose2Mesh outperforms the previous 3D human pose and mesh estimation methods on various benchmark datasets.
arXiv Detail & Related papers (2020-08-20T16:01:56Z)
- VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment [80.77351380961264]
We present an approach to estimate 3D poses of multiple people from multiple camera views.
We present an end-to-end solution that operates in 3D space, thereby avoiding incorrect decisions in the 2D space.
We propose Pose Regression Network (PRN) to estimate a detailed 3D pose for each proposal.
arXiv Detail & Related papers (2020-04-13T23:50:01Z)
- HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation [7.559220068352681]
We propose a lightweight model called HOPE-Net which jointly estimates hand and object pose in 2D and 3D in real-time.
Our network uses a cascade of two adaptive graph convolutional neural networks, one to estimate 2D coordinates of the hand joints and object corners, followed by another to convert 2D coordinates to 3D.
arXiv Detail & Related papers (2020-03-31T19:01:42Z)
- Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach [76.10879433430466]
We propose to estimate 3D human pose from multi-view images and a few IMUs attached to the person's limbs.
It operates by firstly detecting 2D poses from the two signals, and then lifting them to the 3D space.
The simple two-step approach reduces the error of the state-of-the-art by a large margin on a public dataset.
arXiv Detail & Related papers (2020-03-25T00:26:54Z)
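The two-step pipeline above (first detect 2D poses, then lift them to 3D) recurs across several of these papers. For the multi-view case, the lift step can be sketched with standard DLT triangulation of one joint from two calibrated views; this is a generic illustration, not the paper's IMU-aided geometric fusion, and the camera setup below is a made-up toy example.

```python
import numpy as np

def triangulate_joint(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one joint from two views.

    P1, P2: (3, 4) camera projection matrices.
    x1, x2: (2,) pixel coordinates of the joint in each view.
    Returns the 3D point minimizing the algebraic reprojection error.
    """
    # Each view contributes two linear constraints on the homogeneous point.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector for the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

# Toy setup: two cameras with identical intrinsics, offset along x.
K = np.array([[500., 0., 160.], [0., 500., 120.], [0., 0., 1.]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-200.], [0.], [0.]])])
X_true = np.array([10., 20., 400.])
h1 = P1 @ np.append(X_true, 1.0)
h2 = P2 @ np.append(X_true, 1.0)
x1, x2 = h1[:2] / h1[2], h2[:2] / h2[2]
print(triangulate_joint(P1, P2, x1, x2))  # ≈ [10. 20. 400.]
```

With noise-free projections the triangulation recovers the 3D joint exactly; real pipelines run this per joint across all views and follow it with a learned or geometric refinement stage.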
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.