Related papers: PandaPose: 3D Human Pose Lifting from a Single Image via Propagating 2D Pose Prior to 3D Anchor Space

PandaPose: 3D Human Pose Lifting from a Single Image via Propagating 2D Pose Prior to 3D Anchor Space

URL: http://arxiv.org/abs/2602.01095v1
Date: Sun, 01 Feb 2026 08:20:40 GMT
Title: PandaPose: 3D Human Pose Lifting from a Single Image via Propagating 2D Pose Prior to 3D Anchor Space
Authors: Jinghong Zheng, Changlong Jiang, Yang Xiao, Jiaqi Li, Haohong Kuang, Hang Xu, Ran Wang, Zhiguo Cao, Min Du, Joey Tianyi Zhou,
Abstract summary: PandaPose is a 3D human pose lifting approach via propagating 2D pose prior to 3D anchor space as the unified intermediate representation.<n>Our 3D anchor space comprises: (1) Joint-wise 3D anchors in the canonical coordinate system, providing accurate and robust priors to mitigate 2D pose estimation inaccuracies.
Score: 62.10630827126755
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: 3D human pose lifting from a single RGB image is a challenging task in 3D vision. Existing methods typically establish a direct joint-to-joint mapping from 2D to 3D poses based on 2D features. This formulation suffers from two fundamental limitations: inevitable error propagation from input predicted 2D pose to 3D predictions and inherent difficulties in handling self-occlusion cases. In this paper, we propose PandaPose, a 3D human pose lifting approach via propagating 2D pose prior to 3D anchor space as the unified intermediate representation. Specifically, our 3D anchor space comprises: (1) Joint-wise 3D anchors in the canonical coordinate system, providing accurate and robust priors to mitigate 2D pose estimation inaccuracies. (2) Depth-aware joint-wise feature lifting that hierarchically integrates depth information to resolve self-occlusion ambiguities. (3) The anchor-feature interaction decoder that incorporates 3D anchors with lifted features to generate unified anchor queries encapsulating joint-wise 3D anchor set, visual cues and geometric depth information. The anchor queries are further employed to facilitate anchor-to-joint ensemble prediction. Experiments on three well-established benchmarks (i.e., Human3.6M, MPI-INF-3DHP and 3DPW) demonstrate the superiority of our proposition. The substantial reduction in error by $14.7\%$ compared to SOTA methods on the challenging conditions of Human3.6M and qualitative comparisons further showcase the effectiveness and robustness of our approach.

Related papers

HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation [61.32714172038278]
We propose a novel two-stage generative densification method, named Hierarchical Pose AutoRegressive Transformer (HiPART), to generate 2D dense poses from the original sparse 2D pose.<n>Specifically, we first develop a multi-scale skeleton tokenization module to quantize the highly dense 2D pose into hierarchical tokens and propose a Skeleton-aware Alignment to strengthen token connections.<n>With generated hierarchical poses as inputs for 2D-to-3D lifting, the proposed method shows strong robustness in occluded scenarios and achieves state-of-the-art performance on the single-frame-based 3D
arXiv Detail & Related papers (2025-03-30T06:15:36Z)
UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation. It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose Estimation [18.72362803593654]
The dominant paradigm in 3D human pose estimation that lifts a 2D pose sequence to 3D heavily relies on long-term temporal clues. This can be attributed to their inherent inability to perceive spatial context as plain 2D joint coordinates carry no visual cues. We propose a straightforward yet powerful solution: leveraging the readily available intermediate visual representations produced by off-the-shelf (pre-trained) 2D pose detectors.
arXiv Detail & Related papers (2023-11-06T18:04:13Z)
Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis Aggregation [64.874000550443]
A Diffusion-based 3D Pose estimation (D3DP) method with Joint-wise reProjection-based Multi-hypothesis Aggregation (JPMA) is proposed. The proposed JPMA assembles multiple hypotheses generated by D3DP into a single 3D pose for practical use. Our method outperforms the state-of-the-art deterministic and probabilistic approaches by 1.5% and 8.9%, respectively.
arXiv Detail & Related papers (2023-03-21T04:00:47Z)
Shape-aware Multi-Person Pose Estimation from Multi-View Images [47.13919147134315]
Our proposed coarse-to-fine pipeline first aggregates noisy 2D observations from multiple camera views into 3D space. The final pose estimates are attained from a novel optimization scheme which links high-confidence multi-view 2D observations and 3D joint candidates.
arXiv Detail & Related papers (2021-10-05T20:04:21Z)
Deep Monocular 3D Human Pose Estimation via Cascaded Dimension-Lifting [10.336146336350811]
3D pose estimation from a single image is a challenging problem due to depth ambiguity. One type of the previous methods lifts 2D joints, obtained by resorting to external 2D pose detectors, to the 3D space. We propose a novel end-to-end framework that exploits the contextual information but also produces the output directly in the 3D space.
arXiv Detail & Related papers (2021-04-08T05:44:02Z)
Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views. We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
SMAP: Single-Shot Multi-Person Absolute 3D Pose Estimation [46.85865451812981]
We propose a novel system that first regresses a set of 2.5D representations of body parts and then reconstructs the 3D absolute poses based on these 2.5D representations with a depth-aware part association algorithm. Such a single-shot bottom-up scheme allows the system to better learn and reason about the inter-person depth relationship, improving both 3D and 2D pose estimation.
arXiv Detail & Related papers (2020-08-26T09:56:07Z)
Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach [76.10879433430466]
We propose to estimate 3D human pose from multi-view images and a few IMUs attached at person's limbs. It operates by firstly detecting 2D poses from the two signals, and then lifting them to the 3D space. The simple two-step approach reduces the error of the state-of-the-art by a large margin on a public dataset.
arXiv Detail & Related papers (2020-03-25T00:26:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.