On the Role of Rotation Equivariance in Monocular 3D Human Pose Estimation
- URL: http://arxiv.org/abs/2601.13913v1
- Date: Tue, 20 Jan 2026 12:41:08 GMT
- Title: On the Role of Rotation Equivariance in Monocular 3D Human Pose Estimation
- Authors: Pavlo Melnyk, Cuong Le, Urs Waldmann, Per-Erik Forssén, Bastian Wandt
- Abstract summary: We consider the monocular setting, i.e. single-view input, for 3D human pose estimation. Here, the task is to predict a 3D point set of human skeletal joints from a single 2D input image. We find that common lifting models fail when encountering a rotated input.
- Score: 11.69147365913294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating 3D from 2D is one of the central tasks in computer vision. In this work, we consider the monocular setting, i.e. single-view input, for 3D human pose estimation (HPE). Here, the task is to predict a 3D point set of human skeletal joints from a single 2D input image. While by definition this is an ill-posed problem, recent work has presented methods that solve it with up to several-centimetre error. Typically, these methods employ a two-step approach, where the first step is to detect the 2D skeletal joints in the input image, followed by the step of 2D-to-3D lifting. We find that common lifting models fail when encountering a rotated input. We argue that learning a single human pose along with its in-plane rotations is considerably easier and more geometrically grounded than directly learning a point-to-point mapping. Furthermore, our intuition is that endowing the model with the notion of rotation equivariance without explicitly constraining its parameter space should lead to a more straightforward learning process than one with equivariance by design. Utilising the common HPE benchmarks, we confirm that the 2D rotation equivariance per se improves the model performance on human poses akin to rotations in the image plane, and can be efficiently and straightforwardly learned by augmentation, outperforming state-of-the-art equivariant-by-design methods.
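The augmentation-based rotation equivariance that the abstract argues for can be sketched as follows. This is a minimal illustration of in-plane rotation augmentation of 2D keypoints, not the paper's actual pipeline; the joint count, rotation centre, and sampling range are illustrative assumptions.

```python
import numpy as np

def rotate_pose_2d(joints, angle_rad, center=None):
    """Rotate a 2D pose of shape (J, 2) by angle_rad about a center point.

    Illustrative sketch: the paper's normalisation and joint ordering
    are not specified here; we rotate about the pose centroid.
    """
    joints = np.asarray(joints, dtype=float)
    if center is None:
        center = joints.mean(axis=0)  # pose centroid (assumption)
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    R = np.array([[c, -s],
                  [s,  c]])  # standard 2D rotation matrix
    return (joints - center) @ R.T + center

# During training, one would sample a random in-plane angle per pose and
# rotate the 2D input (and, correspondingly, the in-plane components of
# the 3D target) before feeding it to the lifting network.
rng = np.random.default_rng(0)
pose = rng.normal(size=(17, 2))          # e.g. 17 Human3.6M-style joints
angle = rng.uniform(-np.pi, np.pi)       # sampling range is an assumption
augmented = rotate_pose_2d(pose, angle)
```

Because the transform is a pure rotation about the centroid, it preserves all bone lengths and the pose centre, so the augmented samples differ from the original only by the in-plane orientation the model is meant to become equivariant to.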
Related papers
- PandaPose: 3D Human Pose Lifting from a Single Image via Propagating 2D Pose Prior to 3D Anchor Space [62.10630827126755]
PandaPose is a 3D human pose lifting approach that propagates a 2D pose prior into a 3D anchor space serving as the unified intermediate representation. Our 3D anchor space comprises: (1) joint-wise 3D anchors in the canonical coordinate system, providing accurate and robust priors to mitigate 2D pose estimation inaccuracies.
arXiv Detail & Related papers (2026-02-01T08:20:40Z) - Efficient 2D to Full 3D Human Pose Uplifting including Joint Rotations [13.281936946796913]
We propose a novel 2D-to-3D uplifting model that directly estimates 3D human poses, including joint rotations, in a single forward pass. Our models achieve state-of-the-art accuracy in rotation estimation, are 150 times faster than the IK-based approach, and surpass HMR models in joint localization precision.
arXiv Detail & Related papers (2025-04-14T07:32:28Z) - CameraHMR: Aligning People with Perspective [54.05758012879385]
We address the challenge of accurate 3D human pose and shape estimation from monocular images.
Existing training datasets containing real images with pseudo ground truth (pGT) use SMPLify to fit SMPL to sparse 2D joint locations.
We make two contributions that improve pGT accuracy.
arXiv Detail & Related papers (2024-11-12T19:12:12Z) - MPL: Lifting 3D Human Pose from Multi-view 2D Poses [75.26416079541723]
We propose combining 2D pose estimation, for which large and rich training datasets exist, and 2D-to-3D pose lifting, using a transformer-based network.
Our experiments demonstrate reductions of up to 45% in MPJPE compared to the 3D pose obtained by triangulating the 2D poses.
arXiv Detail & Related papers (2024-08-20T12:55:14Z) - DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion
Probabilistic Model [25.223801390996435]
This paper focuses on reconstructing a 3D pose from a single 2D keypoint detection.
We build a novel diffusion-based framework to effectively sample diverse 3D poses from an off-the-shelf 2D detector.
We evaluate our method on the widely adopted Human3.6M and HumanEva-I datasets.
arXiv Detail & Related papers (2022-12-06T07:22:20Z) - Optimising 2D Pose Representation: Improve Accuracy, Stability and
Generalisability Within Unsupervised 2D-3D Human Pose Estimation [7.294965109944706]
We show that the optimal representation of a 2D pose is two independent segments, the torso and legs, with no features shared between the two lifting networks.
arXiv Detail & Related papers (2022-09-01T17:32:52Z) - SVMA: A GAN-based model for Monocular 3D Human Pose Estimation [0.8379286663107844]
We present an unsupervised GAN-based model to recover 3D human pose from 2D joint locations extracted from a single image.
Considering the reprojection constraint, our model can estimate the camera so that we can reproject the estimated 3D pose to the original 2D pose.
Results on Human3.6M show that our method outperforms all the state-of-the-art methods, and results on MPI-INF-3DHP show that our method outperforms state-of-the-art by approximately 15.0%.
arXiv Detail & Related papers (2021-06-10T09:43:57Z) - Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z) - Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose [70.23652933572647]
We propose a novel graph convolutional neural network (GraphCNN)-based system that estimates the 3D coordinates of human mesh vertices directly from the 2D human pose.
We show that our Pose2Mesh outperforms the previous 3D human pose and mesh estimation methods on various benchmark datasets.
arXiv Detail & Related papers (2020-08-20T16:01:56Z) - Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z) - Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach [76.10879433430466]
We propose to estimate 3D human pose from multi-view images and a few IMUs attached to a person's limbs. It operates by first detecting 2D poses from the two signal types and then lifting them to 3D space.
The simple two-step approach reduces the error of the state-of-the-art by a large margin on a public dataset.
arXiv Detail & Related papers (2020-03-25T00:26:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.