Beyond Weak Perspective for Monocular 3D Human Pose Estimation
- URL: http://arxiv.org/abs/2009.06549v1
- Date: Mon, 14 Sep 2020 16:23:14 GMT
- Title: Beyond Weak Perspective for Monocular 3D Human Pose Estimation
- Authors: Imry Kissos, Lior Fritz, Matan Goldman, Omer Meir, Eduard Oks and Mark Kliger
- Abstract summary: We consider the task of 3D joints location and orientation prediction from a monocular video.
We first infer 2D joint locations with an off-the-shelf pose estimation algorithm.
We then apply the SMPLify algorithm, which receives initial body pose, shape and camera parameters estimated with SPIN and optimizes them to fit the 2D joint locations.
- Score: 6.883305568568084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the task of 3D joints location and orientation prediction from a
monocular video with the skinned multi-person linear (SMPL) model. We first
infer 2D joints locations with an off-the-shelf pose estimation algorithm. We
use the SPIN algorithm and estimate initial predictions of body pose, shape and
camera parameters from a deep regression neural network. We then adhere to the
SMPLify algorithm which receives those initial parameters, and optimizes them
so that inferred 3D joints from the SMPL model would fit the 2D joints
locations. This algorithm involves a projection step of 3D joints to the 2D
image plane. The conventional approach is to follow weak perspective
assumptions which use ad-hoc focal length. Through experimentation on the 3D
Poses in the Wild (3DPW) dataset, we show that using full perspective
projection, with the correct camera center and an approximated focal length,
provides favorable results. Our algorithm has resulted in a winning entry for
the 3DPW Challenge, reaching first place in joints orientation accuracy.
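
The difference between the two projection models named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the pure-Python point representation, and the image-diagonal focal length heuristic are all assumptions made for illustration, and the camera center is simply taken to be the image center.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def weak_perspective(joints: Sequence[Point3D], scale: float,
                     trans: Point2D) -> List[Point2D]:
    """Weak perspective: ignore per-joint depth, apply one shared scale and a 2D shift."""
    tx, ty = trans
    return [(scale * x + tx, scale * y + ty) for x, y, _ in joints]


def full_perspective(joints: Sequence[Point3D], focal: float,
                     center: Point2D) -> List[Point2D]:
    """Pinhole projection: divide by each joint's own depth z (camera coordinates)."""
    cx, cy = center
    return [(focal * x / z + cx, focal * y / z + cy) for x, y, z in joints]


def approx_focal_length(width: int, height: int) -> float:
    """Hypothetical heuristic (an assumption here): focal length ~ image diagonal in pixels."""
    return (width ** 2 + height ** 2) ** 0.5


# Two joints with the same lateral offset but different depths (meters).
joints = [(0.5, 0.0, 2.0), (0.5, 0.0, 4.0)]
width, height = 640, 480
center = (width / 2, height / 2)           # camera center assumed at the image center
focal = approx_focal_length(width, height)

mean_depth = sum(z for _, _, z in joints) / len(joints)
weak = weak_perspective(joints, scale=focal / mean_depth, trans=center)
full = full_perspective(joints, focal=focal, center=center)

print(weak)  # both joints land on the same pixel
print(full)  # the nearer joint projects farther from the image center
```

Under weak perspective both joints map to the same pixel because depth variation within the body is discarded, whereas full perspective separates them. This per-joint depth effect is what an SMPLify-style 2D reprojection fit would see when the projection step is changed.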
Related papers
- CameraHMR: Aligning People with Perspective [54.05758012879385]
We address the challenge of accurate 3D human pose and shape estimation from monocular images.
Existing training datasets containing real images with pseudo ground truth (pGT) use SMPLify to fit SMPL to sparse 2D joint locations.
We make two contributions that improve pGT accuracy.
arXiv Detail & Related papers (2024-11-12T19:12:12Z)
- Neural Voting Field for Camera-Space 3D Hand Pose Estimation [106.34750803910714]
We present a unified framework for camera-space 3D hand pose estimation from a single RGB image based on 3D implicit representation.
We propose a novel unified 3D dense regression scheme to estimate camera-space 3D hand pose via dense 3D point-wise voting in camera frustum.
arXiv Detail & Related papers (2023-05-07T16:51:34Z)
- Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis Aggregation [64.874000550443]
A Diffusion-based 3D Pose estimation (D3DP) method with Joint-wise reProjection-based Multi-hypothesis Aggregation (JPMA) is proposed.
The proposed JPMA assembles multiple hypotheses generated by D3DP into a single 3D pose for practical use.
Our method outperforms the state-of-the-art deterministic and probabilistic approaches by 1.5% and 8.9%, respectively.
arXiv Detail & Related papers (2023-03-21T04:00:47Z)
- Shape-aware Multi-Person Pose Estimation from Multi-View Images [47.13919147134315]
Our proposed coarse-to-fine pipeline first aggregates noisy 2D observations from multiple camera views into 3D space.
The final pose estimates are attained from a novel optimization scheme which links high-confidence multi-view 2D observations and 3D joint candidates.
arXiv Detail & Related papers (2021-10-05T20:04:21Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- On the role of depth predictions for 3D human pose estimation [0.04199844472131921]
We build a system that takes 2d joint locations as input along with their estimated depth value and predicts their 3d positions in camera coordinates.
Results are produced on a neural network that accepts a low-dimensional input and can be integrated into a real-time system.
Our system can be combined with an off-the-shelf 2d pose detector and a depth map predictor to perform 3d pose estimation in the wild.
arXiv Detail & Related papers (2021-03-03T16:51:38Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach [76.10879433430466]
We propose to estimate 3D human pose from multi-view images and a few IMUs attached to a person's limbs.
It operates by firstly detecting 2D poses from the two signals, and then lifting them to the 3D space.
The simple two-step approach reduces the error of the state-of-the-art by a large margin on a public dataset.
arXiv Detail & Related papers (2020-03-25T00:26:54Z)