Context Modeling in 3D Human Pose Estimation: A Unified Perspective
- URL: http://arxiv.org/abs/2103.15507v2
- Date: Tue, 30 Mar 2021 08:56:32 GMT
- Title: Context Modeling in 3D Human Pose Estimation: A Unified Perspective
- Authors: Xiaoxuan Ma, Jiajun Su, Chunyu Wang, Hai Ci and Yizhou Wang
- Abstract summary: We present a general formula for context modeling in which both PSM and GNN are its special cases.
By comparing the two methods, we found that the end-to-end training scheme in GNN and the limb length constraints in PSM are two complementary factors to improve results.
We propose ContextPose, based on an attention mechanism, which enforces soft limb length constraints in a deep network.
- Score: 27.36648656930247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating 3D human pose from a single image suffers from severe ambiguity
since multiple 3D joint configurations may have the same 2D projection. The
state-of-the-art methods often rely on context modeling methods such as
pictorial structure model (PSM) or graph neural network (GNN) to reduce
ambiguity. However, there is no study that rigorously compares them side by
side. So we first present a general formula for context modeling in which both
PSM and GNN are its special cases. By comparing the two methods, we found that
the end-to-end training scheme in GNN and the limb length constraints in PSM
are two complementary factors to improve results. To combine their advantages,
we propose ContextPose, based on an attention mechanism, which enforces soft
limb length constraints in a deep network. The approach effectively reduces the
chance of getting absurd 3D pose estimates with incorrect limb lengths and
achieves state-of-the-art results on two benchmark datasets. More importantly,
the introduction of limb length constraints into deep networks enables the
approach to achieve much better generalization performance.
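The abstract describes soft limb length constraints that down-weight implausible poses instead of rejecting them outright, as a hard PSM-style constraint would. A minimal sketch of that idea (not the paper's exact attention-based formulation; the skeleton, prior lengths, and Gaussian tolerance below are illustrative assumptions):

```python
import numpy as np

# Hypothetical kinematic chain: limbs as (parent, child) joint index pairs.
LIMBS = [(0, 1), (1, 2), (2, 3)]
# Assumed prior mean limb lengths (e.g. estimated from training data), in meters.
PRIOR_LENGTHS = np.array([0.25, 0.45, 0.40])
SIGMA = 0.05  # tolerance of the soft constraint

def soft_limb_length_weights(joints_3d):
    """Score each limb by how close its length is to the prior.

    Returns weights in (0, 1]; 1 means the limb length matches the prior
    exactly, and smaller values softly penalize implausible lengths.
    """
    lengths = np.array([
        np.linalg.norm(joints_3d[c] - joints_3d[p]) for p, c in LIMBS
    ])
    # Gaussian kernel: a differentiable, soft alternative to the hard
    # length constraints used in pictorial structure models.
    return np.exp(-((lengths - PRIOR_LENGTHS) ** 2) / (2 * SIGMA ** 2))

def limb_length_loss(joints_3d):
    # Penalty term that could be added to a pose-regression loss.
    return float(np.mean(1.0 - soft_limb_length_weights(joints_3d)))
```

Because the weights are differentiable in the joint positions, such a term can be trained end to end, which is the property the abstract identifies as missing from PSM.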
Related papers
- Adapting Human Mesh Recovery with Vision-Language Feedback [17.253535686451897]
We leverage vision-language models to generate interactive body part descriptions.
We train a text encoder and a pose VQ-VAE, aligning texts to body poses in a shared latent space.
The model can produce poses with accurate 3D perception and image consistency.
arXiv Detail & Related papers (2025-02-06T07:42:00Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net, which consists of a common deep network backbone with two output heads corresponding to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
- PONet: Robust 3D Human Pose Estimation via Learning Orientations Only [116.1502793612437]
We propose a novel Pose Orientation Net (PONet) that is able to robustly estimate 3D pose by learning orientations only.
PONet estimates the 3D orientation of limbs by taking advantage of local image evidence to recover the 3D pose.
We evaluate our method on multiple datasets, including Human3.6M, MPII, MPI-INF-3DHP, and 3DPW.
arXiv Detail & Related papers (2021-12-21T12:48:48Z)
- 3D Human Pose Estimation Based on 2D-3D Consistency with Synchronized Adversarial Training [5.306053507202384]
We propose a GAN-based model for 3D human pose estimation, in which a reprojection network is employed to learn the mapping of the distribution from 3D poses to 2D poses.
Inspired by the typical kinematic chain space (KCS) matrix, we introduce a weighted KCS matrix and take it as one of the discriminator's inputs to impose joint angle and bone length constraints.
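The kinematic chain space (KCS) matrix mentioned above is formed from the bone vectors of a pose: stacking the bones column-wise into a matrix B and taking Psi = B^T B yields squared bone lengths on the diagonal and joint-angle information off the diagonal. A small sketch (the bone list and the per-bone weighting here are illustrative assumptions; the paper's exact weighting scheme may differ):

```python
import numpy as np

# Hypothetical kinematic chain: each bone as a (parent, child) joint index pair.
BONES = [(0, 1), (1, 2), (2, 3)]

def kcs_matrix(joints_3d, weights=None):
    """Compute a (weighted) Kinematic Chain Space matrix Psi = B^T B.

    B stacks the bone vectors column-wise (shape 3 x K), so Psi's
    diagonal holds squared bone lengths and its off-diagonal entries
    encode the angles between bones - which is why feeding it to a
    discriminator can impose bone length and joint angle constraints.
    """
    B = np.stack([joints_3d[c] - joints_3d[p] for p, c in BONES], axis=1)
    if weights is not None:
        # Assumed weighting: scale each bone's column before the product.
        B = B * np.asarray(weights)[np.newaxis, :]
    return B.T @ B
```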
arXiv Detail & Related papers (2021-06-08T12:11:56Z)
- A hybrid classification-regression approach for 3D hand pose estimation using graph convolutional networks [1.0152838128195467]
We propose a two-stage GCN-based framework that learns per-pose relationship constraints.
The first phase quantizes the 2D/3D space to classify the joints into 2D/3D blocks based on their locality.
The second stage uses a GCN-based module that applies an adaptive nearest neighbor algorithm to determine joint relationships.
arXiv Detail & Related papers (2021-05-23T10:09:10Z)
- 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data [77.57798334776353]
We consider the problem of obtaining dense 3D reconstructions of humans from single and partially occluded views.
We suggest that ambiguities can be modelled more effectively by parametrizing the possible body shapes and poses.
We show that our method outperforms alternative approaches in ambiguous pose recovery on standard benchmarks for 3D humans.
arXiv Detail & Related papers (2020-11-02T13:55:31Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
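A depth-to-scale projection of the kind described above can be pictured as replacing the single weak-perspective scale shared by all joints with a per-joint scale derived from each joint's depth. The sketch below is an assumed form of that idea, not the paper's exact equations; the function name and parameters are illustrative:

```python
import numpy as np

def d2s_project(joints_3d, root_depth, focal):
    """Sketch of a depth-to-scale (D2S) style projection.

    joints_3d: (N, 3) array of root-relative joint positions;
    root_depth: absolute depth of the root joint; focal: focal length.
    Each joint gets its own scale f / Z_j from its depth, so depth
    differences between joints produce per-joint scale variants.
    """
    z = root_depth + joints_3d[:, 2]          # absolute per-joint depth
    scale = focal / z                          # per-joint scale factor
    return joints_3d[:, :2] * scale[:, None]   # projected 2D coordinates
```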
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
While this formulation performs well in sparser crowd scenes, its effectiveness degrades in denser crowds.
In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
arXiv Detail & Related papers (2020-07-21T17:59:36Z)
- Monocular Human Pose and Shape Reconstruction using Part Differentiable Rendering [53.16864661460889]
Recent regression-based methods succeed in estimating parametric models directly through a deep neural network supervised by 3D ground truth.
In this paper, we introduce body segmentation as critical supervision.
To improve the reconstruction with part segmentation, we propose a part-level differentiable renderer that enables part-based models to be supervised by part segmentation.
arXiv Detail & Related papers (2020-03-24T14:25:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.