Towards Robust and Expressive Whole-body Human Pose and Shape Estimation
- URL: http://arxiv.org/abs/2312.08730v1
- Date: Thu, 14 Dec 2023 08:17:42 GMT
- Title: Towards Robust and Expressive Whole-body Human Pose and Shape Estimation
- Authors: Hui En Pang and Zhongang Cai and Lei Yang and Qingyi Tao and Zhonghua
Wu and Tianwei Zhang and Ziwei Liu
- Abstract summary: Whole-body pose and shape estimation aims to jointly predict different behaviors of the entire human body from a monocular image.
Existing methods often exhibit degraded performance under the complexity of in-the-wild scenarios.
We propose a novel framework to enhance the robustness of whole-body pose and shape estimation.
- Score: 51.457517178632756
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Whole-body pose and shape estimation aims to jointly predict different
behaviors (e.g., pose, hand gesture, facial expression) of the entire human
body from a monocular image. Existing methods often exhibit degraded
performance under the complexity of in-the-wild scenarios. We argue that the
accuracy and reliability of these models are significantly affected by the
quality of the predicted \textit{bounding box}, e.g., the scale and alignment
of body parts. The natural discrepancy between the ideal bounding box
annotations and model detection results is particularly detrimental to the
performance of whole-body pose and shape estimation. In this paper, we propose
a novel framework to enhance the robustness of whole-body pose and shape
estimation. Our framework incorporates three new modules to address the above
challenges from three perspectives: \textbf{1) Localization Module} enhances
the model's awareness of the subject's location and semantics within the image
space. \textbf{2) Contrastive Feature Extraction Module} encourages the model
to be invariant to robust augmentations by incorporating contrastive loss with
dedicated positive samples. \textbf{3) Pixel Alignment Module} ensures the
reprojected mesh from the predicted camera and body model parameters are
accurate and pixel-aligned. We perform comprehensive experiments to demonstrate
the effectiveness of our proposed framework on body, hands, face and whole-body
benchmarks. Codebase is available at
\url{https://github.com/robosmplx/robosmplx}.
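The Contrastive Feature Extraction Module described above encourages augmentation-invariant features via a contrastive loss with dedicated positive samples. A minimal numpy sketch of a generic InfoNCE-style objective, where each anchor's positive is its augmented view and the other samples in the batch act as negatives (the paper's exact loss, augmentations, and `temperature` value are assumptions, not taken from the source):

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss: row i of `anchors` should match
    row i of `positives`; all other rows serve as negatives."""
    # L2-normalize so the dot product is cosine similarity
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # (N, N) similarity matrix
    # log-softmax over each row; the diagonal entry is the positive pair
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# toy batch: a model invariant to the augmentation (identical features for
# both views) incurs a much lower loss than one producing unrelated features
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
loss_aligned = info_nce_loss(feats, feats)
loss_random = info_nce_loss(feats, rng.normal(size=(4, 8)))
```

Minimizing this loss pulls each image's feature toward its augmented view while pushing it away from other samples, which is one standard way to realize the invariance the module targets.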
Related papers
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- A Simple Strategy for Body Estimation from Partial-View Images [8.05538560322898]
Virtual try-on and product personalization have become increasingly important in modern online shopping, highlighting the need for accurate body measurement estimation.
Previous research has advanced in estimating 3D body shapes from RGB images, but the task is inherently ambiguous as the observed scale of human subjects in the images depends on two unknown factors: capture distance and body dimensions.
We propose a modular and simple height normalization solution, which relocates the subject skeleton to the desired position, normalizing the scale and disentangling the relationship between the two variables.
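The height normalization idea above can be illustrated with a toy numpy sketch: re-center a 2D skeleton and rescale it to a canonical pixel height so that capture distance no longer affects the observed scale (the function name, canonical height, and keypoint layout are hypothetical, not the paper's code):

```python
import numpy as np

def normalize_skeleton(keypoints_2d, target_height_px=200.0):
    """Re-center 2D keypoints at their centroid and rescale so the
    subject's pixel height equals a canonical value, removing the
    capture-distance component of the observed scale."""
    ys = keypoints_2d[:, 1]
    scale = target_height_px / (ys.max() - ys.min())
    centered = keypoints_2d - keypoints_2d.mean(axis=0)
    return centered * scale

# the same pose captured at two distances normalizes to identical keypoints
pose = np.array([[0.0, 0.0], [0.0, 50.0], [10.0, 100.0]])
near = normalize_skeleton(pose)        # "close" capture
far = normalize_skeleton(pose * 0.5)   # same pose, half the pixel size
```

After this step, any remaining scale variation reflects body dimensions rather than camera distance, which is the disentanglement the abstract describes.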
arXiv Detail & Related papers (2024-04-14T16:55:23Z)
- A Stochastic-Geometrical Framework for Object Pose Estimation based on Mixture Models Avoiding the Correspondence Problem [0.0]
This paper presents a novel stochastic-geometrical modeling framework for object pose estimation based on observing multiple feature points.
Probabilistic modeling utilizing mixture models shows the potential for accurate and robust pose estimations.
arXiv Detail & Related papers (2023-11-29T21:45:33Z)
- The Best of Both Worlds: Combining Model-based and Nonparametric Approaches for 3D Human Body Estimation [20.797162096899154]
We propose a framework for estimating model parameters from global image features.
A dense map prediction module explicitly establishes the dense UV correspondence between the image evidence and each part of the body model.
An inverse kinematics module refines the keypoint prediction and generates a posed template mesh.
A UV inpainting module relies on the corresponding feature, prediction and the posed template, and completes the predictions of occluded body shape.
arXiv Detail & Related papers (2022-05-01T16:39:09Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
- PaMIR: Parametric Model-Conditioned Implicit Representation for Image-based Human Reconstruction [67.08350202974434]
We propose Parametric Model-Conditioned Implicit Representation (PaMIR), which combines the parametric body model with the free-form deep implicit function.
We show that our method achieves state-of-the-art performance for image-based 3D human reconstruction in the cases of challenging poses and clothing types.
arXiv Detail & Related papers (2020-07-08T02:26:19Z)
- Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations named as forward-kinematics, camera-projection and spatial-map transformation.
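The first two differentiable transformations named above, forward-kinematics and camera-projection, can be sketched in a few lines of numpy. This is a toy 2-D kinematic chain embedded in 3-D followed by a pinhole projection; the paper's actual parameterization and the spatial-map transformation are omitted, and all names and constants here are assumptions:

```python
import numpy as np

def forward_kinematics(bone_lengths, angles):
    """Chain joints from bone lengths and cumulative joint angles:
    each joint is the previous joint plus a rotated bone (z = 0 plane)."""
    joints, heading = [np.zeros(3)], 0.0
    for length, angle in zip(bone_lengths, angles):
        heading += angle
        joints.append(joints[-1] + length * np.array(
            [np.cos(heading), np.sin(heading), 0.0]))
    return np.stack(joints)

def camera_projection(joints_3d, focal=1000.0, depth=5.0):
    """Pinhole projection: x_img = f * X / (Z + depth)."""
    z = joints_3d[:, 2] + depth
    return focal * joints_3d[:, :2] / z[:, None]

# two unit bones: straight along x, then a 90-degree bend
arm = forward_kinematics([1.0, 1.0], [0.0, np.pi / 2])
px = camera_projection(arm)
```

Because both steps are plain arithmetic on the pose parameters, gradients can flow from a 2-D reprojection loss back to the joint angles, which is what makes such a pipeline trainable without paired supervision.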
arXiv Detail & Related papers (2020-06-24T23:56:33Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.