Synthetic Training for Monocular Human Mesh Recovery
- URL: http://arxiv.org/abs/2010.14036v1
- Date: Tue, 27 Oct 2020 03:31:35 GMT
- Title: Synthetic Training for Monocular Human Mesh Recovery
- Authors: Yu Sun, Qian Bao, Wu Liu, Wenpeng Gao, Yili Fu, Chuang Gan, and Tao Mei
- Abstract summary: This paper aims to estimate the 3D mesh of multiple body parts with large differences in scale from a single RGB image.
The main challenge is the lack of training data with complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection that incorporates per-joint depth differences into the projection function to derive per-joint scale factors.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recovering a 3D human mesh from monocular images is a popular topic in
computer vision with a wide range of applications. This paper aims to estimate
the 3D mesh of multiple body parts (e.g., body, hands) with large differences in
scale from a single RGB image. Existing methods are mostly based on iterative
optimization, which is very time-consuming; we instead propose to train a
single-shot model. The main challenge is the lack of training data with complete
3D annotations of all body parts in 2D images. To solve this problem, we design
a multi-branch framework that disentangles the regression of different body
properties, enabling each component to be trained separately, in a synthetic
training manner, on the unpaired data that are available. In addition, to
strengthen generalization, most existing methods use in-the-wild 2D pose
datasets to supervise the estimated 3D pose via 3D-to-2D projection. However, we
observe that the commonly used weak-perspective model handles the foreshortening
effect of perspective camera projection poorly. We therefore propose a
depth-to-scale (D2S) projection that incorporates per-joint depth differences
into the projection function, yielding per-joint scale factors for more
appropriate supervision. The proposed method outperforms previous methods on the
CMU Panoptic Studio dataset and achieves comparable results on the Human3.6M
body and STB hand benchmarks. Notably, performance on close-shot images improves
significantly when the D2S projection is used for weak supervision, while the
method maintains a clear advantage in computational efficiency.
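
To make the D2S idea concrete, below is a minimal sketch contrasting standard weak-perspective projection with a depth-aware, per-joint scaling. The abstract does not give the exact formulation, so the function names, the root-joint convention, and the specific scale formula are illustrative assumptions; the sketch only shows how a per-joint depth offset can modulate each joint's projection scale to capture foreshortening.

```python
import numpy as np

def weak_perspective_project(joints_3d, scale, trans):
    """Standard weak-perspective projection: one global scale for all joints.

    joints_3d: (N, 3) joint positions in camera coordinates.
    scale:     scalar camera scale (roughly focal_length / root_depth).
    trans:     (2,) translation on the image plane.
    """
    return scale * joints_3d[:, :2] + trans

def d2s_project(joints_3d, scale, trans, root_depth, root_idx=0):
    """Illustrative depth-to-scale (D2S) projection (an assumption, not the
    paper's exact formula).

    Under a perspective camera, a joint at depth Z_0 + dZ projects with scale
    f / (Z_0 + dZ) = s / (1 + dZ / Z_0), where s = f / Z_0 is the
    weak-perspective scale of the root joint. Folding the per-joint depth
    difference dZ into the scale therefore recovers the foreshortening that a
    single global scale cannot express.
    """
    depth_diff = joints_3d[:, 2] - joints_3d[root_idx, 2]   # dZ per joint
    per_joint_scale = scale / (1.0 + depth_diff / root_depth)
    return per_joint_scale[:, None] * joints_3d[:, :2] + trans

# Toy usage: a close-shot hand, where depth variation is large relative to
# the root depth, so the two projections diverge noticeably.
joints = np.array([[0.00, 0.00, 0.60],   # root (wrist)
                   [0.05, 0.02, 0.52],   # fingertip closer to the camera
                   [0.03, 0.04, 0.68]])  # fingertip farther away
s = 800.0 / 0.60  # focal length 800 px, root depth 0.6 m
weak = weak_perspective_project(joints, s, np.zeros(2))
d2s = d2s_project(joints, s, np.zeros(2), root_depth=0.60)
```

In training, ground-truth 2D keypoints supervise the projected joints; with a single global scale, foreshortened joints in close shots receive systematically biased gradients, which is the failure mode the D2S projection is meant to remove.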
Related papers
- Two Views Are Better than One: Monocular 3D Pose Estimation with Multiview Consistency (arXiv, 2023-11-21)
  We propose a novel loss function, multiview consistency, that enables adding training data with only 2D supervision.
  Our experiments demonstrate that two views offset by 90 degrees are enough to obtain good performance, with only marginal improvements from adding more views.
  This research opens new possibilities for domain adaptation in 3D pose estimation, providing a practical and cost-effective way to customize models for specific applications.
- 3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation (arXiv, 2023-08-19)
  We propose 3D-aware Neural Body Fitting (3DNBF) for 3D human pose estimation.
  In particular, we propose a generative model of deep features based on a volumetric human representation with Gaussian ellipsoidal kernels that emit 3D-pose-dependent feature vectors.
  The neural features are trained with contrastive learning to become 3D-aware and thereby overcome the 2D-3D ambiguity.
- DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion Probabilistic Model (arXiv, 2022-12-06)
  This paper focuses on reconstructing a 3D pose from a single 2D keypoint detection.
  We build a novel diffusion-based framework to effectively sample diverse 3D poses from the output of an off-the-shelf 2D detector.
  We evaluate our method on the widely adopted Human3.6M and HumanEva-I datasets.
- Optimising 2D Pose Representation: Improve Accuracy, Stability and Generalisability Within Unsupervised 2D-3D Human Pose Estimation (arXiv, 2022-09-01)
  We show that the optimal representation of a 2D pose is two independent segments, the torso and legs, with no shared features between the lifting networks.
- Homography Loss for Monocular 3D Object Detection (arXiv, 2022-04-02)
  A differentiable loss function, termed Homography Loss, is proposed that exploits both 2D and 3D information.
  Our method outperforms other state-of-the-art approaches by a large margin on the KITTI 3D datasets.
- PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision (arXiv, 2022-03-29)
  Existing self-supervised 3D human pose estimation schemes have largely relied on weak supervision to guide the learning.
  We propose a novel self-supervised approach that explicitly generates 2D-3D pose pairs to augment supervision.
  This is made possible by introducing a reinforcement-learning-based imitator, which is learned jointly with a pose estimator and a pose hallucinator.
- PONet: Robust 3D Human Pose Estimation via Learning Orientations Only (arXiv, 2021-12-21)
  We propose a novel Pose Orientation Net (PONet) that robustly estimates 3D pose by learning orientations only.
  PONet estimates the 3D orientation of body limbs, taking advantage of local image evidence to recover the 3D pose.
  We evaluate our method on multiple datasets, including Human3.6M, MPII, MPI-INF-3DHP, and 3DPW.
- Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows (arXiv, 2021-07-29)
  We propose a normalizing-flow-based method that exploits the deterministic 3D-to-2D mapping to solve the ambiguous inverse 2D-to-3D problem.
  We evaluate our approach on the two benchmark datasets Human3.6M and MPI-INF-3DHP, outperforming all comparable methods in most metrics.
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis (arXiv, 2020-04-09)
  We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
  Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
- Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation (arXiv, 2020-04-07)
  Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
  We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
  The resulting annotations are sufficient to train 3D pose regressor networks from scratch that outperform the current state of the art on in-the-wild benchmarks.
This list is automatically generated from the titles and abstracts of the papers on this site.