3D Human Pose Estimation Based on 2D-3D Consistency with Synchronized
Adversarial Training
- URL: http://arxiv.org/abs/2106.04274v4
- Date: Tue, 5 Mar 2024 10:01:34 GMT
- Title: 3D Human Pose Estimation Based on 2D-3D Consistency with Synchronized
Adversarial Training
- Authors: Yicheng Deng, Cheng Sun, Yongqi Sun, and Jiahui Zhu
- Abstract summary: We propose a GAN-based model for 3D human pose estimation, in which a reprojection network is employed to learn the mapping of the distribution from 3D poses to 2D poses.
Inspired by the typical kinematic chain space (KCS) matrix, we introduce a weighted KCS matrix and take it as one of the discriminator's inputs to impose joint angle and bone length constraints.
- Score: 5.306053507202384
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: 3D human pose estimation from a single image is still a challenging problem
despite the large amount of work that has been performed in this field.
Generally, most methods directly use neural networks and ignore certain
constraints (e.g., reprojection constraints, joint angle, and bone length
constraints). A few methods do consider these constraints but train the
network separately, and thus cannot effectively solve the depth ambiguity problem.
In this paper, we propose a GAN-based model for 3D human pose estimation, in
which a reprojection network is employed to learn the mapping of the
distribution from 3D poses to 2D poses, and a discriminator is employed for
2D-3D consistency discrimination. We adopt a novel strategy to synchronously
train the generator, the reprojection network and the discriminator.
Furthermore, inspired by the typical kinematic chain space (KCS) matrix, we
introduce a weighted KCS matrix and take it as one of the discriminator's
inputs to impose joint angle and bone length constraints. The experimental
results on Human3.6M show that our method significantly outperforms
state-of-the-art methods in most cases.
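The weighted KCS matrix described above builds on the standard kinematic chain space construction: bone vectors are stacked into a matrix B, and Psi = B B^T then carries squared bone lengths on its diagonal and inter-bone angle information off the diagonal, so a single fixed-size matrix exposes both constraints to the discriminator. A minimal NumPy sketch, using a hypothetical 5-joint skeleton and an assumed multiplicative per-bone weighting (the paper's exact weighting scheme is not given in the abstract):

```python
import numpy as np

# Hypothetical 5-joint chain for illustration, as (parent, child) pairs:
# hip -> spine -> neck, plus neck -> left/right shoulder.
BONES = [(0, 1), (1, 2), (2, 3), (2, 4)]

def kcs_matrix(joints, bones=BONES, weights=None):
    """Compute the (optionally weighted) kinematic chain space matrix.

    joints: (J, 3) array of 3D joint positions.
    Returns a (B, B) matrix whose diagonal holds squared bone lengths
    and whose off-diagonal entries encode inter-bone angles.
    """
    b = np.stack([joints[c] - joints[p] for p, c in bones])  # (B, 3) bone vectors
    psi = b @ b.T                                            # KCS matrix Psi = B B^T
    if weights is not None:
        psi = psi * np.outer(weights, weights)               # assumed weighting form
    return psi

pose = np.array([[0.0, 0, 0], [0, 1, 0], [0, 2, 0], [-1, 2, 0], [1, 2, 0]])
psi = kcs_matrix(pose)
# Diagonal recovers squared bone lengths; here all four bones have length 1.
```

Because bone lengths and joint angles are read off one matrix of fixed size, a discriminator fed Psi can penalize anatomically implausible generator outputs without hand-crafted thresholds.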
Related papers
- JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human
Mesh Recovery [84.67823511418334]
This paper presents 3D JOint contrastive learning with TRansformers framework for handling occluded 3D human mesh recovery.
Our method includes an encoder-decoder transformer architecture to fuse 2D and 3D representations for achieving 2D- and 3D-aligned results.
arXiv Detail & Related papers (2023-07-31T02:58:58Z) - DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion
Probabilistic Model [25.223801390996435]
This paper focuses on reconstructing a 3D pose from a single 2D keypoint detection.
We build a novel diffusion-based framework to effectively sample diverse 3D poses from an off-the-shelf 2D detector.
We evaluate our method on the widely adopted Human3.6M and HumanEva-I datasets.
arXiv Detail & Related papers (2022-12-06T07:22:20Z) - On Triangulation as a Form of Self-Supervision for 3D Human Pose
Estimation [57.766049538913926]
Supervised approaches to 3D pose estimation from single images are remarkably effective when labeled data is abundant.
Much of the recent attention has shifted towards semi- and/or weakly-supervised learning.
We propose to impose multi-view geometric constraints by means of a differentiable triangulation and to use it as a form of self-supervision during training when no labels are available.
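A differentiable triangulation of this kind is commonly realized with the classical direct linear transform (DLT), which is differentiable because it reduces to an SVD. A sketch with assumed toy camera matrices (not the paper's actual setup); the NumPy version below could be ported verbatim to an autodiff framework such as PyTorch:

```python
import numpy as np

def triangulate_dlt(proj_mats, points_2d):
    """Linear (DLT) triangulation of one 3D point from N calibrated views.

    proj_mats: (N, 3, 4) camera projection matrices.
    points_2d: (N, 2) coordinates of the same joint observed in each view.
    """
    rows = []
    for P, (x, y) in zip(proj_mats, points_2d):
        rows.append(x * P[2] - P[0])            # two linear constraints per view
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)                          # (2N, 4)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                                  # null-space direction of A
    return X[:3] / X[3]                         # dehomogenize

# Two toy cameras: identity, and a 1-unit translation along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])
point = np.array([0.5, 0.2, 4.0])
h = np.append(point, 1.0)
x1 = (P1 @ h)[:2] / (P1 @ h)[2]                 # projection in view 1
x2 = (P2 @ h)[:2] / (P2 @ h)[2]                 # projection in view 2
recovered = triangulate_dlt(np.stack([P1, P2]), np.stack([x1, x2]))
# recovered matches the original 3D point up to numerical noise.
```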
arXiv Detail & Related papers (2022-03-29T19:11:54Z) - Context Modeling in 3D Human Pose Estimation: A Unified Perspective [27.36648656930247]
We present a general formula for context modeling in which both PSM and GNN are its special cases.
By comparing the two methods, we found that the end-to-end training scheme in GNN and the limb length constraints in PSM are two complementary factors to improve results.
We propose ContextPose based on attention mechanism that allows enforcing soft limb length constraints in a deep network.
arXiv Detail & Related papers (2021-03-29T11:26:03Z) - Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z) - Multi-Scale Networks for 3D Human Pose Estimation with Inference Stage
Optimization [33.02708860641971]
Estimating 3D human poses from a monocular video is still a challenging task.
The performance of many existing methods drops when the target person is occluded by other objects, or when the motion is too fast or slow relative to the scale and speed of the training data.
We introduce a spatio-temporal network for robust 3D human pose estimation.
arXiv Detail & Related papers (2020-10-13T15:24:28Z) - Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation [52.94078950641959]
We present a deployment-friendly, fast, bottom-up framework for multi-person 3D human pose estimation.
We adopt a novel neural representation of multi-person 3D pose which unifies the position of person instances with their corresponding 3D pose representation.
We propose a practical deployment paradigm where paired 2D or 3D pose annotations are unavailable.
arXiv Detail & Related papers (2020-08-04T07:54:25Z) - Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View
Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
While this formulation performs satisfactorily in sparser crowd scenes, its effectiveness is frequently challenged in denser crowds.
In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
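The epipolar constraint referenced here states that a correct cross-view correspondence x1 ↔ x2 satisfies x2^T E x1 = 0, where E is the essential matrix of the camera pair. A minimal check with an assumed two-camera toy rig (identity rotation, pure x-translation):

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def epipolar_residual(E, x1, x2):
    """Residual x2^T E x1 for homogeneous normalized image points;
    zero (up to noise) for a geometrically consistent correspondence."""
    return float(x2 @ E @ x1)

# Second camera: identity rotation, translated 1 unit along x
# relative to the first. Essential matrix E = [t]_x R.
R, t = np.eye(3), np.array([-1.0, 0.0, 0.0])
E = skew(t) @ R

X = np.array([0.5, 0.2, 4.0])                # a 3D joint in camera-1 coordinates
x1 = np.append(X[:2] / X[2], 1.0)            # projection in view 1
X2 = R @ X + t
x2 = np.append(X2[:2] / X2[2], 1.0)          # projection in view 2
# epipolar_residual(E, x1, x2) is ~0 for this true correspondence,
# while a point off the epipolar line gives a nonzero residual.
```

In dense crowds, many candidate joints from different people lie near the same epipolar line, which is exactly why matching by this residual alone becomes unreliable there.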
arXiv Detail & Related papers (2020-07-21T17:59:36Z) - 3D Human Pose Estimation using Spatio-Temporal Networks with Explicit
Occlusion Training [40.933783830017035]
Estimating 3D poses from monocular video is still a challenging task, despite the significant progress that has been made in recent years.
We introduce a spatio-temporal video network for robust 3D human pose estimation.
We apply multi-scale spatial features for 2D joint or keypoint prediction in each individual frame, and multi-stride temporal convolutional networks (TCNs) to estimate 3D joints or keypoints.
arXiv Detail & Related papers (2020-04-07T09:12:12Z) - Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A
Geometric Approach [76.10879433430466]
We propose to estimate 3D human pose from multi-view images and a few IMUs attached to the person's limbs.
The method operates by first detecting 2D poses from the two signals, and then lifting them to 3D space.
The simple two-step approach reduces the error of the state-of-the-art by a large margin on a public dataset.
arXiv Detail & Related papers (2020-03-25T00:26:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.