"Teaching Independent Parts Separately"(TIPSy-GAN) : Improving Accuracy
and Stability in Unsupervised Adversarial 2D to 3D Human Pose Estimation
- URL: http://arxiv.org/abs/2205.05980v2
- Date: Mon, 16 May 2022 12:39:44 GMT
- Title: "Teaching Independent Parts Separately"(TIPSy-GAN) : Improving Accuracy
and Stability in Unsupervised Adversarial 2D to 3D Human Pose Estimation
- Authors: Peter Hardy and Srinandan Dasmahapatra and Hansung Kim
- Abstract summary: We present TIPSy-GAN, a new approach to improve the accuracy and stability in unsupervised adversarial 2D to 3D human pose estimation.
In our work we demonstrate that the human kinematic skeleton should not be assumed as one spatially codependent structure.
- Score: 7.294965109944706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present TIPSy-GAN, a new approach to improve the accuracy and stability in
unsupervised adversarial 2D to 3D human pose estimation. In our work we
demonstrate that the human kinematic skeleton should not be assumed as one
spatially codependent structure. In fact, we believe that when a full 2D pose is
provided during training, an inherent bias is learned whereby the 3D coordinate
of a keypoint becomes spatially codependent on the 2D locations of all other
keypoints. To investigate our theory we follow previous adversarial
approaches but train two generators on spatially independent parts of the
kinematic skeleton, the torso and the legs. We find that improving the 2D
reprojection self-consistency cycle is key to lowering the evaluation error and
therefore introduce new consistency constraints during training. A TIPSy model
is then produced via knowledge distillation from these generators, which can
predict the 3D coordinates of the entire 2D pose with improved results.
Furthermore, we address a question left unanswered in prior work: how long to
train for in a truly unsupervised scenario. We show that training two
independent generators adversarially is more stable than training a solo
generator, which collapses once the adversarial network becomes unstable.
TIPSy decreases the average error by 18% compared to a baseline solo
generator. TIPSy improves upon other unsupervised approaches while also
performing strongly against supervised and weakly-supervised approaches when
evaluated on both the Human3.6M and MPI-INF-3DHP datasets.
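The 2D reprojection self-consistency cycle the abstract emphasises can be sketched in a few lines: lift a 2D pose to 3D, rotate and reproject it to a novel 2D view, lift again, rotate back, and penalise the gap to the original 2D input. Everything below (the orthographic camera, the depth-net interface, the rotation about the vertical axis) is an illustrative assumption, not TIPSy-GAN's actual implementation.

```python
import numpy as np

def lift(pose_2d, depth_net):
    """Lift a (J, 2) pose to (J, 3) by predicting one depth per keypoint.
    depth_net stands in for a generator; here it is any callable."""
    z = np.asarray(depth_net(pose_2d), dtype=float)
    return np.concatenate([pose_2d, z[:, None]], axis=1)

def reproject(pose_3d, angle):
    """Rotate about the vertical (y) axis, then drop depth (orthographic)."""
    c, s = np.cos(angle), np.sin(angle)
    rot_y = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return (pose_3d @ rot_y.T)[:, :2]

def cycle_error(pose_2d, depth_net, angle):
    """Mean squared gap after lift -> rotate/project -> lift -> rotate back."""
    novel_2d = reproject(lift(pose_2d, depth_net), angle)
    back_2d = reproject(lift(novel_2d, depth_net), -angle)
    return float(np.mean((pose_2d - back_2d) ** 2))

# Sanity checks: a zero rotation is trivially self-consistent, while a naive
# constant-depth "generator" leaves a nonzero cycle error under rotation.
pose = np.random.default_rng(0).normal(size=(16, 2))
naive = lambda p: np.zeros(len(p))
print(cycle_error(pose, naive, 0.0))        # 0.0 exactly
print(cycle_error(pose, naive, 0.5) > 0.0)  # True
```

A generator that assigns implausible depths fails this cycle under rotation, which is what makes the consistency constraint a useful unsupervised training signal.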
Related papers
- Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs [15.017274891943162]
Temporal 3D human pose estimation from monocular videos is a challenging task in human-centered computer vision.
Inertial sensor has been introduced to provide complementary source of information.
It remains challenging to integrate heterogeneous sensor data for producing physically rational 3D human poses.
arXiv Detail & Related papers (2024-04-27T09:02:42Z) - UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z) - LInKs "Lifting Independent Keypoints" -- Partial Pose Lifting for
Occlusion Handling with Improved Accuracy in 2D-3D Human Pose Estimation [4.648549457266638]
We present LInKs, a novel unsupervised learning method to recover 3D human poses from 2D kinematic skeletons.
Our approach follows a unique two-step process, which involves first lifting the occluded 2D pose to the 3D domain.
This lift-then-fill approach leads to significantly more accurate results compared to models that complete the pose in 2D space alone.
arXiv Detail & Related papers (2023-09-13T18:28:04Z) - Optimising 2D Pose Representation: Improve Accuracy, Stability and
Generalisability Within Unsupervised 2D-3D Human Pose Estimation [7.294965109944706]
We show that the optimal representation of a 2D pose is that of two independent segments, the torso and legs, with no shared features between each lifting network.
arXiv Detail & Related papers (2022-09-01T17:32:52Z) - Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose
Estimation [63.199549837604444]
3D human pose estimation approaches leverage different forms of strong (2D/3D pose) or weak (multi-view or depth) paired supervision.
We cast 3D pose learning as a self-supervised adaptation problem that aims to transfer the task knowledge from a labeled source domain to a completely unpaired target.
We evaluate different self-adaptation settings and demonstrate state-of-the-art 3D human pose estimation performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-05T03:52:57Z) - Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed as Homography Loss, is proposed to achieve the goal, which exploits both 2D and 3D information.
Our method yields the best performance compared with the other state-of-the-arts by a large margin on KITTI 3D datasets.
arXiv Detail & Related papers (2022-04-02T03:48:03Z) - PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and
Hallucination under Self-supervision [102.48681650013698]
Existing self-supervised 3D human pose estimation schemes have largely relied on weak supervisions to guide the learning.
We propose a novel self-supervised approach that allows us to explicitly generate 2D-3D pose pairs for augmenting supervision.
This is made possible via introducing a reinforcement-learning-based imitator, which is learned jointly with a pose estimator alongside a pose hallucinator.
arXiv Detail & Related papers (2022-03-29T14:45:53Z) - Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
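The depth-to-scale idea can be illustrated with an ordinary pinhole projection, where each joint's image scale is focal / Z and therefore varies per joint with its depth. The sketch below is a hedged illustration of that relationship under made-up focal length and depths, not the paper's D2S formulation.

```python
import numpy as np

def project_perspective(joints_3d, focal=1000.0):
    """Pinhole projection: image coordinates are scaled by focal / Z,
    so joints farther from the camera appear smaller.
    joints_3d: (J, 3) camera-space coordinates with Z > 0."""
    X, Y, Z = joints_3d[:, 0], joints_3d[:, 1], joints_3d[:, 2]
    scale = focal / Z                                  # per-joint scale variant
    return np.stack([X * scale, Y * scale], axis=1), scale

# Two joints at the same (X, Y) but different depths project to different
# image positions; the nearer joint gets the larger scale.
pts = np.array([[0.5, 0.2, 4.0],
                [0.5, 0.2, 5.0]])
uv, s = project_perspective(pts)
print(s[0] > s[1])  # True: the nearer joint has the larger per-joint scale
```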
arXiv Detail & Related papers (2020-10-27T03:31:35Z) - Multi-Scale Networks for 3D Human Pose Estimation with Inference Stage
Optimization [33.02708860641971]
Estimating 3D human poses from a monocular video is still a challenging task.
Many existing methods degrade when the target person is occluded by other objects, or when the motion is too fast or slow relative to the scale and speed of the training data.
We introduce a spatio-temporal network for robust 3D human pose estimation.
arXiv Detail & Related papers (2020-10-13T15:24:28Z) - Kinematic-Structure-Preserved Representation for Unsupervised 3D Human
Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations named as forward-kinematics, camera-projection and spatial-map transformation.
arXiv Detail & Related papers (2020-06-24T23:56:33Z)
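The forward-kinematics transformation named above can be sketched as a parent-to-child recursion over a joint tree: each joint sits at its parent's position plus a fixed-length bone along a predicted direction. The toy tree and bone lengths below are illustrative assumptions, not that paper's skeleton.

```python
import numpy as np

# Toy skeleton: joint -> parent index (-1 = root); bone lengths are made up.
PARENT = [-1, 0, 1, 0, 3]               # root, spine, head, hip, knee
BONE_LEN = [0.0, 0.5, 0.3, 0.4, 0.45]

def forward_kinematics(unit_dirs):
    """Place joints by walking the tree: child = parent + length * direction.
    unit_dirs: (J, 3) bone-direction vectors (the root entry is ignored)."""
    num_joints = len(PARENT)
    pos = np.zeros((num_joints, 3))
    for j in range(1, num_joints):      # parents are listed before children
        d = unit_dirs[j] / np.linalg.norm(unit_dirs[j])
        pos[j] = pos[PARENT[j]] + BONE_LEN[j] * d
    return pos

dirs = np.tile(np.array([0.0, 1.0, 0.0]), (5, 1))  # all bones point "up"
head = forward_kinematics(dirs)[2]      # head sits 0.5 + 0.3 above the root
print(head)
```

Because positions are derived only from bone directions and fixed lengths, every predicted pose is guaranteed to preserve the skeleton's bone lengths, which is the appeal of a differentiable forward-kinematics layer.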
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.