SoloPose: One-Shot Kinematic 3D Human Pose Estimation with Video Data Augmentation
- URL: http://arxiv.org/abs/2312.10195v1
- Date: Fri, 15 Dec 2023 20:45:04 GMT
- Title: SoloPose: One-Shot Kinematic 3D Human Pose Estimation with Video Data Augmentation
- Authors: David C. Jeong, Hongji Liu, Saunder Salazar, Jessie Jiang, Christopher
A. Kitts
- Abstract summary: SoloPose is a one-shot, many-to-many spatio-temporal transformer model for kinematic 3D human pose estimation of video.
HeatPose is a 3D heatmap based on Gaussian Mixture Model distributions that factors target key points as well as kinematically adjacent key points.
3D AugMotion Toolkit is a methodology to augment existing 3D human pose datasets.
- Score: 0.4218593777811082
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While recent two-stage many-to-one deep learning models have demonstrated
great success in 3D human pose estimation, such models are inefficient ways to
detect 3D key points in a sequential video relative to one-shot and
many-to-many models. Another key drawback of two-stage and many-to-one models
is that errors in the first stage will be passed onto the second stage. In this
paper, we introduce SoloPose, a novel one-shot, many-to-many spatio-temporal
transformer model for kinematic 3D human pose estimation of video. SoloPose is
further fortified by HeatPose, a 3D heatmap based on Gaussian Mixture Model
distributions that factors target key points as well as kinematically adjacent
key points. Finally, we address data diversity constraints with the 3D
AugMotion Toolkit, a methodology to augment existing 3D human pose datasets,
specifically by projecting four top public 3D human pose datasets (Human3.6M,
MADS, AIST Dance++, MPI-INF-3DHP) into a novel dataset (Humans7.1M) with a
universal coordinate system. Extensive experiments are conducted on Human3.6M
as well as the augmented Humans7.1M dataset, and SoloPose demonstrates superior
results relative to the state-of-the-art approaches.
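The abstract describes HeatPose only at a high level, so the following is a minimal sketch of the general idea rather than the paper's exact formulation: a volumetric target for one joint built as a Gaussian mixture, with one component on the target keypoint and down-weighted components on its kinematically adjacent keypoints. The skeleton, voxel grid, sigma, and mixture weights below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative kinematic tree (child -> parent) for a 17-joint skeleton;
# SoloPose's actual joint set and adjacency are not specified in the abstract.
PARENT = {1: 0, 2: 1, 3: 2, 4: 0, 5: 4, 6: 5, 7: 0, 8: 7,
          9: 8, 10: 9, 11: 8, 12: 11, 13: 12, 14: 8, 15: 14, 16: 15}

def kinematic_neighbors(j):
    """Joints adjacent to j in the kinematic tree: its parent and its children."""
    adj = [PARENT[j]] if j in PARENT else []
    adj += [c for c, p in PARENT.items() if p == j]
    return adj

def gmm_heatmap(joints_3d, j, grid, sigma=0.05, w_target=1.0, w_adj=0.3):
    """Gaussian-mixture 3D heatmap for joint j (a rough HeatPose-style target).

    joints_3d : (J, 3) array of 3D joint positions.
    grid      : (N, 3) array of voxel-center coordinates.
    Returns an (N,) heatmap: an isotropic Gaussian on the target joint plus
    down-weighted Gaussians on its kinematically adjacent joints.
    """
    def gauss(center):
        d2 = np.sum((grid - center) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    heat = w_target * gauss(joints_3d[j])
    for n in kinematic_neighbors(j):
        heat += w_adj * gauss(joints_3d[n])
    return heat / heat.max()  # peak-normalize the mixture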
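Likewise, the alignment behind the 3D AugMotion Toolkit's universal coordinate system is not spelled out in the abstract. A common recipe for harmonizing heterogeneous 3D pose datasets is root-centering, a rigid rotation into a canonical body frame, and scale normalization; the sketch below follows that generic recipe with assumed joint indices, and the toolkit's actual joint mapping and alignment may differ.

```python
import numpy as np

def to_universal_frame(pose, root=0, l_hip=1, r_hip=4, spine_top=8):
    """Map a (J, 3) pose into a root-centered, axis-aligned, unit-scale frame.

    The joint indices are placeholders: each source dataset (Human3.6M, MADS,
    AIST Dance++, MPI-INF-3DHP) would first be remapped to a shared joint set.
    """
    p = pose - pose[root]                 # translate the root joint to the origin
    x = p[l_hip] - p[r_hip]               # left-right axis from the hips
    z = p[spine_top] - p[root]            # "up" axis from root to upper spine
    x = x / np.linalg.norm(x)
    z = z - z.dot(x) * x                  # make the up axis orthogonal to x
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)                    # forward axis completes a right-handed basis
    R = np.stack([x, y, z])               # rows are the universal axes
    p = p @ R.T                           # express every joint in that basis
    return p / np.linalg.norm(p[spine_top])   # normalize scale by torso length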
Related papers
- UnrealEgo: A New Dataset for Robust Egocentric 3D Human Motion Capture [70.59984501516084]
UnrealEgo is a new large-scale naturalistic dataset for egocentric 3D human pose estimation.
It is based on an advanced concept of eyeglasses equipped with two fisheye cameras that can be used in unconstrained environments.
We propose a new benchmark method with a simple but effective idea of devising a 2D keypoint estimation module for stereo inputs to improve 3D human pose estimation.
arXiv Detail & Related papers (2022-08-02T17:59:54Z)
- AdaptPose: Cross-Dataset Adaptation for 3D Human Pose Estimation by Learnable Motion Generation [24.009674750548303]
Testing a pre-trained 3D pose estimator on a new dataset results in a major performance drop.
We propose AdaptPose, an end-to-end framework that generates synthetic 3D human motions from a source dataset.
Our method outperforms previous work in cross-dataset evaluations by 14% and previous semi-supervised learning methods that use partial 3D annotations by 16%.
arXiv Detail & Related papers (2021-12-22T00:27:52Z)
- Shape-aware Multi-Person Pose Estimation from Multi-View Images [47.13919147134315]
Our proposed coarse-to-fine pipeline first aggregates noisy 2D observations from multiple camera views into 3D space.
The final pose estimates are attained from a novel optimization scheme which links high-confidence multi-view 2D observations and 3D joint candidates.
arXiv Detail & Related papers (2021-10-05T20:04:21Z)
- MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z)
- PandaNet: Anchor-Based Single-Shot Multi-Person 3D Pose Estimation [35.791868530073955]
We present PandaNet, a new single-shot, anchor-based and multi-person 3D pose estimation approach.
The proposed model performs bounding box detection and, for each detected person, 2D and 3D pose regression in a single forward pass.
It does not need any post-processing to regroup joints since the network predicts a full 3D pose for each bounding box.
arXiv Detail & Related papers (2021-01-07T10:32:17Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose [70.23652933572647]
We propose a novel graph convolutional neural network (GraphCNN)-based system that estimates the 3D coordinates of human mesh vertices directly from the 2D human pose.
We show that our Pose2Mesh outperforms the previous 3D human pose and mesh estimation methods on various benchmark datasets.
arXiv Detail & Related papers (2020-08-20T16:01:56Z)
- Cascaded deep monocular 3D human pose estimation with evolutionary training data [76.3478675752847]
Deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation.
This paper proposes a novel data augmentation method that is scalable to massive amounts of training data.
Our method synthesizes unseen 3D human skeletons based on a hierarchical human representation and heuristics inspired by prior knowledge.
arXiv Detail & Related papers (2020-06-14T03:09:52Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
- AnimePose: Multi-person 3D pose estimation and animation [9.323689681059504]
3D animation of humans in action is quite challenging as it involves using a huge setup with several motion trackers all over the person's body to track the movements of every limb.
This is time-consuming, and wearing exoskeleton body suits with motion sensors may cause the person discomfort.
We present a solution to generate 3D animation of multiple persons from a 2D video using deep learning.
arXiv Detail & Related papers (2020-02-06T11:11:56Z)