Capturing the motion of every joint: 3D human pose and shape estimation with independent tokens
- URL: http://arxiv.org/abs/2303.00298v1
- Date: Wed, 1 Mar 2023 07:48:01 GMT
- Title: Capturing the motion of every joint: 3D human pose and shape estimation with independent tokens
- Authors: Sen Yang and Wen Heng and Gang Liu and Guozhong Luo and Wankou Yang and Gang Yu
- Abstract summary: We present a novel method to estimate 3D human pose and shape from monocular videos.
The proposed method attains superior performance on the 3DPW and Human3.6M datasets.
- Score: 34.50928515515274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a novel method to estimate 3D human pose and shape
from monocular videos. This task requires directly recovering pixel-aligned
3D human pose and body shape from monocular images or videos, which is
challenging due to its inherent ambiguity. To improve precision, existing
methods rely heavily on an initialized mean pose and shape as prior estimates
and on parameter regression in an iterative error-feedback manner. In addition,
video-based approaches model the overall change in image-level features to
temporally enhance the single-frame feature, but fail to capture rotational
motion at the joint level and cannot guarantee local temporal consistency. To
address these issues, we propose a novel Transformer-based model built on a
design of independent tokens. First, we introduce three types of
tokens independent of the image feature: joint rotation tokens, shape
token, and camera token. By progressively interacting with image features
through Transformer layers, these tokens learn to encode the prior knowledge of
human 3D joint rotations, body shape, and position information from large-scale
data, and are updated to estimate SMPL parameters conditioned on a given image.
Second, benefiting from the proposed token-based representation, we further use
a temporal model to focus on capturing the rotational temporal information of
each joint, which empirically helps prevent large jitter in local parts.
Despite being conceptually simple, the proposed method attains superior
performance on the 3DPW and Human3.6M datasets. Using ResNet-50 and
Transformer architectures, it obtains a 42.0 mm PA-MPJPE error on the
challenging 3DPW dataset, outperforming state-of-the-art counterparts by a large
margin. Code will be publicly available at
https://github.com/yangsenius/INT_HMR_Model
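To make the token design concrete, below is a minimal PyTorch sketch of the independent-token idea described in the abstract: learnable joint-rotation, shape, and camera tokens cross-attend to image features through Transformer layers and are read out as SMPL parameters. All module names, token dimensions, and read-out heads here are illustrative assumptions, not the authors' released implementation (see the repository linked above for that).

```python
import torch
import torch.nn as nn

class IndependentTokenHead(nn.Module):
    """Sketch: learnable tokens attend to image features and regress SMPL parameters."""

    def __init__(self, num_joints=24, dim=256, depth=4, heads=8):
        super().__init__()
        # Tokens are independent of the image feature: one token per joint rotation,
        # plus a single shape token and a single camera token.
        self.joint_tokens = nn.Parameter(torch.randn(num_joints, dim))
        self.shape_token = nn.Parameter(torch.randn(1, dim))
        self.camera_token = nn.Parameter(torch.randn(1, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=depth)
        # Read-out heads (assumed): a 6D rotation per joint, 10 SMPL shape betas,
        # and a 3-parameter weak-perspective camera.
        self.rot_head = nn.Linear(dim, 6)
        self.shape_head = nn.Linear(dim, 10)
        self.cam_head = nn.Linear(dim, 3)

    def forward(self, img_feats):
        # img_feats: (B, N, dim) flattened backbone features (e.g. a ResNet-50 feature map).
        B = img_feats.shape[0]
        tokens = torch.cat([self.joint_tokens, self.shape_token, self.camera_token], dim=0)
        tokens = tokens.unsqueeze(0).expand(B, -1, -1)
        # The tokens progressively interact with (cross-attend to) the image features.
        out = self.decoder(tgt=tokens, memory=img_feats)
        joints, shape, cam = out[:, :-2], out[:, -2], out[:, -1]
        return self.rot_head(joints), self.shape_head(shape), self.cam_head(cam)

head = IndependentTokenHead()
rot6d, betas, cam = head(torch.randn(2, 49, 256))  # e.g. a 7x7 feature map flattened to 49 tokens
```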
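For reference, the PA-MPJPE figure quoted above is the mean per-joint position error computed after a rigid Procrustes alignment (scale, rotation, translation) of the predicted joints to the ground truth. The short sketch below follows the standard definition of that metric, not the paper's evaluation code.

```python
import numpy as np

def pa_mpjpe(pred, gt):
    """PA-MPJPE: mean joint error (same units as the input, e.g. mm) after Procrustes alignment.

    pred, gt: (J, 3) arrays of predicted and ground-truth 3D joint positions.
    """
    # Centre both point sets.
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    P, G = pred - mu_p, gt - mu_g
    # Optimal rotation via orthogonal Procrustes (with a reflection guard).
    U, S, Vt = np.linalg.svd(P.T @ G)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:
        Vt[-1] *= -1
        S[-1] *= -1
        R = Vt.T @ U.T
    # Optimal isotropic scale.
    scale = S.sum() / (P ** 2).sum()
    aligned = scale * P @ R.T + mu_g
    return np.linalg.norm(aligned - gt, axis=1).mean()
```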
Related papers
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction [77.89935657608926]
We propose a Pose-Free Large Reconstruction Model (PF-LRM) for reconstructing a 3D object from a few unposed images.
PF-LRM simultaneously estimates the relative camera poses in 1.3 seconds on a single A100 GPU.
arXiv Detail & Related papers (2023-11-20T18:57:55Z)
- Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video [23.93644678238666]
We propose a Pose and Mesh Co-Evolution network (PMCE) to recover 3D human motion from a video.
The proposed PMCE outperforms previous state-of-the-art methods in terms of both per-frame accuracy and temporal consistency.
arXiv Detail & Related papers (2023-08-20T16:03:21Z)
- IVT: An End-to-End Instance-guided Video Transformer for 3D Pose Estimation [6.270047084514142]
Video 3D human pose estimation aims to localize the 3D coordinates of human joints from videos.
IVT enables learning temporal contextual depth information from visual features and 3D poses directly from video frames.
Experiments on three widely-used 3D pose estimation benchmarks show that the proposed IVT achieves state-of-the-art performances.
arXiv Detail & Related papers (2022-08-06T02:36:33Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- NeuralReshaper: Single-image Human-body Retouching with Deep Neural Networks [50.40798258968408]
We present NeuralReshaper, a novel method for semantic reshaping of human bodies in single images using deep generative networks.
Our approach follows a fit-then-reshape pipeline, which first fits a parametric 3D human model to a source human image.
To deal with the lack of paired training data, we introduce a novel self-supervised strategy to train our network.
arXiv Detail & Related papers (2022-03-20T09:02:13Z)
- 3D Human Pose Estimation with Spatial and Temporal Transformers [59.433208652418976]
We present PoseFormer, a purely transformer-based approach for 3D human pose estimation in videos.
Inspired by recent developments in vision transformers, we design a spatial-temporal transformer structure.
We quantitatively and qualitatively evaluate our method on two popular and standard benchmark datasets.
arXiv Detail & Related papers (2021-03-18T18:14:37Z)