Capturing the motion of every joint: 3D human pose and shape estimation
with independent tokens
- URL: http://arxiv.org/abs/2303.00298v1
- Date: Wed, 1 Mar 2023 07:48:01 GMT
- Title: Capturing the motion of every joint: 3D human pose and shape estimation
with independent tokens
- Authors: Sen Yang and Wen Heng and Gang Liu and Guozhong Luo and Wankou Yang
and Gang Yu
- Abstract summary: We present a novel method to estimate 3D human pose and shape from monocular videos.
The proposed method attains superior performance on the 3DPW and Human3.6M datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a novel method to estimate 3D human pose and shape
from monocular videos. This task requires directly recovering pixel-aligned
3D human pose and body shape from monocular images or videos, which is
challenging due to its inherent ambiguity. To improve precision, existing
methods rely heavily on an initialized mean pose and shape as prior estimates
and on parameter regression in an iterative error-feedback manner. In addition,
video-based approaches model overall changes in image-level features to
temporally enhance single-frame features, but they fail to capture rotational
motion at the joint level and cannot guarantee local temporal consistency. To
address these issues, we propose a novel Transformer-based model with a design
of independent tokens. First, we introduce three types of tokens independent
of the image feature: joint rotation tokens, a shape token, and a camera
token. By progressively interacting with image features
through Transformer layers, these tokens learn to encode the prior knowledge of
human 3D joint rotations, body shape, and position information from large-scale
data, and are updated to estimate SMPL parameters conditioned on a given image.
Second, benefiting from the proposed token-based representation, we further use
a temporal model to focus on capturing the rotational temporal information of
each joint, which is empirically conducive to preventing large jitters in local
parts. Despite being conceptually simple, the proposed method attains superior
performance on the 3DPW and Human3.6M datasets. Using ResNet-50 and
Transformer architectures, it obtains a 42.0 mm PA-MPJPE error on the
challenging 3DPW dataset, outperforming state-of-the-art counterparts by a
large margin. Code will be publicly available at
https://github.com/yangsenius/INT_HMR_Model
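
To make the independent-token design concrete, below is a minimal PyTorch sketch of the idea as described in the abstract: learnable joint-rotation, shape, and camera tokens cross-attend to image features through Transformer layers, and linear heads read SMPL parameters off the updated tokens. This is an illustration, not the authors' released code; the class name `IndependentTokenHead`, the layer counts, the feature dimension, and the 6D rotation parameterization are all assumptions.

```python
# A minimal sketch (assumed design, not the released INT_HMR code) of
# SMPL-parameter regression with tokens that are independent of the image.
import torch
import torch.nn as nn

class IndependentTokenHead(nn.Module):
    def __init__(self, num_joints=24, dim=256, depth=3, heads=8):
        super().__init__()
        # Learnable tokens, initialized independently of any image feature.
        self.joint_tokens = nn.Parameter(torch.randn(num_joints, dim))  # one rotation token per SMPL joint
        self.shape_token = nn.Parameter(torch.randn(1, dim))            # body-shape token
        self.camera_token = nn.Parameter(torch.randn(1, dim))           # camera token
        # Transformer decoder layers: tokens (queries) progressively
        # cross-attend to image features (keys/values) and are updated.
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=depth)
        # Linear heads map the updated tokens to SMPL parameters.
        self.rot_head = nn.Linear(dim, 6)     # 6D rotation per joint (assumed parameterization)
        self.shape_head = nn.Linear(dim, 10)  # SMPL shape coefficients (betas)
        self.cam_head = nn.Linear(dim, 3)     # weak-perspective camera: scale + 2D translation

    def forward(self, img_feats):
        # img_feats: (B, N, dim) flattened feature map, e.g. a 7x7
        # ResNet-50 grid projected to `dim` channels, so N = 49.
        b = img_feats.size(0)
        tokens = torch.cat([self.joint_tokens, self.shape_token, self.camera_token], dim=0)
        tokens = tokens.unsqueeze(0).expand(b, -1, -1)
        out = self.decoder(tokens, img_feats)  # tokens conditioned on the given image
        j = self.joint_tokens.size(0)
        rot6d = self.rot_head(out[:, :j])      # (B, num_joints, 6)
        betas = self.shape_head(out[:, j])     # (B, 10)
        cam = self.cam_head(out[:, j + 1])     # (B, 3)
        return rot6d, betas, cam
```

For example, `IndependentTokenHead()(torch.randn(2, 49, 256))` returns tensors of shapes (2, 24, 6), (2, 10), and (2, 3) for rotations, shape, and camera respectively.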
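
The abstract's second idea, capturing rotational temporal information per joint, admits a similarly short sketch: run temporal self-attention separately over each joint's rotation tokens across frames, which is one plausible way to suppress local jitter. Again, `PerJointTemporalEncoder` and all hyperparameters here are hypothetical.

```python
# A minimal sketch (assumed design) of a per-joint temporal model:
# self-attention runs only along the time axis, one sequence per joint.
import torch
import torch.nn as nn

class PerJointTemporalEncoder(nn.Module):
    def __init__(self, dim=256, depth=2, heads=8, max_len=64):
        super().__init__()
        self.time_embed = nn.Parameter(torch.randn(max_len, dim))  # learned temporal positions
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, joint_tokens):
        # joint_tokens: (B, T, J, dim) per-frame, per-joint rotation tokens
        b, t, j, d = joint_tokens.shape
        x = joint_tokens.permute(0, 2, 1, 3).reshape(b * j, t, d)  # one temporal sequence per joint
        x = x + self.time_embed[:t]
        x = self.encoder(x)  # attention only across frames, never across joints
        return x.reshape(b, j, t, d).permute(0, 2, 1, 3)  # back to (B, T, J, dim)
```

Restricting attention to the time axis keeps each joint's motion modeled independently, matching the paper's stated goal of joint-level rotational temporal consistency.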