Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape
Estimation from Monocular Video
- URL: http://arxiv.org/abs/2203.08534v1
- Date: Wed, 16 Mar 2022 11:00:24 GMT
- Title: Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape
Estimation from Monocular Video
- Authors: Wen-Li Wei, Jen-Chun Lin, Tyng-Luh Liu, and Hong-Yuan Mark Liao
- Abstract summary: We propose a motion pose and shape network (MPS-Net) that captures humans in motion to estimate 3D human pose and shape from video.
Specifically, we first propose a motion continuity attention (MoCA) module that leverages visual cues observed from human motion to adaptively recalibrate the range that needs attention in the sequence.
We then develop a hierarchical attentive feature integration (HAFI) module that combines adjacent past and future features to refine the representation of the current frame.
By coupling the MoCA and HAFI modules, the proposed MPS-Net excels in estimating 3D human pose and shape from video.
- Score: 24.217269857183233
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning to capture human motion is essential to 3D human pose and shape
estimation from monocular video. However, existing methods mainly rely on
recurrent or convolutional operations to model such temporal information, which
limits their ability to capture non-local context relations of human motion. To
address this problem, we propose a motion pose and shape network (MPS-Net) to
effectively capture humans in motion to estimate accurate and temporally
coherent 3D human pose and shape from a video. Specifically, we first propose a
motion continuity attention (MoCA) module that leverages visual cues observed
from human motion to adaptively recalibrate the range that needs attention in
the sequence to better capture the motion continuity dependencies. Then, we
develop a hierarchical attentive feature integration (HAFI) module to
effectively combine adjacent past and future feature representations to
strengthen temporal correlation and refine the feature representation of the
current frame. By coupling the MoCA and HAFI modules, the proposed MPS-Net
excels in estimating 3D human pose and shape in the video. Though conceptually
simple, our MPS-Net not only outperforms the state-of-the-art methods on the
3DPW, MPI-INF-3DHP, and Human3.6M benchmark datasets, but also uses fewer
network parameters. The video demos can be found at
https://mps-net.github.io/MPS-Net/.
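The abstract describes the two modules only at a high level, but both lend themselves to a compact illustration. The Python/NumPy sketch below shows a generic non-local self-attention over per-frame features (the kind of operation MoCA builds on, minus its motion-based recalibration) and a simple attention-weighted fusion of a frame with its adjacent past and future frames (loosely in the spirit of HAFI). All function names, shapes, and random weights are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def nonlocal_temporal_attention(feats, d_k=64):
    """Generic non-local self-attention over per-frame features.

    feats: (T, D) array, one feature vector per video frame.
    Every frame attends to every other frame, so motion dependencies
    are not restricted to a local temporal window.
    """
    T, D = feats.shape
    Wq = rng.standard_normal((D, d_k)) / np.sqrt(D)
    Wk = rng.standard_normal((D, d_k)) / np.sqrt(D)
    Wv = rng.standard_normal((D, D)) / np.sqrt(D)
    Q, K, V = feats @ Wq, feats @ Wk, feats @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)  # (T, T) frame-to-frame weights
    return attn @ V                                  # (T, D) recombined features

def neighbor_fusion(feats, t, radius=1):
    """Attention-weighted fusion of frame t with its adjacent past and
    future frames, refining the current frame's representation."""
    lo, hi = max(0, t - radius), min(len(feats), t + radius + 1)
    window = feats[lo:hi]                   # (w, D) neighboring features
    weights = softmax(window @ feats[t])    # similarity to the current frame
    return weights @ window                 # (D,) refined feature for frame t

# Toy usage: 16 frames of 128-d features.
feats = rng.standard_normal((16, 128))
refined = nonlocal_temporal_attention(feats)
frame8 = neighbor_fusion(refined, t=8)
```

Per the abstract, MoCA proper additionally uses visual motion cues to recalibrate which parts of the (T, T) attention map are emphasized, and HAFI integrates neighboring features hierarchically rather than in a single window; both refinements are omitted above for brevity.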
Related papers
- HMP: Hand Motion Priors for Pose and Shape Estimation from Video [52.39020275278984]
We develop a generative motion prior specific to hands, trained on the AMASS dataset, which features diverse and high-quality hand motions.
Our integration of a robust motion prior significantly enhances performance, especially in occluded scenarios.
We demonstrate our method's efficacy via qualitative and quantitative evaluations on the HO3D and DexYCB datasets.
arXiv Detail & Related papers (2023-12-27T22:35:33Z)
- Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video [23.93644678238666]
We propose a Pose and Mesh Co-Evolution network (PMCE) to recover 3D human motion from a video.
The proposed PMCE outperforms previous state-of-the-art methods in terms of both per-frame accuracy and temporal consistency.
arXiv Detail & Related papers (2023-08-20T16:03:21Z)
- MotionBERT: A Unified Perspective on Learning Human Motion Representations [46.67364057245364]
We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources.
We propose a pretraining stage in which a motion encoder is trained to recover the underlying 3D motion from noisy partial 2D observations.
We implement the motion encoder with a Dual-stream Spatio-temporal Transformer (DSTformer) neural network.
arXiv Detail & Related papers (2022-10-12T19:46:25Z)
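The pretraining objective described above (recover 3D motion from noisy, partial 2D keypoints) can be illustrated with a toy training step. The corruption scheme, shapes, and loss below are assumptions for illustration only; the placeholder `encoder` stands in for the DSTformer, which, as its name suggests, mixes attention across joints (spatial) and across frames (temporal) in two streams.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt_2d(joints_2d, drop_prob=0.15, noise_std=0.01):
    """Simulate noisy, partial 2D observations: randomly drop joints
    and jitter the rest. joints_2d: (T, J, 2)."""
    keep = rng.random(joints_2d.shape[:2]) > drop_prob   # (T, J) visibility mask
    noisy = joints_2d + rng.normal(0.0, noise_std, joints_2d.shape)
    return noisy * keep[..., None], keep

def pretrain_step(encoder, joints_2d, joints_3d):
    """One pretraining step: corrupt the 2D input, run the motion encoder,
    and score how well it recovers the underlying 3D motion."""
    obs_2d, _ = corrupt_2d(joints_2d)            # (T, J, 2) degraded input
    pred_3d = encoder(obs_2d)                    # (T, J, 3) recovered motion
    return np.mean((pred_3d - joints_3d) ** 2)   # reconstruction loss

# Toy usage with a placeholder linear "encoder" (17 joints, 30 frames).
W = rng.standard_normal((2, 3)) * 0.1
loss = pretrain_step(lambda x: x @ W,
                     rng.standard_normal((30, 17, 2)),
                     rng.standard_normal((30, 17, 3)))
```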
- Multi-level Motion Attention for Human Motion Prediction [132.29963836262394]
We study the use of different types of attention, computed at joint, body part, and full pose levels.
Our experiments on Human3.6M, AMASS and 3DPW validate the benefits of our approach for both periodical and non-periodical actions.
arXiv Detail & Related papers (2021-06-17T08:08:11Z)
- HuMoR: 3D Human Motion Model for Robust Pose Estimation [100.55369985297797]
HuMoR is a 3D Human Motion Model for Robust Estimation of temporal pose and shape.
We introduce a conditional variational autoencoder, which learns a distribution of the change in pose at each step of a motion sequence.
We demonstrate that our model generalizes to diverse motions and body shapes after training on a large motion capture dataset.
arXiv Detail & Related papers (2021-05-10T21:04:55Z)
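A conditional VAE of this kind can be rolled out autoregressively: at each step, sample a latent transition conditioned on the previous state, then decode it into the change in pose. The sketch below shows the shape of one generative step; the names, dimensions, and linear placeholder networks are assumptions, and HuMoR's actual state representation and learned networks are richer.

```python
import numpy as np

rng = np.random.default_rng(0)

def cvae_step(prior_net, decoder_net, state_prev):
    """One generative step of a conditional VAE motion model.

    prior_net(state_prev)      -> (mu, logvar) of the latent transition
    decoder_net(z, state_prev) -> predicted change in pose
    """
    mu, logvar = prior_net(state_prev)
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)  # reparameterize
    return state_prev + decoder_net(z, state_prev)   # next pose state

# Toy rollout with placeholder linear networks (69-d pose state, 32-d latent).
D, Z = 69, 32
Wp = rng.standard_normal((D, Z)) * 0.1
Wd = rng.standard_normal((Z, D)) * 0.1
prior = lambda s: (s @ Wp, np.full(Z, -2.0))
decoder = lambda z, s: z @ Wd
state = rng.standard_normal(D)
for _ in range(10):
    state = cvae_step(prior, decoder, state)   # autoregressive motion sample
```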
- Self-Attentive 3D Human Pose and Shape Estimation from Videos [82.63503361008607]
We present a video-based learning algorithm for 3D human pose and shape estimation.
We exploit temporal information in videos and propose a self-attention module.
We evaluate our method on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets.
arXiv Detail & Related papers (2021-03-26T00:02:19Z)
- History Repeats Itself: Human Motion Prediction via Motion Attention [81.94175022575966]
We introduce an attention-based feed-forward network that explicitly leverages the observation that human motion tends to repeat itself.
In particular, we propose to extract motion attention to capture the similarity between the current motion context and the historical motion sub-sequences.
Our experiments on Human3.6M, AMASS and 3DPW evidence the benefits of our approach for both periodical and non-periodical actions.
arXiv Detail & Related papers (2020-07-23T02:12:27Z)
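This motion attention can be sketched as dot-product attention in which the query is the most recent motion snippet, the keys are equal-length snippets from earlier in the sequence, and the values are the frames that followed each snippet. The plain-coordinate representation and shapes below are simplifications of our reading of the abstract, not the authors' implementation, which aggregates sub-sequence representations before a learned predictor.

```python
import numpy as np

def motion_attention(history, context_len=10, future_len=5):
    """Aggregate likely future motion by attending from the current motion
    context to similar sub-sequences earlier in the history.

    history: (T, D) observed poses, most recent last.
    Returns (future_len, D): an attention-weighted mixture of the motion
    that followed each similar historical snippet.
    """
    T, D = history.shape
    query = history[-context_len:].ravel()                 # current context
    keys, values = [], []
    for s in range(T - context_len - future_len + 1):
        keys.append(history[s:s + context_len].ravel())    # past snippet
        values.append(history[s + context_len:s + context_len + future_len])
    keys, values = np.stack(keys), np.stack(values)
    scores = keys @ query / np.sqrt(query.size)            # snippet similarity
    w = np.exp(scores - scores.max())
    w /= w.sum()                                           # softmax weights
    return np.tensordot(w, values, axes=1)

# Toy usage: 100 frames of 48-d poses.
rng = np.random.default_rng(0)
pred = motion_attention(rng.standard_normal((100, 48)))
```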
- Motion Guided 3D Pose Estimation from Videos [81.14443206968444]
We propose a new loss function, called motion loss, for the problem of monocular 3D human pose estimation from 2D pose.
In computing motion loss, a simple yet effective representation for keypoint motion, called pairwise motion encoding, is introduced.
We design a new graph convolutional network architecture, U-shaped GCN (UGCN), which captures both short-term and long-term motion information.
arXiv Detail & Related papers (2020-04-29T06:59:30Z)
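A motion loss of this kind compares how keypoints move rather than only where they are in each frame. A minimal sketch, assuming the pairwise encoding is the temporal change of joint-to-joint offsets (an assumed simplification; the paper's exact encoding may differ):

```python
import numpy as np

def pairwise_motion_encoding(poses):
    """Encode motion as the temporal change of all pairwise joint offsets.

    poses: (T, J, C) joint positions per frame (C = 2 or 3).
    Returns (T-1, J, J, C): how each joint-to-joint offset moves over time.
    """
    offsets = poses[:, :, None, :] - poses[:, None, :, :]   # (T, J, J, C)
    return offsets[1:] - offsets[:-1]

def motion_loss(pred, gt):
    """Penalize differences in motion between prediction and ground truth,
    complementing a per-frame position loss."""
    return np.mean(np.abs(pairwise_motion_encoding(pred)
                          - pairwise_motion_encoding(gt)))

# Toy usage: 30 frames, 17 joints, 3D.
rng = np.random.default_rng(0)
print(motion_loss(rng.standard_normal((30, 17, 3)),
                  rng.standard_normal((30, 17, 3))))
```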
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.