Multitask Non-Autoregressive Model for Human Motion Prediction
- URL: http://arxiv.org/abs/2007.06426v1
- Date: Mon, 13 Jul 2020 15:00:19 GMT
- Title: Multitask Non-Autoregressive Model for Human Motion Prediction
- Authors: Bin Li, Jian Tian, Zhongfei Zhang, Hailin Feng, and Xi Li
- Abstract summary: Non-auToregressive Model (NAT) is proposed with a complete non-autoregressive decoding scheme, as well as a context encoder and a positional encoding module.
Our approach is evaluated on Human3.6M and CMU-Mocap benchmarks and outperforms state-of-the-art autoregressive methods.
- Score: 33.98939145212708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human motion prediction, which aims at predicting future human skeletons
given the past ones, is a typical sequence-to-sequence problem. Therefore,
extensive efforts have been continued on exploring different RNN-based
encoder-decoder architectures. However, by generating target poses conditioned
on the previously generated ones, these models are prone to issues such as
error accumulation. In this paper, we argue that such issues are mainly caused
by the autoregressive decoding manner. Hence, a novel
Non-auToregressive Model (NAT) is proposed with a complete non-autoregressive
decoding scheme, as well as a context encoder and a positional encoding module.
More specifically, the context encoder embeds the given poses from temporal and
spatial perspectives. The frame decoder is responsible for predicting each
future pose independently. The positional encoding module injects positional
signal into the model to indicate temporal order. Moreover, a multitask
training paradigm is presented for both low-level human skeleton prediction and
high-level human action recognition, resulting in a convincing improvement on
the prediction task. Our approach is evaluated on the Human3.6M and CMU-Mocap
benchmarks and outperforms state-of-the-art autoregressive methods.
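The decoding scheme the abstract describes (a shared context embedding plus a positional signal per future frame, with every frame predicted independently rather than conditioned on previously generated poses) can be sketched as below. This is a minimal illustrative toy, not the paper's actual modules: `frame_decoder`, the linear map `W`, and all dimensions are hypothetical stand-ins.

```python
import numpy as np

def positional_encoding(num_frames: int, dim: int) -> np.ndarray:
    """Sinusoidal encoding that injects temporal order (one row per future frame)."""
    pos = np.arange(num_frames)[:, None]              # (T, 1)
    i = np.arange(dim // 2)[None, :]                  # (1, dim/2)
    angles = pos / np.power(10000.0, 2 * i / dim)     # (T, dim/2)
    pe = np.zeros((num_frames, dim))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def decode_all_frames(context: np.ndarray, num_future: int, frame_decoder) -> np.ndarray:
    """Predict every future pose in one parallel pass: each frame sees only the
    shared context plus its own positional signal, never previously generated poses."""
    dim = context.shape[-1]
    queries = context[None, :] + positional_encoding(num_future, dim)  # (T, dim)
    return np.stack([frame_decoder(q) for q in queries])  # no autoregressive loop

# Toy usage: a linear "decoder" mapping a 16-d context to a 17-joint 3D pose (51 values).
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 51))
poses = decode_all_frames(rng.normal(size=16), num_future=10,
                          frame_decoder=lambda q: q @ W)
print(poses.shape)  # (10, 51)
```

Because no frame conditions on another frame's output, an error in one predicted pose cannot propagate to later ones, which is the accumulation problem the autoregressive baselines suffer from.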
Related papers
- Non-autoregressive Sequence-to-Sequence Vision-Language Models [63.77614880533488]
We propose a parallel decoding sequence-to-sequence vision-language model that marginalizes over multiple inference paths in the decoder.
The model achieves performance on par with its state-of-the-art autoregressive counterpart, but is faster at inference time.
arXiv Detail & Related papers (2024-03-04T17:34:59Z)
- Multiscale Residual Learning of Graph Convolutional Sequence Chunks for Human Motion Prediction [23.212848643552395]
A new method is proposed for human motion prediction by learning temporal and spatial dependencies.
Our proposed method effectively models the sequence information for motion prediction and outperforms other techniques, setting a new state of the art.
arXiv Detail & Related papers (2023-08-31T15:23:33Z)
- Spatio-Temporal Encoding of Brain Dynamics with Surface Masked Autoencoders [10.097983222759884]
A surface Masked AutoEncoder (sMAE) and a vertex surface Masked AutoEncoder (vsMAE) are proposed.
These models are trained to reconstruct cortical feature maps from masked versions of the input by learning strong latent representations of cortical development and structure function.
Results show that (v)sMAE pre-trained models improve phenotype prediction performance on multiple tasks by $\ge 26\%$, and offer faster convergence relative to models trained from scratch.
arXiv Detail & Related papers (2023-08-10T10:01:56Z)
- CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion [6.862357145175449]
We propose CoMusion, a single-stage, end-to-end diffusion-based HMP framework.
CoMusion is inspired by the insight that smooth future pose prediction improves spatial prediction performance.
Our method, facilitated by the Transformer-GCN module design and a proposed variance scheduler, predicts accurate, realistic, and consistent motions.
arXiv Detail & Related papers (2023-05-21T19:31:56Z)
- PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting [16.033044724498296]
We propose PoseGPT, an auto-regressive transformer-based approach which internally compresses human motion into quantized latent sequences.
Inspired by the Generative Pretrained Transformer (GPT), we propose to train a GPT-like model for next-index prediction in that space.
arXiv Detail & Related papers (2022-10-19T13:30:39Z)
- A generic diffusion-based approach for 3D human pose prediction in the wild [68.00961210467479]
3D human pose forecasting, i.e., predicting a sequence of future human 3D poses given a sequence of past observed ones, is a challenging spatio-temporal task.
We provide a unified formulation in which incomplete elements (no matter in the prediction or observation) are treated as noise and propose a conditional diffusion model that denoises them and forecasts plausible poses.
We investigate our findings on four standard datasets and obtain significant improvements over the state-of-the-art.
arXiv Detail & Related papers (2022-10-11T17:59:54Z)
- Back to MLP: A Simple Baseline for Human Motion Prediction [59.18776744541904]
This paper tackles the problem of human motion prediction, which consists of forecasting future body poses from previously observed sequences.
We show that the performance of these approaches can be surpassed by a light-weight, purely MLP-based architecture with only 0.14M parameters.
An exhaustive evaluation on Human3.6M, AMASS and 3DPW datasets shows that our method, which we dub siMLPe, consistently outperforms all other approaches.
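The "simple MLP baseline" idea can be sketched as follows. This is a hypothetical toy, not the actual siMLPe model (the real method additionally uses DCT transforms and per-frame fully-connected layers); the class name, dimensions, and random initialization here are assumptions for illustration only.

```python
import numpy as np

class TinyMLPPredictor:
    """Toy MLP motion predictor: flatten the observed pose sequence, apply two
    linear layers with a ReLU, and reshape into the future sequence."""
    def __init__(self, t_in, t_out, joints_dim, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.t_out, self.joints_dim = t_out, joints_dim
        self.W1 = rng.normal(0, 0.02, size=(t_in * joints_dim, hidden))
        self.W2 = rng.normal(0, 0.02, size=(hidden, t_out * joints_dim))

    def predict(self, past):                              # past: (t_in, joints_dim)
        h = np.maximum(past.reshape(-1) @ self.W1, 0.0)   # ReLU hidden layer
        out = h @ self.W2
        return out.reshape(self.t_out, self.joints_dim)   # (t_out, joints_dim)

# Toy usage: observe 10 frames of a 22-joint (66-d) skeleton, predict 25 frames.
model = TinyMLPPredictor(t_in=10, t_out=25, joints_dim=66)
future = model.predict(np.zeros((10, 66)))
print(future.shape)  # (25, 66)
```

The point of such a baseline is parameter count: two small weight matrices can compete with far heavier recurrent or attention-based predictors.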
arXiv Detail & Related papers (2022-07-04T16:35:58Z)
- PreTR: Spatio-Temporal Non-Autoregressive Trajectory Prediction Transformer [0.9786690381850356]
We introduce a model called PRediction Transformer (PReTR) that extracts features from multi-agent scenes by employing a factorized spatio-temporal attention module.
It requires less computation than previously studied models while achieving empirically better results.
We leverage encoder-decoder Transformer networks for parallel decoding a set of learned object queries.
arXiv Detail & Related papers (2022-03-17T12:52:23Z)
- Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction [63.62263239934777]
We conduct an in-depth study on various pose representations with a focus on their effects on the motion prediction task.
We propose a novel RNN architecture termed AHMR (Attentive Hierarchical Motion Recurrent network) for motion prediction.
Our approach outperforms the state-of-the-art methods in short-term prediction and achieves much enhanced long-term prediction proficiency.
arXiv Detail & Related papers (2021-12-30T10:45:22Z)
- Aligned Cross Entropy for Non-Autoregressive Machine Translation [120.15069387374717]
We propose aligned cross entropy (AXE) as an alternative loss function for training of non-autoregressive models.
AXE-based training of conditional masked language models (CMLMs) substantially improves performance on major WMT benchmarks.
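The core idea behind an aligned loss can be sketched with a simplified dynamic program. This is a heavily simplified toy, not the exact AXE loss of the paper: here each of the T prediction slots, scanned monotonically, either emits its aligned target token (paying that token's negative log-probability) or is skipped for a fixed blank penalty; the function name and `skip_cost` value are assumptions for illustration.

```python
import numpy as np

def aligned_xent(log_probs, targets, skip_cost=1.0):
    """Cheapest monotonic alignment of T prediction slots to L target tokens.
    log_probs: (T, V) per-slot log-probabilities; targets: list of token ids.
    dp[i][j] = min cost of aligning the first i slots to the first j targets."""
    T = log_probs.shape[0]
    L = len(targets)
    dp = np.full((T + 1, L + 1), np.inf)
    dp[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(0, min(i, L) + 1):
            best = dp[i - 1, j] + skip_cost            # slot i-1 emits a "blank"
            if j > 0:                                  # slot i-1 emits target j-1
                best = min(best, dp[i - 1, j - 1] - log_probs[i - 1, targets[j - 1]])
            dp[i, j] = best
    return float(dp[T, L])                             # all targets covered, in order

# Toy check: 3 slots, vocab of 2, target [0, 1], uniform predictions.
# Best alignment emits both targets (cost 2*ln2) and skips one slot (cost 1).
lp = np.log(np.full((3, 2), 0.5))
loss = aligned_xent(lp, [0, 1])
```

By minimizing over alignments instead of penalizing position-by-position, such a loss stops punishing a parallel decoder for emitting the right tokens slightly shifted in position.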
arXiv Detail & Related papers (2020-04-03T16:24:47Z)
- Forecasting Sequential Data using Consistent Koopman Autoencoders [52.209416711500005]
A new class of physics-based methods related to Koopman theory has been introduced, offering an alternative for processing nonlinear dynamical systems.
We propose a novel Consistent Koopman Autoencoder model which, unlike the majority of existing work, leverages the forward and backward dynamics.
Key to our approach is a new analysis which explores the interplay between consistent dynamics and their associated Koopman operators.
arXiv Detail & Related papers (2020-03-04T18:24:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.