Pose Transformers (POTR): Human Motion Prediction with
Non-Autoregressive Transformers
- URL: http://arxiv.org/abs/2109.07531v1
- Date: Wed, 15 Sep 2021 18:55:15 GMT
- Title: Pose Transformers (POTR): Human Motion Prediction with
Non-Autoregressive Transformers
- Authors: Ángel Martínez-González, Michael Villamizar, Jean-Marc Odobez
- Abstract summary: We propose to leverage Transformer architectures for non-autoregressive human motion prediction.
Our approach decodes elements in parallel from a query sequence, instead of conditioning on previous predictions.
We show that despite its simplicity, our approach achieves competitive results on two public datasets.
- Score: 24.36592204215444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose to leverage Transformer architectures for non-autoregressive human
motion prediction. Our approach decodes elements in parallel from a query
sequence, instead of conditioning on previous predictions as in state-of-the-art
RNN-based approaches. In this way our approach is less computationally intensive
and potentially avoids accumulating errors over the long-term elements of the
sequence. In that context, our contributions are fourfold: (i) we frame human
motion prediction as a sequence-to-sequence problem and propose a
non-autoregressive Transformer to infer the sequences of poses in parallel;
(ii) we propose to decode sequences of 3D poses from a query sequence generated
in advance with elements from the input sequence; (iii) we propose to perform
skeleton-based activity classification from the encoder memory, in the hope
that identifying the activity can improve predictions; (iv) we show that despite
its simplicity, our approach achieves competitive results on two public
datasets, although surprisingly more so for short-term predictions than for
long-term ones.
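The abstract describes the architecture only at a high level, so here is a minimal PyTorch sketch of what such a non-autoregressive pose predictor could look like. This is not the authors' released POTR code: the layer sizes, the choice of repeating the last observed pose as the pre-built query sequence, the residual pose head, and the mean-pooled activity classifier over the encoder memory are illustrative assumptions consistent with the abstract.

```python
# Illustrative sketch only; not the authors' released POTR implementation.
# Assumes poses are flat feature vectors (e.g., rotation or angle features per frame).
import torch
import torch.nn as nn

class NonAutoregressivePosePredictor(nn.Module):
    def __init__(self, pose_dim, d_model=128, n_heads=4, n_layers=4,
                 n_classes=15, target_len=25):
        super().__init__()
        self.target_len = target_len
        self.embed = nn.Linear(pose_dim, d_model)            # pose -> token embedding
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True), n_layers)
        self.pose_head = nn.Linear(d_model, pose_dim)         # token -> predicted pose
        self.class_head = nn.Linear(d_model, n_classes)       # activity from encoder memory

    def forward(self, observed):                              # observed: (B, T_in, pose_dim)
        memory = self.encoder(self.embed(observed))
        # Query sequence built in advance from the input (here: the last observed pose
        # repeated for every future frame), so all future frames are decoded in parallel,
        # with no causal mask and no autoregressive loop.
        query = observed[:, -1:, :].repeat(1, self.target_len, 1)
        decoded = self.decoder(self.embed(query), memory)
        future_poses = query + self.pose_head(decoded)        # predict offsets from the query
        activity_logits = self.class_head(memory.mean(dim=1))
        return future_poses, activity_logits

# Usage: predict 25 future frames from 50 observed frames of 63-D poses.
model = NonAutoregressivePosePredictor(pose_dim=63)
pred, logits = model(torch.randn(2, 50, 63))
print(pred.shape, logits.shape)                               # (2, 25, 63) (2, 15)
```
At training time one plausible multi-task setup, in line with the abstract's classification contribution, would supervise the pose output with a reconstruction loss and the activity logits with cross-entropy.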
Related papers
- Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed.
Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z)
- CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion [6.862357145175449]
We propose CoMusion, a single-stage, end-to-end diffusion-based HMP framework.
CoMusion is inspired by the insight that smoother future pose predictions improve spatial prediction performance.
Our method, facilitated by the Transformer-GCN module design and a proposed variance scheduler, predicts accurate, realistic, and consistent motions (a generic scheduler sketch follows this entry).
arXiv Detail & Related papers (2023-05-21T19:31:56Z)
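The CoMusion entry above mentions a proposed variance scheduler but gives no details, so the snippet below shows only a generic cosine noise schedule (Nichol & Dhariwal, 2021) as an illustration of what such a scheduler outputs for a diffusion model. CoMusion's actual scheduler may differ, and the function name and parameters here are assumptions.

```python
# Generic cosine variance (beta) schedule for a diffusion model; shown for
# illustration only, not taken from CoMusion.
import math
import torch

def cosine_beta_schedule(timesteps: int, s: float = 0.008) -> torch.Tensor:
    """Per-step variances beta_t; beta_t controls how much noise is added at step t."""
    steps = torch.arange(timesteps + 1, dtype=torch.float64)
    alphas_cumprod = torch.cos(((steps / timesteps) + s) / (1 + s) * math.pi / 2) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return betas.clamp(1e-5, 0.999).float()

betas = cosine_beta_schedule(1000)
print(betas[:3], betas[-3:])    # little noise at early steps, more at late steps
```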
- Overlooked Poses Actually Make Sense: Distilling Privileged Knowledge for Human Motion Prediction [26.25110973770013]
Previous works on human motion prediction build a mapping between the observed sequence and the one to be predicted.
We present a new prediction pattern that introduces previously overlooked human poses into the prediction task.
These poses lie after the predicted sequence and form the privileged sequence (see the split sketch after this entry).
arXiv Detail & Related papers (2022-08-02T08:13:43Z)
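To make the privileged-sequence idea above concrete, here is a hypothetical data split in PyTorch: frames that come after the prediction window are kept aside as teacher-only context during training. The function name, tensor shapes, and segment lengths are illustrative assumptions, not the paper's pipeline.

```python
# Hypothetical split of a motion clip into observed / target / privileged segments.
# The privileged frames follow the frames to be predicted and would be visible
# only to a teacher model during training (e.g., for knowledge distillation).
import torch

def split_clip(clip: torch.Tensor, t_obs: int, t_pred: int, t_priv: int):
    """clip: (T, pose_dim) with T >= t_obs + t_pred + t_priv."""
    observed = clip[:t_obs]                                     # input to teacher and student
    target = clip[t_obs:t_obs + t_pred]                         # sequence to be predicted
    privileged = clip[t_obs + t_pred:t_obs + t_pred + t_priv]   # teacher-only future context
    return observed, target, privileged

obs, tgt, priv = split_clip(torch.randn(100, 66), t_obs=50, t_pred=25, t_priv=25)
print(obs.shape, tgt.shape, priv.shape)                         # (50, 66) (25, 66) (25, 66)
```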
- Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions.
Existing neural models have been shown to lack this basic ability in learning symbolic structures.
We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
arXiv Detail & Related papers (2021-09-30T16:41:19Z)
- Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation [109.46348908829697]
We propose a novel Edit-Invariant Sequence Loss (EISL), which computes the matching loss of a target n-gram with all n-grams in the generated sequence (see the simplified sketch after this entry).
We conduct experiments on three tasks: machine translation with noisy target sequences, unsupervised text style transfer, and non-autoregressive machine translation.
arXiv Detail & Related papers (2021-06-29T03:59:21Z)
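A simplified sketch of an edit-invariant n-gram matching loss in the spirit of the entry above: each target n-gram is scored against every window of the model's output log-probabilities, so a correct n-gram is rewarded wherever it appears in the generation. The aggregation over positions (log-sum-exp) and the function signature are assumptions; the paper's exact EISL formulation and its efficient computation may differ.

```python
# Simplified n-gram matching loss; illustrative, not the paper's exact EISL.
import torch
import torch.nn.functional as F

def ngram_matching_loss(log_probs: torch.Tensor, target: torch.Tensor, n: int = 2):
    """log_probs: (T_out, V) token log-probabilities; target: (T_tgt,) token ids."""
    t_out, _ = log_probs.shape
    t_tgt = target.shape[0]
    losses = []
    for i in range(t_tgt - n + 1):                   # every target n-gram ...
        gram = target[i:i + n]                       # (n,)
        scores = []
        for j in range(t_out - n + 1):               # ... matched at every output position
            window = log_probs[j:j + n]              # (n, V)
            scores.append(window.gather(1, gram.unsqueeze(1)).sum())
        # Soft aggregation over positions: reward the best placement, wherever it is.
        losses.append(-torch.logsumexp(torch.stack(scores), dim=0))
    return torch.stack(losses).mean()

log_probs = F.log_softmax(torch.randn(10, 32), dim=-1)   # toy output distribution
target = torch.randint(0, 32, (8,))
print(ngram_matching_loss(log_probs, target, n=2))
```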
- Augmenting Sequential Recommendation with Pseudo-Prior Items via Reversely Pre-training Transformer [61.818320703583126]
Sequential Recommendation characterizes the evolving patterns by modeling item sequences chronologically.
Recent developments in Transformers have inspired the community to design effective sequence encoders.
We introduce a new framework for Augmenting Sequential Recommendation with Pseudo-prior items (ASReP).
arXiv Detail & Related papers (2021-05-02T18:06:23Z)
- Consistent Accelerated Inference via Confident Adaptive Transformers [29.034390810078172]
We develop a novel approach for confidently accelerating inference in large and expensive multilayer Transformers.
We simultaneously increase computational efficiency while guaranteeing a specifiable degree of consistency with the original model with high confidence (see the sketch after this entry).
We demonstrate the effectiveness of this approach on four classification and regression tasks.
arXiv Detail & Related papers (2021-04-18T10:22:28Z)
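A hedged sketch of confidence-based early exiting in a multilayer Transformer, the general mechanism behind the entry above: an intermediate head predicts after each layer, and inference stops once a confidence score clears a threshold. The fixed softmax threshold stands in for CAT's calibrated meta-classifier and its consistency guarantee; class names, sizes, and the threshold value are illustrative assumptions.

```python
# Early-exit sketch: stop at the first layer whose prediction is confident enough.
# The fixed threshold is a stand-in for CAT's calibrated consistency machinery.
import torch
import torch.nn as nn

class EarlyExitEncoder(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_layers=6, n_classes=2, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_layers)])
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, n_classes) for _ in range(n_layers)])
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x):                                   # x: (1, T, d_model)
        for depth, (layer, head) in enumerate(zip(self.layers, self.heads), start=1):
            x = layer(x)
            probs = head(x.mean(dim=1)).softmax(dim=-1)     # prediction from this depth
            if probs.max() >= self.threshold:               # confident enough: exit early
                return probs, depth
        return probs, depth                                 # fell through: all layers used

probs, used = EarlyExitEncoder()(torch.randn(1, 16, 128))
print(probs.shape, "layers used:", used)
```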
- Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting [25.417560221400347]
Long sequence time-series forecasting (LSTF) demands a high prediction capacity.
Recent studies have shown the potential of Transformers to increase prediction capacity.
We design an efficient transformer-based model for LSTF, named Informer, with three distinctive characteristics.
arXiv Detail & Related papers (2020-12-14T11:43:09Z)
- Multitask Non-Autoregressive Model for Human Motion Prediction [33.98939145212708]
A Non-Autoregressive Model (NAT) is proposed with a complete non-autoregressive decoding scheme, as well as a context encoder and a positional encoding module.
Our approach is evaluated on Human3.6M and CMU-Mocap benchmarks and outperforms state-of-the-art autoregressive methods.
arXiv Detail & Related papers (2020-07-13T15:00:19Z)
- Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing [112.2208052057002]
We propose Funnel-Transformer, which gradually compresses the sequence of hidden states to a shorter one (see the sketch after this entry).
With comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on a wide variety of sequence-level prediction tasks.
arXiv Detail & Related papers (2020-06-05T05:16:23Z)
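A minimal sketch of the compression idea above: stride-2 mean pooling between Transformer blocks shortens the hidden-state sequence, so later blocks operate on fewer positions and cost fewer FLOPs. This is not the Funnel-Transformer implementation (which also pools only the queries against full-length keys/values and adds a decoder to recover token-level outputs); module names and sizes are assumptions.

```python
# Sequence compression between Transformer blocks via strided mean pooling;
# a simplified illustration, not the actual Funnel-Transformer architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FunnelBlockStack(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_blocks=3, layers_per_block=2):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
                layers_per_block)
            for _ in range(n_blocks)])

    def forward(self, x):                                   # x: (B, T, d_model)
        for i, block in enumerate(self.blocks):
            if i > 0:                                       # halve the sequence length
                x = F.avg_pool1d(x.transpose(1, 2), kernel_size=2, stride=2).transpose(1, 2)
            x = block(x)
        return x                                            # (B, T / 2**(n_blocks-1), d_model)

out = FunnelBlockStack()(torch.randn(2, 64, 128))
print(out.shape)                                            # (2, 16, 128)
```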
- Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition [66.47000813920617]
We propose a spike-triggered non-autoregressive transformer model for end-to-end speech recognition.
The proposed model can accurately predict the length of the target sequence and achieve a competitive performance.
The model even achieves a real-time factor of 0.0056, surpassing all mainstream speech recognition models in speed.
arXiv Detail & Related papers (2020-05-16T08:27:20Z)