PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting
- URL: http://arxiv.org/abs/2210.10542v1
- Date: Wed, 19 Oct 2022 13:30:39 GMT
- Title: PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting
- Authors: Thomas Lucas, Fabien Baradel, Philippe Weinzaepfel, Grégory Rogez
- Abstract summary: We propose PoseGPT, an auto-regressive transformer-based approach which internally compresses human motion into quantized latent sequences.
Inspired by the Generative Pretrained Transformer (GPT), we propose to train a GPT-like model for next-index prediction in that space.
- Score: 16.033044724498296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the problem of action-conditioned generation of human motion
sequences. Existing work falls into two categories: forecast models conditioned
on observed past motions, or generative models conditioned on action labels and
duration only. In contrast, we generate motion conditioned on observations of
arbitrary length, including none. To solve this generalized problem, we propose
PoseGPT, an auto-regressive transformer-based approach which internally
compresses human motion into quantized latent sequences. An auto-encoder first
maps human motion to latent index sequences in a discrete space, and
vice-versa. Inspired by the Generative Pretrained Transformer (GPT), we propose
to train a GPT-like model for next-index prediction in that space; this allows
PoseGPT to output distributions on possible futures, with or without
conditioning on past motion. The discrete and compressed nature of the latent
space allows the GPT-like model to focus on long-range signal, as it removes
low-level redundancy in the input signal. Predicting discrete indices also
alleviates the common pitfall of predicting averaged poses, a typical failure
case when regressing continuous values, as the average of discrete targets is
not a target itself. Our experimental results show that our proposed approach
achieves state-of-the-art results on HumanAct12, a standard but small-scale
dataset, as well as on BABEL, a recent large-scale MoCap dataset, and on GRAB,
a human-object interaction dataset.
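The two-stage pipeline in the abstract (quantize motion into discrete latent indices, then model next-index distributions autoregressively) can be sketched with toy stand-ins: a random codebook in place of the learned auto-encoder, and a count-based bigram table in place of the GPT-like transformer. All names and shapes below are illustrative, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: in PoseGPT the codebook and latents come from a
# trained auto-encoder; here they are random for illustration.
codebook = rng.normal(size=(8, 4))   # 8 discrete codes, 4-dim latents
motion = rng.normal(size=(16, 4))    # 16 frames of continuous pose latents

def quantize(latents, codebook):
    """Map each continuous latent to the index of its nearest codebook entry."""
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

indices = quantize(motion, codebook)  # discrete index sequence

# A GPT-like model outputs a distribution over the next index given the past;
# a smoothed count-based bigram table is the smallest such "model".
counts = np.ones((len(codebook), len(codebook)))  # +1 smoothing
for a, b in zip(indices[:-1], indices[1:]):
    counts[a, b] += 1
next_dist = counts / counts.sum(axis=1, keepdims=True)

def sample_future(past, steps, rng):
    """Autoregressively sample future indices, conditioned on `past`
    (which may be empty: then start from a uniformly sampled index)."""
    seq = list(past) if len(past) else [rng.integers(len(codebook))]
    for _ in range(steps):
        seq.append(rng.choice(len(codebook), p=next_dist[seq[-1]]))
    return np.array(seq[len(past):] if len(past) else seq)

future = sample_future(indices[:4], steps=8, rng=rng)  # conditioned on 4 frames
decoded = codebook[future]   # decoder side: indices back to latent poses
```

Because the model predicts a distribution over a discrete vocabulary rather than regressing a continuous pose, sampling never produces the "average" of two codes, which mirrors the averaged-pose argument made above.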
Related papers
- Multiscale Residual Learning of Graph Convolutional Sequence Chunks for Human Motion Prediction [23.212848643552395]
A new method is proposed for human motion prediction by learning temporal and spatial dependencies.
Our proposed method effectively models sequence information for motion prediction and outperforms other techniques, setting a new state of the art.
arXiv Detail & Related papers (2023-08-31T15:23:33Z)
- TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction [1.8923948104852863]
We propose TransFusion, an innovative and practical diffusion-based model for 3D human motion prediction.
Our model leverages Transformer as the backbone with long skip connections between shallow and deep layers.
In contrast to prior diffusion-based models that utilize extra modules like cross-attention and adaptive layer normalization, we treat all inputs, including conditions, as tokens to create a more lightweight model.
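The "everything is a token" design described above can be illustrated with a single self-attention step in which condition tokens are simply concatenated with motion tokens, so no cross-attention module is needed. This is a minimal numpy sketch with illustrative shapes, not the TransFusion implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                               # token dimension (illustrative)
cond = rng.normal(size=(3, d))      # e.g. timestep + action-label tokens
motion = rng.normal(size=(10, d))   # noisy future-motion tokens

# Conditions are just more tokens in the same sequence.
tokens = np.concatenate([cond, motion])

# One self-attention step: every token, motion or condition, attends to all,
# so conditioning information flows in without any extra module.
scores = tokens @ tokens.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
out = weights @ tokens

denoised = out[len(cond):]          # keep only the motion positions
```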
arXiv Detail & Related papers (2023-07-30T01:52:07Z)
- CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion [6.862357145175449]
We propose CoMusion, a single-stage, end-to-end diffusion-based HMP framework.
CoMusion is inspired by the insight that a smooth future pose initialization improves spatial prediction performance.
Our method, facilitated by the Transformer-GCN module design and a proposed variance scheduler, predicts accurate, realistic, and consistent motions.
arXiv Detail & Related papers (2023-05-21T19:31:56Z)
- A generic diffusion-based approach for 3D human pose prediction in the wild [68.00961210467479]
3D human pose forecasting, i.e., predicting a sequence of future 3D human poses given a sequence of past observed ones, is a challenging spatio-temporal task.
We provide a unified formulation in which incomplete elements (no matter in the prediction or observation) are treated as noise and propose a conditional diffusion model that denoises them and forecasts plausible poses.
We investigate our findings on four standard datasets and obtain significant improvements over the state-of-the-art.
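The unified formulation above, in which incomplete elements are treated as noise, can be sketched as follows: observed poses stay clamped while missing ones start as pure noise and are iteratively refined. A toy shrinkage step stands in for the learned conditional denoiser; all shapes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T, J = 12, 5                          # frames, joints (illustrative)
poses = rng.normal(size=(T, J))       # ground-truth sequence
observed = np.zeros(T, dtype=bool)
observed[:4] = True                   # only the first 4 frames are observed

# Unified formulation: missing elements are initialized as pure noise...
x = np.where(observed[:, None], poses, rng.normal(size=(T, J)))

# ...and a denoiser iteratively refines ONLY the noisy elements while the
# observed ones stay clamped. A real model predicts the noise to remove;
# this toy "denoiser" just shrinks unknown entries toward the observed mean.
target = poses[observed].mean()
for _ in range(10):
    x[~observed] += 0.3 * (target - x[~observed])
```

The same loop covers both regimes: with no observed frames it is pure generation, and with a fully observed prefix it is forecasting.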
arXiv Detail & Related papers (2022-10-11T17:59:54Z)
- HiT-DVAE: Human Motion Generation via Hierarchical Transformer Dynamical VAE [37.23381308240617]
We propose Hierarchical Transformer Dynamical Variational Autoencoder, HiT-DVAE, which implements auto-regressive generation with transformer-like attention mechanisms.
We evaluate the proposed method on HumanEva-I and Human3.6M with various evaluation methods, and outperform the state-of-the-art methods on most of the metrics.
arXiv Detail & Related papers (2022-04-04T15:12:34Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
- Generating Smooth Pose Sequences for Diverse Human Motion Prediction [90.45823619796674]
We introduce a unified deep generative network for both diverse and controllable motion prediction.
Our experiments on two standard benchmark datasets, Human3.6M and HumanEva-I, demonstrate that our approach outperforms the state-of-the-art baselines in terms of both sample diversity and accuracy.
arXiv Detail & Related papers (2021-08-19T00:58:00Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- SGCN: Sparse Graph Convolution Network for Pedestrian Trajectory Prediction [64.16212996247943]
We present a Sparse Graph Convolution Network (SGCN) for pedestrian trajectory prediction.
Specifically, the SGCN explicitly models sparse directed interactions with a sparse directed spatial graph to capture adaptive interactions among pedestrians.
Visualizations indicate that our method can capture adaptive interactions between pedestrians and their effective motion tendencies.
arXiv Detail & Related papers (2021-04-04T03:17:42Z)
- Motion Prediction Using Temporal Inception Module [96.76721173517895]
We propose a Temporal Inception Module (TIM) to encode human motion.
Our framework produces input embeddings using convolutional layers, by using different kernel sizes for different input lengths.
The experimental results on standard motion prediction benchmark datasets Human3.6M and CMU motion capture dataset show that our approach consistently outperforms the state of the art methods.
arXiv Detail & Related papers (2020-10-06T20:26:01Z)
- Multitask Non-Autoregressive Model for Human Motion Prediction [33.98939145212708]
A Non-auToregressive model (NAT) is proposed with a complete non-autoregressive decoding scheme, as well as a context encoder and a positional encoding module.
Our approach is evaluated on Human3.6M and CMU-Mocap benchmarks and outperforms state-of-the-art autoregressive methods.
arXiv Detail & Related papers (2020-07-13T15:00:19Z)