Action-Conditioned 3D Human Motion Synthesis with Transformer VAE
- URL: http://arxiv.org/abs/2104.05670v1
- Date: Mon, 12 Apr 2021 17:40:27 GMT
- Title: Action-Conditioned 3D Human Motion Synthesis with Transformer VAE
- Authors: Mathis Petrovich, Michael J. Black, Gül Varol
- Abstract summary: We tackle the problem of action-conditioned generation of realistic and diverse human motion sequences.
In contrast to methods that complete, or extend, motion sequences, this task does not require an initial pose or sequence.
We learn an action-aware latent representation for human motions by training a generative variational autoencoder.
- Score: 44.523477804533364
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We tackle the problem of action-conditioned generation of realistic and
diverse human motion sequences. In contrast to methods that complete, or
extend, motion sequences, this task does not require an initial pose or
sequence. Here we learn an action-aware latent representation for human motions
by training a generative variational autoencoder (VAE). By sampling from this
latent space and querying a certain duration through a series of positional
encodings, we synthesize variable-length motion sequences conditioned on a
categorical action. Specifically, we design a Transformer-based architecture,
ACTOR, for encoding and decoding a sequence of parametric SMPL human body
models estimated from action recognition datasets. We evaluate our approach on
the NTU RGB+D, HumanAct12 and UESTC datasets and show improvements over the
state of the art. Furthermore, we present two use cases: improving action
recognition through adding our synthesized data to training, and motion
denoising. Our code and models will be made available.
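
The abstract's core recipe (sample an action-conditioned latent, query the desired duration through positional encodings, and decode with a Transformer into per-frame SMPL pose parameters) can be illustrated with a minimal PyTorch sketch. Everything below is an illustrative assumption rather than the authors' released implementation: the class name, layer sizes, and the per-action latent Gaussians are placeholders standing in for ACTOR's actual parameterization.

```python
# Minimal sketch (not the authors' code) of an ACTOR-style action-conditioned
# Transformer VAE decoder. Dimensions and layer choices are assumptions.
import torch
import torch.nn as nn

class ActionConditionedDecoder(nn.Module):
    def __init__(self, num_actions=12, latent_dim=256, pose_dim=24 * 6, max_len=200):
        super().__init__()
        # One learnable latent Gaussian (mean, log-variance) per action class.
        self.mu = nn.Parameter(torch.zeros(num_actions, latent_dim))
        self.logvar = nn.Parameter(torch.zeros(num_actions, latent_dim))
        # Learned positional encodings used to query the requested duration.
        self.pos_emb = nn.Parameter(torch.randn(max_len, latent_dim) * 0.02)
        layer = nn.TransformerDecoderLayer(d_model=latent_dim, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.to_pose = nn.Linear(latent_dim, pose_dim)  # per-frame SMPL pose parameters

    def forward(self, action: torch.Tensor, num_frames: int) -> torch.Tensor:
        # Reparameterization trick: sample z from the action-specific Gaussian.
        mu, logvar = self.mu[action], self.logvar[action]
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # Positional encodings for the requested duration act as decoder queries;
        # the sampled latent z is the single memory token they attend to.
        queries = self.pos_emb[:num_frames].unsqueeze(0).expand(action.shape[0], -1, -1)
        frames = self.decoder(tgt=queries, memory=z.unsqueeze(1))
        return self.to_pose(frames)  # (batch, num_frames, pose_dim)

# Usage: generate a 60-frame motion for action class 3.
model = ActionConditionedDecoder()
motion = model(torch.tensor([3]), num_frames=60)
print(motion.shape)  # torch.Size([1, 60, 144])
```

Because the duration only enters through the number of positional-encoding queries, the same latent sample can be decoded into sequences of different lengths, which is what allows variable-length synthesis from a categorical action label.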
Related papers
- Text-guided 3D Human Motion Generation with Keyframe-based Parallel Skip Transformer [62.29951737214263]
Existing algorithms directly generate the full sequence, which is expensive and prone to errors.
We propose KeyMotion, that generates plausible human motion sequences corresponding to input text.
We use a Variational Autoencoder (VAE) with Kullback-Leibler regularization to project the keyframes into a latent space.
For the reverse diffusion, we propose a novel Parallel Skip Transformer that performs cross-modal attention between the design latents and text condition.
arXiv Detail & Related papers (2024-05-24T11:12:37Z) - Scaling Up Dynamic Human-Scene Interaction Modeling [58.032368564071895]
TRUMANS is the most comprehensive motion-captured HSI dataset currently available.
It intricately captures whole-body human motions and part-level object dynamics.
We devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length.
arXiv Detail & Related papers (2024-03-13T15:45:04Z) - Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over the state-of-the-art methods across a wide range of human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z) - UDE: A Unified Driving Engine for Human Motion Generation [16.32286289924454]
UDE is the first unified driving engine that enables generating human motion sequences from natural language or audio sequences.
We evaluate our method on the HumanML3D and AIST++ benchmarks.
arXiv Detail & Related papers (2022-11-29T08:30:52Z) - MotionBERT: A Unified Perspective on Learning Human Motion Representations [46.67364057245364]
We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources.
We propose a pretraining stage in which a motion encoder is trained to recover the underlying 3D motion from noisy partial 2D observations.
We implement the motion encoder with a Dual-stream Spatio-temporal Transformer (DSTformer) neural network.
arXiv Detail & Related papers (2022-10-12T19:46:25Z) - Recurrent Transformer Variational Autoencoders for Multi-Action Motion Synthesis [17.15415641710113]
We consider the problem of synthesizing multi-action human motion sequences of arbitrary lengths.
Existing approaches have mastered motion sequence generation in single-action scenarios, but fail to generalize to multi-action and arbitrary-length sequences.
We propose a novel, efficient approach that leverages the expressiveness of Recurrent Transformers and the generative richness of conditional Variational Autoencoders.
arXiv Detail & Related papers (2022-06-14T10:40:16Z) - Unsupervised Motion Representation Learning with Capsule Autoencoders [54.81628825371412]
Motion Capsule Autoencoder (MCAE) models motion in a two-level hierarchy.
MCAE is evaluated on a novel Trajectory20 motion dataset and various real-world skeleton-based human action datasets.
arXiv Detail & Related papers (2021-10-01T16:52:03Z) - Conditional Temporal Variational AutoEncoder for Action Video Prediction [66.63038712306606]
ACT-VAE predicts pose sequences for an action clip from a single input image.
When connected with a plug-and-play Pose-to-Image (P2I) network, ACT-VAE can synthesize image sequences.
arXiv Detail & Related papers (2021-08-12T10:59:23Z)