Recurrent Transformer Variational Autoencoders for Multi-Action Motion
Synthesis
- URL: http://arxiv.org/abs/2206.06741v1
- Date: Tue, 14 Jun 2022 10:40:16 GMT
- Title: Recurrent Transformer Variational Autoencoders for Multi-Action Motion
Synthesis
- Authors: Rania Briq, Chuhang Zou, Leonid Pishchulin, Chris Broaddus, Juergen
Gall
- Abstract summary: We consider the problem of synthesizing multi-action human motion sequences of arbitrary lengths.
Existing approaches have mastered motion sequence generation in single-action scenarios, but fail to generalize to multi-action and arbitrary-length sequences.
We propose a novel efficient approach that leverages the expressiveness of Recurrent Transformers and the generative richness of conditional Variational Autoencoders.
- Score: 17.15415641710113
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of synthesizing multi-action human motion sequences
of arbitrary lengths. Existing approaches have mastered motion sequence
generation in single-action scenarios, but fail to generalize to multi-action
and arbitrary-length sequences. We fill this gap by proposing a novel efficient
approach that leverages the expressiveness of Recurrent Transformers and
generative richness of conditional Variational Autoencoders. The proposed
iterative approach is able to generate smooth and realistic human motion
sequences with an arbitrary number of actions and frames while doing so in
linear space and time. We train and evaluate the proposed approach on the PROX
dataset, which we augment with ground-truth action labels. Experimental
evaluation shows significant improvements in FID score and semantic consistency
metrics compared to the state-of-the-art.
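The abstract above outlines an iterative generator: a conditional VAE whose Transformer decoder emits one motion segment per action while a recurrent state carries context from the segments already produced, so cost grows linearly with sequence length. Below is a minimal sketch of that sampling loop in PyTorch; the module layout, dimensions, and the way the recurrent state is threaded through are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RecurrentTransformerCVAE(nn.Module):
    """Hypothetical sketch of a recurrent Transformer conditional VAE decoder:
    one fixed-length motion segment is generated per action label, conditioned
    on a latent sample and a recurrent summary of previously generated segments."""

    def __init__(self, pose_dim=63, latent_dim=64, n_actions=10, seg_len=30, d_model=128):
        super().__init__()
        self.seg_len, self.d_model, self.latent_dim = seg_len, d_model, latent_dim
        self.action_emb = nn.Embedding(n_actions, d_model)
        self.latent_proj = nn.Linear(latent_dim, d_model)
        self.state_proj = nn.Linear(d_model, d_model)               # recurrent carry-over
        self.queries = nn.Parameter(torch.randn(seg_len, d_model))  # one query per output frame
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.to_pose = nn.Linear(d_model, pose_dim)
        self.to_state = nn.Linear(d_model, d_model)

    def decode_segment(self, z, action, state):
        # Cross-attention memory: latent sample, action embedding, recurrent state.
        memory = torch.stack(
            [self.latent_proj(z), self.action_emb(action), self.state_proj(state)], dim=1
        )
        q = self.queries.unsqueeze(0).expand(z.size(0), -1, -1)
        h = self.decoder(q, memory)                  # (B, seg_len, d_model)
        poses = self.to_pose(h)                      # (B, seg_len, pose_dim)
        new_state = self.to_state(h.mean(dim=1))     # summary passed to the next segment
        return poses, new_state

    @torch.no_grad()
    def generate(self, actions):
        """actions: (B, K) integer labels -> motion of shape (B, K * seg_len, pose_dim)."""
        B, K = actions.shape
        state = torch.zeros(B, self.d_model)
        segments = []
        for k in range(K):                           # one step per action: linear space and time
            z = torch.randn(B, self.latent_dim)      # sample from the (assumed) standard prior
            seg, state = self.decode_segment(z, actions[:, k], state)
            segments.append(seg)
        return torch.cat(segments, dim=1)
```

Training would add an encoder producing the posterior over z and optimize the usual conditional-VAE objective (reconstruction loss plus a KL term against the prior); only the sampling path is sketched here.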
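Both the abstract and several related entries below report the Frechet Inception Distance (FID), which compares generated and real motions through the Gaussian statistics of their features. A minimal sketch follows, assuming motion features of shape (N, D) have already been extracted with some pretrained motion encoder, which is not specified here.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """FID between two feature sets of shape (N, D):
    ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^{1/2})."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)
    covmean = np.real(covmean)                   # drop tiny imaginary parts from numerics
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```

Lower is better. The Diversity and Multimodality scores mentioned in the related papers are typically average pairwise distances over the same kind of features, though exact definitions vary by paper.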
Related papers
- Human Motion Synthesis: A Diffusion Approach for Motion Stitching and In-Betweening [2.5165775267615205]
We propose a diffusion model with a transformer-based denoiser to generate realistic human motion.
Our method demonstrated strong performance in generating in-betweening sequences.
We present the performance evaluation of our method using quantitative metrics such as Frechet Inception Distance (FID), Diversity, and Multimodality.
arXiv Detail & Related papers (2024-09-10T18:02:32Z)
- Dynamic Motion Synthesis: Masked Audio-Text Conditioned Spatio-Temporal Transformers [13.665279127648658]
This research presents a novel motion generation framework designed to produce whole-body motion sequences conditioned on multiple modalities simultaneously.
By integrating spatial attention mechanisms and a token critic, we ensure consistency and naturalness in the generated motions.
arXiv Detail & Related papers (2024-09-03T04:19:27Z)
- Text-guided 3D Human Motion Generation with Keyframe-based Parallel Skip Transformer [62.29951737214263]
Existing algorithms directly generate the full sequence, which is expensive and prone to errors.
We propose KeyMotion, which generates plausible human motion sequences corresponding to input text.
We use a Variational Autoencoder (VAE) with Kullback-Leibler regularization to project the keyframes into a latent space.
For the reverse diffusion, we propose a novel Parallel Skip Transformer that performs cross-modal attention between the design latents and text condition.
arXiv Detail & Related papers (2024-05-24T11:12:37Z)
- DiffusionPhase: Motion Diffusion in Frequency Domain [69.811762407278]
We introduce a learning-based method for generating high-quality human motion sequences from text descriptions.
Existing techniques struggle with motion diversity and smooth transitions in generating arbitrary-length motion sequences.
We develop a network encoder that converts the motion space into a compact yet expressive parameterized phase space.
arXiv Detail & Related papers (2023-12-07T04:39:22Z)
- Ring Attention with Blockwise Transformers for Near-Infinite Context [88.61687950039662]
We present a novel approach, Ring Attention with Blockwise Transformers (Ring Attention), which leverages blockwise computation of self-attention and feedforward to distribute long sequences across multiple devices.
Our approach enables training and inference of sequences that are up to device count times longer than those achievable by prior memory-efficient Transformers.
arXiv Detail & Related papers (2023-10-03T08:44:50Z)
- Synthesizing Long-Term Human Motions with Diffusion Models via Coherent Sampling [74.62570964142063]
Text-to-motion generation has gained increasing attention, but most existing methods are limited to generating short-term motions.
We propose a novel approach that utilizes a past-conditioned diffusion model with two optional coherent sampling methods.
Our proposed method is capable of generating compositional and coherent long-term 3D human motions controlled by a user-instructed long text stream.
arXiv Detail & Related papers (2023-08-03T16:18:32Z)
- Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over the state-of-the-art methods among extensive human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
- Weakly-supervised Action Transition Learning for Stochastic Human Motion Prediction [81.94175022575966]
We introduce the task of action-driven human motion prediction.
It aims to predict multiple plausible future motions given a sequence of action labels and a short motion history.
arXiv Detail & Related papers (2022-05-31T08:38:07Z)
- Implicit Neural Representations for Variable Length Human Motion Generation [11.028791809955276]
We propose an action-conditional human motion generation method using variational implicit neural representations (INR).
Our method offers variable-length sequence generation by construction because a part of INR is optimized for a whole sequence of arbitrary length with temporal embeddings.
We show that variable-length motions generated by our method are better than fixed-length motions generated by the state-of-the-art method in terms of realism and diversity.
arXiv Detail & Related papers (2022-03-25T15:00:38Z)
- Action-Conditioned 3D Human Motion Synthesis with Transformer VAE [44.523477804533364]
We tackle the problem of action-conditioned generation of realistic and diverse human motion sequences.
In contrast to methods that complete, or extend, motion sequences, this task does not require an initial pose or sequence.
We learn an action-aware latent representation for human motions by training a generative variational autoencoder.
arXiv Detail & Related papers (2021-04-12T17:40:27Z)