Implicit Neural Representations for Variable Length Human Motion
Generation
- URL: http://arxiv.org/abs/2203.13694v1
- Date: Fri, 25 Mar 2022 15:00:38 GMT
- Title: Implicit Neural Representations for Variable Length Human Motion
Generation
- Authors: Pablo Cervantes and Yusuke Sekikawa and Ikuro Sato and Koichi Shinoda
- Abstract summary: We propose an action-conditional human motion generation method using variational implicit neural representations (INR).
Our method offers variable-length sequence generation by construction because part of the INR is optimized for a whole sequence of arbitrary length with temporal embeddings.
We show that variable-length motions generated by our method are better than fixed-length motions generated by the state-of-the-art method in terms of realism and diversity.
- Score: 11.028791809955276
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an action-conditional human motion generation method using
variational implicit neural representations (INR). The variational formalism
enables action-conditional distributions of INRs, from which one can easily
sample representations to generate novel human motion sequences. Our method
offers variable-length sequence generation by construction because a part of
INR is optimized for a whole sequence of arbitrary length with temporal
embeddings. In contrast, previous works reported difficulties with modeling
variable-length sequences. We confirm that our method with a Transformer
decoder outperforms all relevant methods on HumanAct12, NTU-RGBD, and UESTC
datasets in terms of realism and diversity of generated motions. Surprisingly,
even our method with an MLP decoder consistently outperforms the
state-of-the-art Transformer-based auto-encoder. In particular, we show that
variable-length motions generated by our method are better than fixed-length
motions generated by the state-of-the-art method in terms of realism and
diversity.
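To make the construction concrete, the following is a minimal PyTorch sketch of an action-conditional variational INR decoder: a single latent code is drawn from an action-conditional Gaussian, concatenated with a sinusoidal temporal embedding at each normalized time step, and decoded to a pose vector by an MLP, so sequences of any length are obtained simply by evaluating more time steps. The layer sizes, the embedding, and all names here are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ActionConditionalINR(nn.Module):
    """Minimal sketch of an action-conditional variational INR decoder.

    A latent code is drawn from an action-conditional Gaussian, combined with a
    sinusoidal temporal embedding at each (continuous) time step, and decoded
    to a pose vector by an MLP.  Sizes and names are illustrative assumptions.
    """

    def __init__(self, num_actions=12, latent_dim=128, time_dim=32, pose_dim=72):
        super().__init__()
        # Action-conditional prior parameters (mean and log-variance per action).
        self.prior_mu = nn.Embedding(num_actions, latent_dim)
        self.prior_logvar = nn.Embedding(num_actions, latent_dim)
        self.time_dim = time_dim
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + time_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, pose_dim),
        )

    def temporal_embedding(self, t):
        # Sinusoidal embedding of normalized time t in [0, 1]; shape (T, time_dim).
        freqs = 2.0 ** torch.arange(self.time_dim // 2, dtype=torch.float32)
        angles = t[:, None] * freqs[None, :] * torch.pi
        return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

    def sample(self, action, length):
        # Draw one latent code for the whole sequence (reparameterization trick).
        a = torch.tensor([action])
        mu, logvar = self.prior_mu(a), self.prior_logvar(a)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)      # (1, latent_dim)
        # Evaluate the INR at `length` normalized time steps -> variable-length output.
        t = torch.linspace(0.0, 1.0, length)
        emb = self.temporal_embedding(t)                              # (length, time_dim)
        z_rep = z.expand(length, -1)
        return self.decoder(torch.cat([z_rep, emb], dim=-1))          # (length, pose_dim)

model = ActionConditionalINR()
motion_60 = model.sample(action=3, length=60)    # 60-frame sequence
motion_90 = model.sample(action=3, length=90)    # same action, longer sequence
print(motion_60.shape, motion_90.shape)
```

Because the latent code is shared across the whole sequence while the time input varies continuously, the same sample can be rendered at 60 or 90 frames without retraining, which is the sense in which variable-length generation holds by construction.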
Related papers
- Human Motion Synthesis: A Diffusion Approach for Motion Stitching and In-Betweening [2.5165775267615205]
We propose a diffusion model with a transformer-based denoiser to generate realistic human motion.
Our method demonstrated strong performance in generating in-betweening sequences.
We evaluate our method with quantitative metrics such as Fréchet Inception Distance (FID), Diversity, and Multimodality (a minimal FID sketch appears after this list).
arXiv Detail & Related papers (2024-09-10T18:02:32Z)
- Motion Flow Matching for Human Motion Synthesis and Editing [75.13665467944314]
We propose Motion Flow Matching, a novel generative model for human motion generation featuring efficient sampling and effectiveness in motion editing applications.
Our method reduces the sampling complexity from a thousand steps in previous diffusion models to just ten, while achieving comparable performance on text-to-motion and action-to-motion generation benchmarks (a generic few-step sampler sketch appears after this list).
arXiv Detail & Related papers (2023-12-14T12:57:35Z)
- DiffusionPhase: Motion Diffusion in Frequency Domain [69.811762407278]
We introduce a learning-based method for generating high-quality human motion sequences from text descriptions.
Existing techniques struggle with motion diversity and smooth transitions in generating arbitrary-length motion sequences.
We develop a network encoder that converts the motion space into a compact yet expressive parameterized phase space.
arXiv Detail & Related papers (2023-12-07T04:39:22Z)
- SeqDiffuSeq: Text Diffusion with Encoder-Decoder Transformers [50.90457644954857]
In this work, we apply diffusion models to approach sequence-to-sequence text generation.
We propose SeqDiffuSeq, a text diffusion model for sequence-to-sequence generation.
Experimental results show strong performance on sequence-to-sequence generation in terms of text quality and inference time.
arXiv Detail & Related papers (2022-12-20T15:16:24Z)
- Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over state-of-the-art methods across extensive human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
- Recurrence Boosts Diversity! Revisiting Recurrent Latent Variable in Transformer-Based Variational AutoEncoder for Diverse Text Generation [85.5379146125199]
Variational Auto-Encoder (VAE) has been widely adopted in text generation.
We propose TRACE, a Transformer-based recurrent VAE structure.
arXiv Detail & Related papers (2022-10-22T10:25:35Z)
- Recurrent Transformer Variational Autoencoders for Multi-Action Motion Synthesis [17.15415641710113]
We consider the problem of synthesizing multi-action human motion sequences of arbitrary lengths.
Existing approaches have mastered motion sequence generation in single-action scenarios, but fail to generalize to multi-action and arbitrary-length sequences.
We propose a novel, efficient approach that leverages the richness of Recurrent Transformers and the generative richness of conditional Variational Autoencoders.
arXiv Detail & Related papers (2022-06-14T10:40:16Z)
- Diversity-Promoting Human Motion Interpolation via Conditional Variational Auto-Encoder [6.977809893768435]
We present a method based on a deep generative model to generate diverse human motion results.
We resort to the Conditional Variational Auto-Encoder (CVAE) to learn human motion conditioned on a pair of given start and end motions.
arXiv Detail & Related papers (2021-11-12T15:16:48Z)
- Action-Conditioned 3D Human Motion Synthesis with Transformer VAE [44.523477804533364]
We tackle the problem of action-conditioned generation of realistic and diverse human motion sequences.
In contrast to methods that complete, or extend, motion sequences, this task does not require an initial pose or sequence.
We learn an action-aware latent representation for human motions by training a generative variational autoencoder.
arXiv Detail & Related papers (2021-04-12T17:40:27Z)
- Hierarchical Style-based Networks for Motion Synthesis [150.226137503563]
We propose a self-supervised method for generating long-range, diverse and plausible behaviors to achieve a specific goal location.
Our proposed method learns to model human motion by decomposing a long-range generation task in a hierarchical manner.
On a large-scale skeleton dataset, we show that the proposed method is able to synthesize long-range, diverse, and plausible motion.
arXiv Detail & Related papers (2020-08-24T02:11:02Z)
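The motion stitching and in-betweening entry above evaluates with Fréchet Inception Distance (FID), Diversity, and Multimodality. As a reference point, here is a minimal sketch of the standard FID computation between two sets of feature vectors; the pretrained feature extractor that would produce those features for motion data is assumed and not shown, and the function name is illustrative.

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(feats_real, feats_gen):
    """Standard FID between two sets of feature vectors (rows = samples).

    FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^(1/2)).
    The feature extractor producing the rows (e.g. a pretrained action
    recognition network for motion) is assumed and not shown here.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)
    if np.iscomplexobj(covmean):   # numerical noise can yield tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))

# Toy usage with random features; real evaluations use learned motion features.
rng = np.random.default_rng(0)
fid = frechet_inception_distance(rng.normal(size=(256, 64)),
                                 rng.normal(loc=0.1, size=(256, 64)))
print(f"FID: {fid:.3f}")
```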
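The Motion Flow Matching entry reports reducing sampling from roughly a thousand diffusion steps to about ten. The sketch below illustrates the generic mechanism behind such few-step samplers, Euler integration of a learned velocity field from noise to data; the velocity network, step count, and tensor shapes are assumptions for illustration, not that paper's model.

```python
import torch
import torch.nn as nn

def flow_matching_sample(velocity_net, shape, num_steps=10):
    """Generic few-step flow-matching sampler (Euler integration).

    Integrates dx/dt = v_theta(x, t) from t=0 (Gaussian noise) to t=1 (data).
    Ten Euler steps stand in for the ~1000 denoising steps of a typical
    diffusion sampler; `velocity_net` is a placeholder, not the paper's model.
    """
    x = torch.randn(shape)                     # start from noise at t = 0
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((shape[0],), i * dt)    # current time for each sample
        x = x + dt * velocity_net(x, t)        # Euler step along the learned flow
    return x

# Stand-in velocity field: a small MLP over the flattened motion plus time.
class ToyVelocity(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, x, t):
        flat = x.flatten(1)                                   # (batch, dim)
        out = self.net(torch.cat([flat, t[:, None]], dim=1))
        return out.view_as(x)

batch, frames, pose_dim = 4, 60, 72
sampler_net = ToyVelocity(frames * pose_dim)
motion = flow_matching_sample(sampler_net, (batch, frames, pose_dim), num_steps=10)
print(motion.shape)    # torch.Size([4, 60, 72])
```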
This list is automatically generated from the titles and abstracts of the papers on this site.