Priority-Centric Human Motion Generation in Discrete Latent Space
- URL: http://arxiv.org/abs/2308.14480v2
- Date: Wed, 30 Aug 2023 15:33:01 GMT
- Title: Priority-Centric Human Motion Generation in Discrete Latent Space
- Authors: Hanyang Kong, Kehong Gong, Dongze Lian, Michael Bi Mi, Xinchao Wang
- Abstract summary: We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
- Score: 59.401128190423535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-motion generation is a formidable task, aiming to produce human
motions that align with the input text while also adhering to human
capabilities and physical laws. While there have been advancements in diffusion
models, their application in discrete spaces remains underexplored. Current
methods often overlook the varying significance of different motions, treating
them uniformly. It is essential to recognize that not all motions hold the same
relevance to a particular textual description. Some motions, being more salient
and informative, should be given precedence during generation. In response, we
introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM), which
utilizes a Transformer-based VQ-VAE to derive a concise, discrete motion
representation, incorporating a global self-attention mechanism and a
regularization term to counteract code collapse. We also present a motion
discrete diffusion model that employs an innovative noise schedule, determined
by the significance of each motion token within the entire motion sequence.
This approach retains the most salient motions during the reverse diffusion
process, leading to more semantically rich and varied motions. Additionally, we
formulate two strategies to gauge the importance of motion tokens, drawing from
both textual and visual indicators. Comprehensive experiments on the HumanML3D
and KIT-ML datasets confirm that our model surpasses existing techniques in
fidelity and diversity, particularly for intricate textual descriptions.
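To make the priority-centric idea concrete, the sketch below illustrates an importance-aware corruption schedule for discrete motion tokens: less salient tokens are masked earlier in the forward process, so the most salient ones survive longest and are recovered first in the reverse process. This is a minimal illustration, not the authors' implementation; the mask id, the linear schedule, and the attention-derived importance scores are assumptions.

```python
# A minimal sketch (not the authors' code) of an importance-aware corruption
# schedule for discrete motion tokens. The mask id, the linear base schedule,
# and the importance scores used below are illustrative assumptions.
import numpy as np

MASK_ID = -1  # hypothetical placeholder id for the absorbing [MASK] state

def forward_mask(tokens, importance, t, T, rng=None):
    """Corrupt a sequence of discrete motion tokens at diffusion step t.

    tokens:     (L,) int array of codebook indices from a motion VQ-VAE
    importance: (L,) float array, higher = more salient w.r.t. the text
    t, T:       current step and total number of diffusion steps

    Tokens with low importance are masked at earlier steps, so the most
    salient tokens are the last to be destroyed and, in the reverse
    process, the first to be recovered.
    """
    rng = rng or np.random.default_rng()
    # Rank tokens: rank 0 = least important, rank L-1 = most important.
    ranks = np.argsort(np.argsort(importance))
    # Per-token masking probability: a linear base schedule t/T, scaled so
    # unimportant (low-rank) tokens reach probability 1 sooner.
    base = t / T
    per_token = np.clip(base * (2.0 - ranks / max(len(tokens) - 1, 1)), 0.0, 1.0)
    masked = rng.random(len(tokens)) < per_token
    return np.where(masked, MASK_ID, tokens)

# Example: 8 motion tokens with importance scores that could come from a
# (hypothetical) text-motion attention map; halfway through the forward
# process the unimportant tokens are mostly masked, the salient ones survive.
tokens = np.array([12, 5, 98, 41, 7, 63, 20, 88])
importance = np.array([0.1, 0.9, 0.8, 0.2, 0.05, 0.7, 0.3, 0.95])
print(forward_mask(tokens, importance, t=50, T=100))
```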
Related papers
- MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding [76.30210465222218]
MotionGPT-2 is a unified Large Motion-Language Model (LMLM)
It supports multimodal control conditions through pre-trained Large Language Models (LLMs)
It is highly adaptable to the challenging 3D holistic motion generation task.
arXiv Detail & Related papers (2024-10-29T05:25:34Z) - Text-driven Human Motion Generation with Motion Masked Diffusion Model [23.637853270123045]
Text human motion generation is a task that synthesizes human motion sequences conditioned on natural language.
Current diffusion model-based approaches have outstanding performance in the diversity and multimodality of generation.
We propose the Motion Masked Diffusion Model (MMDM), a novel masking mechanism for human motion diffusion models.
arXiv Detail & Related papers (2024-09-29T12:26:24Z) - Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z) - Seamless Human Motion Composition with Blended Positional Encodings [38.85158088021282]
We introduce FlowMDM, the first diffusion-based model that generates seamless Human Motion Compositions (HMC) without postprocessing or redundant denoising steps.
We achieve state-of-the-art results in terms of accuracy, realism, and smoothness on the Babel and HumanML3D datasets.
arXiv Detail & Related papers (2024-02-23T18:59:40Z) - MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation [19.999239668765885]
MotionMix is a weakly-supervised diffusion model that leverages both noisy and unannotated motion sequences.
Our framework consistently achieves state-of-the-art performances on text-to-motion, action-to-motion, and music-to-dance tasks.
arXiv Detail & Related papers (2024-01-20T04:58:06Z) - DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion [70.33381660741861]
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions.
We show that our DiverseMotion achieves the state-of-the-art motion quality and competitive motion diversity.
arXiv Detail & Related papers (2023-09-04T05:43:48Z) - Human MotionFormer: Transferring Human Motions with Vision Transformers [73.48118882676276]
Human motion transfer aims to transfer motions from a target dynamic person to a source static one for motion synthesis.
We propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching.
Experiments show that our Human MotionFormer sets the new state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-02-22T11:42:44Z) - Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over the state-of-the-art methods among extensive human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z) - Human Motion Diffusion Model [35.05219668478535]
Motion Diffusion Model (MDM) is a transformer-based generative model for the human motion domain.
We show that our model is trained with lightweight resources and yet achieves state-of-the-art results on leading benchmarks for text-to-motion and action-to-motion.
arXiv Detail & Related papers (2022-09-29T16:27:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.