DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion
- URL: http://arxiv.org/abs/2309.01372v1
- Date: Mon, 4 Sep 2023 05:43:48 GMT
- Title: DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion
- Authors: Yunhong Lou, Linchao Zhu, Yaxiong Wang, Xiaohan Wang, Yi Yang
- Abstract summary: We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions.
We show that our DiverseMotion achieves state-of-the-art motion quality and competitive motion diversity.
- Score: 70.33381660741861
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present DiverseMotion, a new approach for synthesizing high-quality human
motions conditioned on textual descriptions while preserving motion diversity. Despite
recent significant progress in text-based human motion generation, existing methods often
prioritize fitting training motions at the expense of action diversity. Consequently,
striking a balance between motion quality and diversity remains an unresolved challenge.
This problem is compounded by two key factors: 1) the lack of diversity in motion-caption
pairs in existing benchmarks and 2) the unilateral and biased semantic understanding of
the text prompt, which focuses primarily on the verb component while neglecting the
nuanced distinctions indicated by other words. In response to the first issue, we
construct a large-scale Wild Motion-Caption dataset (WMC) to extend the restricted action
boundary of existing well-annotated datasets, enabling the learning of diverse motions
through a more extensive range of actions. To this end, a motion BLIP is trained upon a
pretrained vision-language model and then used to automatically generate diverse motion
captions for the collected motion sequences. As a result, we build a dataset comprising
8,888 motions coupled with 141k texts. To comprehensively understand the text command, we
propose a Hierarchical Semantic Aggregation (HSA) module to capture fine-grained
semantics. Finally, we incorporate the above two designs into an effective Motion
Discrete Diffusion (MDD) framework to strike a balance between motion quality and
diversity. Extensive experiments on HumanML3D and KIT-ML show that our DiverseMotion
achieves state-of-the-art motion quality and competitive motion diversity. Dataset, code,
and pretrained models will be released to reproduce all of our results.
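The abstract names the Motion Discrete Diffusion (MDD) framework but does not detail its sampling procedure. As a rough illustration of how mask-based discrete diffusion over quantized motion tokens typically works, here is a minimal PyTorch sketch: the sequence starts fully masked and tokens are iteratively unmasked, conditioned on the text, with low-confidence positions re-masked under a decaying schedule. All names here (MASK_ID, SEQ_LEN, the cosine re-masking schedule, the denoiser signature) are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of mask-based discrete diffusion sampling over motion tokens.
# Hypothetical constants and interfaces; not the paper's official MDD code.
import math
import torch

MASK_ID = 512   # hypothetical [MASK] token id (assumes a codebook of 512 motion tokens)
SEQ_LEN = 49    # hypothetical number of motion tokens per generated sequence

@torch.no_grad()
def sample_motion_tokens(denoiser, text_emb, steps=10, device="cpu"):
    """Iteratively unmask motion tokens conditioned on a text embedding.

    denoiser(tokens, text_emb) is any callable returning logits of shape
    (B, SEQ_LEN, MASK_ID + 1); in practice this would be a text-conditioned
    transformer trained to recover masked motion tokens.
    """
    B = text_emb.shape[0]
    tokens = torch.full((B, SEQ_LEN), MASK_ID, dtype=torch.long, device=device)
    for t in range(steps):
        logits = denoiser(tokens, text_emb)           # predict every position
        probs = logits.softmax(dim=-1)
        conf, pred = probs.max(dim=-1)                # per-token confidence and argmax
        # Cosine schedule: fraction of tokens that remain masked after this step.
        keep_masked = math.cos(math.pi / 2 * (t + 1) / steps)
        n_mask = int(keep_masked * SEQ_LEN)
        # Commit the most confident predictions; re-mask the least confident slots.
        tokens = pred.clone()
        if n_mask > 0:
            remask = conf.argsort(dim=-1)[:, :n_mask]
            tokens.scatter_(1, remask, MASK_ID)
    return tokens  # decode with the motion VQ decoder to obtain joint-level motion

# Toy usage with a random "denoiser" just to exercise the loop:
# denoiser = lambda tok, txt: torch.randn(tok.shape[0], SEQ_LEN, MASK_ID + 1)
# motion_tokens = sample_motion_tokens(denoiser, torch.zeros(1, 256))
```

The re-masking loop is the key ingredient that trades off quality against diversity: committing fewer tokens per step (more steps, more re-masking) keeps more stochasticity in the sampled motion, while committing aggressively converges faster to a single high-confidence sequence.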
Related papers
- MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding [76.30210465222218]
MotionGPT-2 is a unified Large Motion-Language Model (LMLM)
It supports multimodal control conditions through pre-trained Large Language Models (LLMs)
It is highly adaptable to the challenging 3D holistic motion generation task.
arXiv Detail & Related papers (2024-10-29T05:25:34Z) - MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls [30.487510829107908]
We propose MotionCraft, a unified diffusion transformer that crafts whole-body motion with plug-and-play multimodal control.
Our framework employs a coarse-to-fine training strategy, starting with the first stage of text-to-motion semantic pre-training.
We introduce MC-Bench, the first available multimodal whole-body motion generation benchmark based on the unified SMPL-X format.
arXiv Detail & Related papers (2024-07-30T18:57:06Z) - Towards Open Domain Text-Driven Synthesis of Multi-Person Motions [36.737740727883924]
We curate human pose and motion datasets by estimating pose information from large-scale image and video datasets.
Our method is the first to generate multi-subject motion sequences with high diversity and fidelity from a large variety of textual prompts.
arXiv Detail & Related papers (2024-05-28T18:00:06Z) - Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z) - MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation [19.999239668765885]
MotionMix is a weakly-supervised diffusion model that leverages both noisy and unannotated motion sequences.
Our framework consistently achieves state-of-the-art performances on text-to-motion, action-to-motion, and music-to-dance tasks.
arXiv Detail & Related papers (2024-01-20T04:58:06Z) - Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
arXiv Detail & Related papers (2023-08-28T10:40:16Z) - Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over the state-of-the-art methods among extensive human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z) - Towards Diverse and Natural Scene-aware 3D Human Motion Synthesis [117.15586710830489]
We focus on the problem of synthesizing diverse scene-aware human motions under the guidance of target action sequences.
Based on this factorized scheme, a hierarchical framework is proposed, with each sub-module responsible for modeling one aspect.
Experiment results show that the proposed framework remarkably outperforms previous methods in terms of diversity and naturalness.
arXiv Detail & Related papers (2022-05-25T18:20:01Z)