MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
- URL: http://arxiv.org/abs/2208.15001v1
- Date: Wed, 31 Aug 2022 17:58:54 GMT
- Title: MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
- Authors: Mingyuan Zhang, Zhongang Cai, Liang Pan, Fangzhou Hong, Xinying Guo,
Lei Yang, Ziwei Liu
- Abstract summary: MotionDiffuse is a diffusion model-based text-driven motion generation framework.
It excels at modeling complicated data distribution and generating vivid motion sequences.
It responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varied text prompts.
- Score: 35.32967411186489
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human motion modeling is important for many modern graphics applications,
which typically require professional skills. In order to remove the skill
barriers for laymen, recent motion generation methods can directly generate
human motions conditioned on natural languages. However, it remains challenging
to achieve diverse and fine-grained motion generation with various text inputs.
To address this problem, we propose MotionDiffuse, the first diffusion
model-based text-driven motion generation framework, which demonstrates several
desired properties over existing methods. 1) Probabilistic Mapping. Instead of
a deterministic language-motion mapping, MotionDiffuse generates motions
through a series of denoising steps in which variations are injected. 2)
Realistic Synthesis. MotionDiffuse excels at modeling complicated data
distribution and generating vivid motion sequences. 3) Multi-Level
Manipulation. MotionDiffuse responds to fine-grained instructions on body
parts and supports arbitrary-length motion synthesis with time-varied text prompts. Our
experiments show MotionDiffuse outperforms existing SoTA methods by convincing
margins on text-driven motion generation and action-conditioned motion
generation. A qualitative analysis further demonstrates MotionDiffuse's
controllability for comprehensive motion generation. Homepage:
https://mingyuan-zhang.github.io/projects/MotionDiffuse.html
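The abstract's "series of denoising steps in which variations are injected" is the reverse pass of a standard denoising diffusion model. A minimal sketch of that sampling loop, assuming a hypothetical `denoiser(x, t, text_emb)` that predicts the noise added at step `t` (MotionDiffuse's actual denoiser is a transformer conditioned on text features, not shown here):

```python
import numpy as np

def sample_motion(denoiser, text_emb, num_frames, dim, T=1000, seed=0):
    """Generate a motion sequence of shape (num_frames, dim) by iterative denoising.

    `denoiser` is a stand-in for a learned noise-prediction network; a linear
    beta schedule is used for illustration.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Start from pure Gaussian noise.
    x = rng.standard_normal((num_frames, dim))
    for t in reversed(range(T)):
        eps_hat = denoiser(x, t, text_emb)            # predicted noise at step t
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps_hat) / np.sqrt(alphas[t])  # posterior mean estimate
        if t > 0:
            # Fresh noise injected at every step: this is the source of the
            # probabilistic (non-deterministic) language-to-motion mapping.
            x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x
```

Because noise is injected at every step, running the loop with different seeds yields different plausible motions for the same text prompt, in contrast to a deterministic language-motion mapping.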
Related papers
- DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control [12.465927271402442]
Text-conditioned human motion generation allows for user interaction through natural language.
DART is a Diffusion-based Autoregressive motion primitive model for Real-time Text-driven motion control.
We present effective algorithms for both approaches, demonstrating our model's versatility and superior performance in various motion synthesis tasks.
arXiv Detail & Related papers (2024-10-07T17:58:22Z)
- Infinite Motion: Extended Motion Generation via Long Text Instructions [51.61117351997808]
"Infinite Motion" is a novel approach that leverages long text instructions for extended motion generation.
The key innovation of our model is its ability to accept text of arbitrary length as input.
We incorporate a timestamp design for text, which allows precise editing of local segments within the generated sequences.
arXiv Detail & Related papers (2024-07-11T12:33:56Z)
- FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models [19.09048969615117]
We explore open-set human motion synthesis using natural language instructions as user control signals based on MLLMs.
Our method can achieve general human motion synthesis for many downstream tasks.
arXiv Detail & Related papers (2024-06-15T21:10:37Z)
- CoMo: Controllable Motion Generation through Language Guided Pose Code Editing [57.882299081820626]
We introduce CoMo, a Controllable Motion generation model, adept at accurately generating and editing motions.
CoMo decomposes motions into discrete and semantically meaningful pose codes.
It autoregressively generates sequences of pose codes, which are then decoded into 3D motions.
arXiv Detail & Related papers (2024-03-20T18:11:10Z)
- Motion Generation from Fine-grained Textual Descriptions [29.033358642532722]
We build a large-scale language-motion dataset specializing in fine-grained textual descriptions, FineHumanML3D.
We design a new text2motion model, FineMotionDiffuse, making full use of fine-grained textual information.
Our evaluation shows that FineMotionDiffuse trained on FineHumanML3D improves FID by a large margin of 0.38, compared with competitive baselines.
arXiv Detail & Related papers (2024-03-20T11:38:30Z)
- MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation [19.999239668765885]
MotionMix is a weakly-supervised diffusion model that leverages both noisy and unannotated motion sequences.
Our framework consistently achieves state-of-the-art performances on text-to-motion, action-to-motion, and music-to-dance tasks.
arXiv Detail & Related papers (2024-01-20T04:58:06Z)
- BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics [50.88842027976421]
We propose BOTH57M, a novel multi-modal dataset for two-hand motion generation.
Our dataset includes accurate motion tracking for the human body and hands.
We also provide a strong baseline method, BOTH2Hands, for the novel task.
arXiv Detail & Related papers (2023-12-13T07:30:19Z)
- MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
We introduce MotionCrafter, a one-shot instance-guided motion customization method.
MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model.
During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
arXiv Detail & Related papers (2023-12-08T16:31:04Z)
- Synthesizing Long-Term Human Motions with Diffusion Models via Coherent Sampling [74.62570964142063]
Text-to-motion generation has gained increasing attention, but most existing methods are limited to generating short-term motions.
We propose a novel approach that utilizes a past-conditioned diffusion model with two optional coherent sampling methods.
Our proposed method is capable of generating compositional and coherent long-term 3D human motions controlled by a user-instructed long text stream.
arXiv Detail & Related papers (2023-08-03T16:18:32Z)
- MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis [73.52948992990191]
MoFusion is a new denoising-diffusion-based framework for high-quality conditional human motion synthesis.
We present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework.
We demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature.
arXiv Detail & Related papers (2022-12-08T18:59:48Z)
- FLAME: Free-form Language-based Motion Synthesis & Editing [17.70085940884357]
We propose a diffusion-based motion synthesis and editing model named FLAME.
FLAME can generate high-fidelity motions well aligned with the given text.
It can edit the parts of the motion, both frame-wise and joint-wise, without any fine-tuning.
arXiv Detail & Related papers (2022-09-01T10:34:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.