Diffusion Motion: Generate Text-Guided 3D Human Motion by Diffusion Model
- URL: http://arxiv.org/abs/2210.12315v2
- Date: Fri, 14 Apr 2023 14:39:54 GMT
- Title: Diffusion Motion: Generate Text-Guided 3D Human Motion by Diffusion Model
- Authors: Zhiyuan Ren, Zhihong Pan, Xin Zhou and Le Kang
- Abstract summary: We propose a simple and novel method for generating 3D human motion from complex natural language sentences.
We use the Denoising Diffusion Probabilistic Model to generate diverse motion results under the guidance of texts.
Our experiments demonstrate that our model achieves competitive quantitative results on the HumanML3D test set and generates more visually natural and diverse examples.
- Score: 7.381316531478522
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a simple and novel method for generating 3D human motion from
complex natural language sentences that describe varying velocities,
directions, and compositions of all kinds of actions. Unlike existing
methods that use classical generative architectures, we apply the Denoising
Diffusion Probabilistic Model to this task, synthesizing diverse motion results
under the guidance of texts. The diffusion model converts white noise into
structured 3D motion by a Markov process with a series of denoising steps and
is efficiently trained by optimizing a variational lower bound. To achieve
text-conditioned motion synthesis, we use the classifier-free guidance
strategy to fuse text embeddings into the model during training. Our experiments
demonstrate that our model achieves competitive results on the HumanML3D test set
quantitatively and can generate more visually natural and diverse examples. We
also show with experiments that our model is capable of zero-shot generation of
motions for unseen text guidance.
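To make the sampling procedure described in the abstract concrete, here is a minimal sketch of a DDPM reverse process with classifier-free guidance. The denoiser `eps_model`, the embeddings `text_emb`/`null_emb`, the noise schedule `betas`, and the guidance weight are all illustrative assumptions, not the authors' released code.

```python
import torch

def sample_motion(eps_model, text_emb, null_emb, shape, betas, guidance_w=2.5):
    """Reverse diffusion: turn white noise into a structured motion tensor."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)  # start from white noise
    for t in reversed(range(len(betas))):
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        # Two forward passes: with the text embedding and with a "null" one.
        eps_cond = eps_model(x, t_batch, text_emb)
        eps_uncond = eps_model(x, t_batch, null_emb)
        # Classifier-free guidance: extrapolate toward the text condition.
        eps = eps_uncond + guidance_w * (eps_cond - eps_uncond)
        # Standard DDPM posterior mean; add noise except at the final step.
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x  # e.g. shape (batch, frames, joint_features)
```

The guidance weight trades diversity for text fidelity: `guidance_w = 0` recovers unconditional sampling, while larger values follow the text prompt more closely.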
Related papers
- Motion Flow Matching for Human Motion Synthesis and Editing [75.13665467944314]
We propose Motion Flow Matching, a novel generative model for human motion generation featuring efficient sampling and effectiveness in motion editing applications.
Our method reduces the sampling complexity from the thousand steps of previous diffusion models to just ten, while achieving comparable performance on text-to-motion and action-to-motion generation benchmarks.
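For intuition on why so few steps can suffice, here is a hedged sketch of flow-matching sampling as plain Euler integration of a learned velocity field; `velocity_model` and its text conditioning are assumed names, not the paper's actual interface.

```python
import torch

def flow_matching_sample(velocity_model, text_emb, shape, num_steps=10):
    x = torch.randn(shape)  # the noise end of the flow (t = 0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((shape[0],), i * dt)  # current time for the whole batch
        x = x + dt * velocity_model(x, t, text_emb)  # one Euler step toward data
    return x  # approximate sample at t = 1
```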
arXiv Detail & Related papers (2023-12-14T12:57:35Z)
- Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models [71.64318025625833]
This paper presents a novel approach to generating the 3D motion of a human interacting with a target object.
Our framework first generates a set of milestones and then synthesizes the motion along them.
The experiments on the NSM, COUCH, and SAMP datasets show that our approach outperforms previous methods by a large margin in both quality and diversity.
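A loose sketch of the two-stage idea described above, assuming hypothetical `milestone_model` and `infill_model` components rather than the paper's actual framework:

```python
def generate_interaction(milestone_model, infill_model, start_pose, object_geom):
    # Stage 1: propose a sparse sequence of key poses toward the object.
    milestones = milestone_model(start_pose, object_geom)
    # Stage 2: densely synthesize motion between consecutive milestones.
    motion = []
    for a, b in zip(milestones[:-1], milestones[1:]):
        motion.extend(infill_model(a, b, object_geom))
    return motion
```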
arXiv Detail & Related papers (2023-10-03T17:50:23Z)
- Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding [57.42429912884543]
We propose Diff-LM-Speech, Tetra-Diff-Speech and Tri-Diff-Speech to solve high dimensionality and waveform distortion problems.
We also introduce a prompt encoder structure based on a variational autoencoder and a prosody bottleneck to improve prompt representation ability.
Experimental results show that our proposed methods outperform baseline methods.
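As a rough illustration of a VAE-based prompt encoder with a narrow prosody bottleneck, here is a minimal sketch; all dimensions and module names are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PromptEncoder(nn.Module):
    def __init__(self, in_dim=256, bottleneck_dim=8):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.to_mu = nn.Linear(128, bottleneck_dim)      # latent mean
        self.to_logvar = nn.Linear(128, bottleneck_dim)  # latent log-variance

    def forward(self, prompt_feats):
        h = self.backbone(prompt_feats)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample a small prosody latent.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return z, mu, logvar  # z conditions the synthesizer; mu/logvar feed a KL term
```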
arXiv Detail & Related papers (2023-07-28T11:20:23Z)
- Modiff: Action-Conditioned 3D Motion Generation with Denoising Diffusion Probabilistic Models [58.357180353368896]
We propose a conditional paradigm that benefits from the denoising diffusion probabilistic model (DDPM) to tackle the problem of realistic and diverse action-conditioned 3D skeleton-based motion generation.
Ours is a pioneering attempt to use DDPM to synthesize a variable number of motion sequences conditioned on a categorical action.
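A minimal sketch of what one action-conditioned DDPM training step could look like, matching the summary above; every name here is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def ddpm_training_step(eps_model, action_embed, motion, action_ids, alpha_bars):
    # Sample a random timestep and noise the clean motion accordingly.
    t = torch.randint(0, len(alpha_bars), (motion.shape[0],))
    noise = torch.randn_like(motion)
    a_bar = alpha_bars[t].view(-1, 1, 1)  # broadcast over (batch, frames, feats)
    noisy = torch.sqrt(a_bar) * motion + torch.sqrt(1.0 - a_bar) * noise
    # Condition the denoiser on a learned embedding of the action label.
    cond = action_embed(action_ids)
    pred = eps_model(noisy, t, cond)
    return F.mse_loss(pred, noise)  # the simplified variational-bound objective
```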
arXiv Detail & Related papers (2023-01-10T13:15:42Z)
- 3D human motion generation from the text via gesture action classification and the autoregressive model [28.76063248241159]
The model focuses on generating special gestures that express human thinking, such as waving and nodding.
With several experiments, the proposed method successfully generates perceptually natural and realistic 3D human motion from the text.
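Read from the title, the pipeline plausibly classifies the text into a gesture action and then rolls the motion out autoregressively; the following sketch is speculative, with hypothetical `classifier` and `ar_model` components.

```python
def text_to_motion(classifier, ar_model, text, num_frames=60):
    action_class = classifier(text)  # e.g. a "wave" or "nod" gesture label
    frames = [ar_model.initial_pose(action_class)]
    for _ in range(num_frames - 1):
        # Each new pose depends on the previous one and the action class.
        frames.append(ar_model.next_pose(frames[-1], action_class))
    return frames
```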
arXiv Detail & Related papers (2022-11-18T03:05:49Z)
- MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [35.32967411186489]
MotionDiffuse is a diffusion model-based text-driven motion generation framework.
It excels at modeling complicated data distributions and generating vivid motion sequences.
It responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varied text prompts.
arXiv Detail & Related papers (2022-08-31T17:58:54Z)
- TEMOS: Generating diverse human motions from textual descriptions [53.85978336198444]
We address the problem of generating diverse 3D human motions from textual descriptions.
We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data.
We show that the TEMOS framework can produce both skeleton-based animations, as in prior work, and more expressive SMPL body motions.
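As a loose sketch of the text-conditioned VAE setup, the following shows how resampling the latent yields diverse motions for a single description; module internals are hypothetical stand-ins, not TEMOS's actual encoders.

```python
import torch
import torch.nn as nn

class TextToMotionVAE(nn.Module):
    def __init__(self, text_dim=512, latent_dim=256, motion_dim=64, frames=60):
        super().__init__()
        self.to_mu = nn.Linear(text_dim, latent_dim)
        self.to_logvar = nn.Linear(text_dim, latent_dim)
        self.decoder = nn.Linear(latent_dim, frames * motion_dim)
        self.frames, self.motion_dim = frames, motion_dim

    def forward(self, text_feats):
        mu, logvar = self.to_mu(text_feats), self.to_logvar(text_feats)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # sample latent
        motion = self.decoder(z).view(-1, self.frames, self.motion_dim)
        return motion, mu, logvar  # resampling z gives diverse motions per text
```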
arXiv Detail & Related papers (2022-04-25T14:53:06Z)
- Synthesis of Compositional Animations from Textual Descriptions [54.85920052559239]
"How unstructured and complex can we make a sentence and still generate plausible movements from it?"
"How can we animate 3D-characters from a movie script or move robots by simply telling them what we would like them to do?"
arXiv Detail & Related papers (2021-03-26T18:23:29Z)