Synthesizing Long-Term Human Motions with Diffusion Models via Coherent Sampling
- URL: http://arxiv.org/abs/2308.01850v1
- Date: Thu, 3 Aug 2023 16:18:32 GMT
- Title: Synthesizing Long-Term Human Motions with Diffusion Models via Coherent Sampling
- Authors: Zhao Yang, Bing Su and Ji-Rong Wen
- Abstract summary: Text-to-motion generation has gained increasing attention, but most existing methods are limited to generating short-term motions.
We propose a novel approach that utilizes a past-conditioned diffusion model with two optional coherent sampling methods.
Our proposed method is capable of generating compositional and coherent long-term 3D human motions controlled by a user-instructed long text stream.
- Score: 74.62570964142063
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-motion generation has gained increasing attention, but most existing
methods are limited to generating short-term motions that correspond to a
single sentence describing a single action. However, when a text stream
describes a sequence of continuous motions, the generated motions corresponding
to each sentence may not be coherently linked. Existing long-term motion
generation methods face two main issues. Firstly, they cannot directly generate
coherent motions and require additional operations such as interpolation to
process the generated actions. Secondly, they generate subsequent actions in an
autoregressive manner without considering the influence of future actions on
previous ones. To address these issues, we propose a novel approach that
utilizes a past-conditioned diffusion model with two optional coherent sampling
methods: Past Inpainting Sampling and Compositional Transition Sampling. Past
Inpainting Sampling completes subsequent motions by treating previous motions
as conditions, while Compositional Transition Sampling models the distribution
of the transition as the composition of two adjacent motions guided by
different text prompts. Our experimental results demonstrate that our proposed
method is capable of generating compositional and coherent long-term 3D human
motions controlled by a user-instructed long text stream. The code is available
at https://github.com/yangzhao1230/PCMDM.
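The abstract describes the two sampling strategies only at a high level. Below is a minimal, hedged sketch of how Past Inpainting Sampling could be realized with a text-conditioned, DDPM-style motion diffusion model; the `denoiser` interface, the noise schedule, and all tensor shapes are assumptions for illustration and are not taken from the authors' repository.
```python
# Hedged sketch (not the authors' released code): one way to realize
# "Past Inpainting Sampling" with a past-conditioned, text-conditioned
# motion diffusion model. `denoiser` is a hypothetical network that takes a
# noisy motion sequence x_t of shape (B, T, D), a timestep t, and a text
# embedding, and predicts the noise added at step t (standard DDPM setup).
import torch

@torch.no_grad()
def past_inpainting_sample(denoiser, text_emb, past_motion, gen_len,
                           num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Sample `gen_len` new frames that coherently continue `past_motion`.

    past_motion: (B, T_past, D) previously generated frames, held fixed.
    Returns:     (B, T_past + gen_len, D) the full, coherent segment.
    """
    device = past_motion.device
    B, T_past, D = past_motion.shape
    T = T_past + gen_len

    # Standard DDPM noise schedule.
    betas = torch.linspace(beta_start, beta_end, num_steps, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(B, T, D, device=device)  # start from pure noise
    for t in reversed(range(num_steps)):
        # Inpainting-style conditioning: overwrite the known prefix with the
        # past motion noised to the current level, so every denoising step
        # "sees" the previous motion as fixed context.
        noise = torch.randn_like(past_motion)
        x[:, :T_past] = (alpha_bars[t].sqrt() * past_motion
                         + (1 - alpha_bars[t]).sqrt() * noise)

        eps = denoiser(x, t, text_emb)  # predicted noise for the whole window
        mean = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x = mean + betas[t].sqrt() * torch.randn_like(x)  # ancestral step
        else:
            x = mean

    x[:, :T_past] = past_motion  # restore the exact past frames
    return x
```
For Compositional Transition Sampling, the abstract indicates that the transition is modeled as the composition of the two adjacent motions guided by different prompts; a rough variant of the sketch above would, over the transition frames, combine the denoiser's predictions under the previous and the next prompt (for example, a weighted sum of the two noise estimates) rather than conditioning on a single prompt. The exact composition rule is defined in the paper, so this description is only illustrative.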
Related papers
- M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models [18.125860678409804]
We introduce the Multi-Motion Discrete Diffusion Models (M2D2M), a novel approach for human motion generation from text descriptions.
M2D2M adeptly addresses the challenge of generating multi-motion sequences, ensuring seamless transitions between motions and coherence across a series of actions.
arXiv Detail & Related papers (2024-07-19T17:57:33Z)
- Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation [71.08922726494842]
We introduce the problem of timeline control for text-driven motion synthesis.
Instead of a single prompt, users can specify a multi-track timeline of multiple prompts organized in temporal intervals that may overlap.
We propose a new test-time denoising method to generate composite animations from a multi-track timeline.
arXiv Detail & Related papers (2024-01-16T18:39:15Z)
- Motion Flow Matching for Human Motion Synthesis and Editing [75.13665467944314]
We propose Motion Flow Matching, a novel generative model for human motion generation that features efficient sampling and is effective for motion editing applications.
Our method reduces the sampling complexity from a thousand steps in previous diffusion models to just ten steps, while achieving comparable performance on text-to-motion and action-to-motion generation benchmarks.
arXiv Detail & Related papers (2023-12-14T12:57:35Z)
- DiffusionPhase: Motion Diffusion in Frequency Domain [69.811762407278]
We introduce a learning-based method for generating high-quality human motion sequences from text descriptions.
Existing techniques struggle with motion diversity and smooth transitions in generating arbitrary-length motion sequences.
We develop a network encoder that converts the motion space into a compact yet expressive parameterized phase space.
arXiv Detail & Related papers (2023-12-07T04:39:22Z)
- Human Motion Diffusion as a Generative Prior [20.004837564647367]
We introduce three forms of composition based on diffusion priors.
We tackle the challenge of long sequence generation.
Using parallel composition, we show promising steps toward two-person generation.
arXiv Detail & Related papers (2023-03-02T17:09:27Z)
- Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over the state-of-the-art methods among extensive human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
- MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [35.32967411186489]
MotionDiffuse is a diffusion model-based text-driven motion generation framework.
It excels at modeling complicated data distribution and generating vivid motion sequences.
It responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varying text prompts.
arXiv Detail & Related papers (2022-08-31T17:58:54Z)
- Weakly-supervised Action Transition Learning for Stochastic Human Motion Prediction [81.94175022575966]
We introduce the task of action-driven human motion prediction.
It aims to predict multiple plausible future motions given a sequence of action labels and a short motion history.
arXiv Detail & Related papers (2022-05-31T08:38:07Z)