Guided Motion Diffusion for Controllable Human Motion Synthesis
- URL: http://arxiv.org/abs/2305.12577v3
- Date: Sun, 29 Oct 2023 19:27:38 GMT
- Title: Guided Motion Diffusion for Controllable Human Motion Synthesis
- Authors: Korrawe Karunratanakul, Konpat Preechakul, Supasorn Suwajanakorn, Siyu
Tang
- Abstract summary: We propose Guided Motion Diffusion (GMD), a method that incorporates spatial constraints into the motion generation process.
Specifically, we propose an effective feature projection scheme that manipulates motion representation to enhance the coherency between spatial information and local poses.
Our experiments justify the development of GMD, which achieves a significant improvement over state-of-the-art methods in text-based motion generation.
- Score: 18.660523853430497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Denoising diffusion models have shown great promise in human motion synthesis
conditioned on natural language descriptions. However, integrating spatial
constraints, such as pre-defined motion trajectories and obstacles, remains a
challenge despite being essential for bridging the gap between isolated human
motion and its surrounding environment. To address this issue, we propose
Guided Motion Diffusion (GMD), a method that incorporates spatial constraints
into the motion generation process. Specifically, we propose an effective
feature projection scheme that manipulates motion representation to enhance the
coherency between spatial information and local poses. Together with a new
imputation formulation, the generated motion can reliably conform to spatial
constraints such as global motion trajectories. Furthermore, given sparse
spatial constraints (e.g. sparse keyframes), we introduce a new dense guidance
approach to turn a sparse signal, which is susceptible to being ignored during
the reverse steps, into denser signals to guide the generated motion to the
given constraints. Our extensive experiments justify the development of GMD,
which achieves a significant improvement over state-of-the-art methods in
text-based motion generation while allowing control of the synthesized motions
with spatial constraints.
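To make the guidance-plus-imputation idea in the abstract concrete, below is a minimal, illustrative sketch of a single guided reverse-diffusion step. It is not GMD's actual implementation: the names `model`, `posterior_sample`, `traj_target`, and `traj_mask` are assumptions, and the update follows generic classifier-guidance practice rather than the paper's specific feature projection or dense guidance schemes.

```python
import torch

def guided_reverse_step(model, posterior_sample, x_t, t,
                        traj_target, traj_mask, guidance_scale=1.0):
    """Illustrative sketch of one reverse-diffusion step that (a) nudges the
    motion toward sparse spatial constraints via a gradient (guidance) and
    (b) overwrites the constrained entries directly (imputation).

    Assumptions (not from the paper): `model(x_t, t)` predicts the clean
    motion x0; `posterior_sample(x0, x_t, t)` draws x_{t-1} from the usual
    DDPM posterior; `traj_target` holds target root positions and
    `traj_mask` is 1 on constrained (keyframe) entries, 0 elsewhere.
    """
    x_t = x_t.detach().requires_grad_(True)

    # Predict the clean motion and measure how far it is from the constraints.
    x0_pred = model(x_t, t)
    constraint_loss = ((x0_pred - traj_target) ** 2 * traj_mask).sum()

    # Guidance: shift the prediction against the constraint gradient.
    grad = torch.autograd.grad(constraint_loss, x_t)[0]
    x0_guided = x0_pred - guidance_scale * grad

    # Imputation: enforce the known trajectory values on constrained frames.
    x0_guided = traj_mask * traj_target + (1.0 - traj_mask) * x0_guided

    # Sample x_{t-1} given the guided clean-motion estimate.
    return posterior_sample(x0_guided.detach(), x_t.detach(), t)
```

The sketch only shows the per-step combination of guidance and imputation; GMD's dense guidance additionally spreads such sparse keyframe signals over neighboring frames so they are not ignored during the reverse steps.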
Related papers
- KinMo: Kinematic-aware Human Motion Understanding and Generation [6.962697597686156]
Controlling human motion based on text presents an important challenge in computer vision.
Traditional approaches often rely on holistic action descriptions for motion synthesis.
We propose a novel motion representation that decomposes motion into distinct body joint group movements.
arXiv Detail & Related papers (2024-11-23T06:50:11Z)
- Real-time Diverse Motion In-betweening with Space-time Control [4.910937238451485]
In this work, we present a data-driven framework for generating diverse in-betweening motions for kinematic characters.
We demonstrate that our in-betweening approach can synthesize both locomotion and unstructured motions, enabling rich, versatile, and high-quality animation generation.
arXiv Detail & Related papers (2024-09-30T22:45:53Z)
- Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation [52.87672306545577]
Existing motion generation methods primarily focus on the direct synthesis of global motions.
We propose the local action-guided motion diffusion model, which facilitates global motion generation by utilizing local actions as fine-grained control signals.
Our method provides flexibility in seamlessly combining various local actions and continuous guiding weight adjustment.
arXiv Detail & Related papers (2024-07-15T08:35:00Z)
- MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model [29.93359157128045]
This work introduces MotionLCM, extending controllable motion generation to a real-time level.
We first propose the motion latent consistency model (MotionLCM) for motion generation, building upon the latent diffusion model.
By adopting one-step (or few-step) inference, we further improve the runtime efficiency of the motion latent diffusion model for motion generation.
arXiv Detail & Related papers (2024-04-30T17:59:47Z)
- FLD: Fourier Latent Dynamics for Structured Motion Representation and Learning [19.491968038335944]
We introduce a self-supervised, structured representation and generation method that extracts spatial-temporal relationships in periodic or quasi-periodic motions.
Our work opens new possibilities for future advancements in general motion representation and learning algorithms.
arXiv Detail & Related papers (2024-02-21T13:59:21Z)
- Motion Flow Matching for Human Motion Synthesis and Editing [75.13665467944314]
We propose Motion Flow Matching, a novel generative model for human motion generation featuring efficient sampling and effectiveness in motion editing applications.
Our method reduces the sampling complexity from thousand steps in previous diffusion models to just ten steps, while achieving comparable performance in text-to-motion and action-to-motion generation benchmarks.
arXiv Detail & Related papers (2023-12-14T12:57:35Z)
- DiffusionPhase: Motion Diffusion in Frequency Domain [69.811762407278]
We introduce a learning-based method for generating high-quality human motion sequences from text descriptions.
Existing techniques struggle with motion diversity and smooth transitions in generating arbitrary-length motion sequences.
We develop a network encoder that converts the motion space into a compact yet expressive parameterized phase space.
arXiv Detail & Related papers (2023-12-07T04:39:22Z)
- MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis [73.52948992990191]
MoFusion is a new denoising-diffusion-based framework for high-quality conditional human motion synthesis.
We present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework.
We demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature.
arXiv Detail & Related papers (2022-12-08T18:59:48Z)
- Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over the state-of-the-art methods among extensive human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
- MoDi: Unconditional Motion Synthesis from Diverse Data [51.676055380546494]
We present MoDi, an unconditional generative model that synthesizes diverse motions.
Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset.
We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered.
arXiv Detail & Related papers (2022-06-16T09:06:25Z)