MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis
- URL: http://arxiv.org/abs/2212.04495v2
- Date: Mon, 15 May 2023 11:36:57 GMT
- Title: MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis
- Authors: Rishabh Dabral and Muhammad Hamza Mughal and Vladislav Golyanik and
Christian Theobalt
- Abstract summary: MoFusion is a new denoising-diffusion-based framework for high-quality conditional human motion synthesis.
We present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework.
We demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature.
- Score: 73.52948992990191
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional methods for human motion synthesis are either deterministic or
struggle with the trade-off between motion diversity and motion quality. In
response to these limitations, we introduce MoFusion, a new
denoising-diffusion-based framework for high-quality conditional human motion
synthesis that can generate long, temporally plausible, and semantically
accurate motions based on a range of conditioning contexts (such as music and
text). We also present ways to introduce well-known kinematic losses for motion
plausibility within the motion diffusion framework through our scheduled
weighting strategy. The learned latent space can be used for several
interactive motion editing applications -- like inbetweening, seed
conditioning, and text-based editing -- thus providing crucial capabilities for
virtual character animation and robotics. Through comprehensive quantitative
evaluations and a perceptual user study, we demonstrate the effectiveness of
MoFusion compared to the state of the art on established benchmarks in the
literature. We urge the reader to watch our supplementary video and visit
https://vcai.mpi-inf.mpg.de/projects/MoFusion.
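The scheduled weighting mentioned in the abstract is its most concrete technical idea: kinematic losses (bone lengths, velocities, and similar) are only meaningful once the predicted motion is mostly denoised, so their weight should depend on the diffusion timestep. Below is a minimal PyTorch sketch of that idea, not the paper's exact formulation; the denoiser interface, the clean-motion prediction target, and the choice of weight schedule (here the signal level alpha_bar_t) are illustrative assumptions.

    import torch

    T = 1000
    betas = torch.linspace(1e-4, 2e-2, T)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # signal level per timestep

    def bone_lengths(joints):
        # Toy kinematic quantity: distances between consecutive joints.
        # joints: (batch, frames, n_joints, 3)
        return (joints[:, :, 1:] - joints[:, :, :-1]).norm(dim=-1)

    def training_step(denoiser, x0, cond):
        # x0: clean motion (batch, frames, n_joints, 3); cond: conditioning embedding.
        b = x0.shape[0]
        t = torch.randint(0, T, (b,))
        noise = torch.randn_like(x0)
        a = alpha_bar[t].view(b, 1, 1, 1)
        x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise   # forward diffusion
        x0_hat = denoiser(x_t, t, cond)                # assume the net predicts clean motion
        diffusion_loss = (x0_hat - x0).pow(2).mean()
        # Kinematic loss on bone lengths, weighted by the batch's mean signal level:
        # it contributes most at low-noise timesteps, where x0_hat is already a
        # plausible pose sequence. (Illustrative schedule, not the paper's.)
        kin_loss = (bone_lengths(x0_hat) - bone_lengths(x0)).pow(2).mean()
        return diffusion_loss + alpha_bar[t].mean() * kin_loss

A per-sample weight (leaving the losses unreduced over the batch) would be a natural refinement; the scalar batch mean keeps the sketch short.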
Related papers
- KinMo: Kinematic-aware Human Motion Understanding and Generation [6.962697597686156]
Controlling human motion based on text presents an important challenge in computer vision.
Traditional approaches often rely on holistic action descriptions for motion synthesis.
We propose a novel motion representation that decomposes motion into distinct body joint group movements.
arXiv Detail & Related papers (2024-11-23T06:50:11Z)
- MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion [8.94802080815133]
MoRAG is a novel multi-part fusion based retrieval-augmented generation strategy for text-based human motion generation.
We create diverse samples through the spatial composition of the retrieved motions.
Our framework can serve as a plug-and-play module, improving the performance of motion diffusion models.
arXiv Detail & Related papers (2024-09-18T17:03:30Z)
- Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z)
- AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion [11.689663297469945]
We propose the Adaptable Motion Diffusion model.
It exploits a Large Language Model (LLM) to parse the input text into a sequence of concise and interpretable anatomical scripts.
We then devise a two-branch fusion scheme that balances the influence of the input text and the anatomical scripts on the inverse diffusion process.
arXiv Detail & Related papers (2023-12-20T04:49:45Z)
- Motion Flow Matching for Human Motion Synthesis and Editing [75.13665467944314]
We propose Motion Flow Matching, a novel generative model for human motion generation featuring efficient sampling and effectiveness in motion editing applications.
Our method reduces the sampling complexity from a thousand steps in previous diffusion models to just ten, while achieving comparable performance on text-to-motion and action-to-motion benchmarks; a toy Euler sampler in this spirit is sketched after this entry.
arXiv Detail & Related papers (2023-12-14T12:57:35Z)
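The step-count claim is the substance of the entry above: flow matching trains a velocity field whose ODE transports noise to data, so sampling is a few integrator steps rather than a long denoising chain. A minimal Euler sampler in that spirit, where velocity_net and its (x, t, cond) signature are placeholders rather than the paper's API:

    import torch

    @torch.no_grad()
    def sample_motion(velocity_net, cond, shape, n_steps=10):
        # Euler-integrate dx/dt = v(x, t, cond) from t = 0 (noise) to t = 1 (data).
        x = torch.randn(shape)
        dt = 1.0 / n_steps
        for i in range(n_steps):
            t = torch.full((shape[0],), i * dt)
            x = x + dt * velocity_net(x, t, cond)  # one Euler step along the flow
        return x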
- Motion-Conditioned Diffusion Model for Controllable Video Synthesis [75.367816656045]
We introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes.
We show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis.
arXiv Detail & Related papers (2023-04-27T17:59:32Z)
- Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over state-of-the-art methods across a wide range of human motion generation tasks; a generic latent-space sampler is sketched after this entry.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
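The "latent-based" part of MLD follows the general latent-diffusion recipe: diffuse in the compressed space of a pretrained motion autoencoder and decode once at the end. A hedged sketch with placeholder modules (denoiser, decoder) and a generic DDPM sampler, not MLD's actual architecture or schedule:

    import torch

    @torch.no_grad()
    def generate(denoiser, decoder, cond, latent_shape, T=50):
        # DDPM-style ancestral sampling in latent space, then decode to motion.
        # T and the beta schedule below are illustrative choices.
        betas = torch.linspace(1e-4, 2e-2, T)
        alphas = 1.0 - betas
        alpha_bar = torch.cumprod(alphas, dim=0)
        z = torch.randn(latent_shape)
        for t in reversed(range(T)):
            eps = denoiser(z, torch.tensor([t]), cond)  # predicted noise at step t
            z = (z - betas[t] / (1 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
            if t > 0:
                z = z + betas[t].sqrt() * torch.randn_like(z)  # ancestral noise
        return decoder(z)  # map the denoised latent back to a motion sequence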
- FLAME: Free-form Language-based Motion Synthesis & Editing [17.70085940884357]
We propose a diffusion-based motion synthesis and editing model named FLAME.
FLAME can generate high-fidelity motions well aligned with the given text.
It can edit the parts of the motion, both frame-wise and joint-wise, without any fine-tuning.
arXiv Detail & Related papers (2022-09-01T10:34:57Z)
- MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [35.32967411186489]
MotionDiffuse is a diffusion model-based text-driven motion generation framework.
It excels at modeling complicated data distribution and generating vivid motion sequences.
It responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varying text prompts.
arXiv Detail & Related papers (2022-08-31T17:58:54Z)
- MoDi: Unconditional Motion Synthesis from Diverse Data [51.676055380546494]
We present MoDi, an unconditional generative model that synthesizes diverse motions.
Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset.
We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered.
arXiv Detail & Related papers (2022-06-16T09:06:25Z)