Motion Flow Matching for Human Motion Synthesis and Editing
- URL: http://arxiv.org/abs/2312.08895v1
- Date: Thu, 14 Dec 2023 12:57:35 GMT
- Title: Motion Flow Matching for Human Motion Synthesis and Editing
- Authors: Vincent Tao Hu, Wenzhe Yin, Pingchuan Ma, Yunlu Chen, Basura Fernando,
Yuki M Asano, Efstratios Gavves, Pascal Mettes, Bjorn Ommer, Cees G. M. Snoek
- Abstract summary: We propose Motion Flow Matching, a novel generative model for human motion generation featuring efficient sampling and effectiveness in motion editing applications.
Our method reduces the sampling complexity from a thousand steps in previous diffusion models to just ten, while achieving comparable performance on text-to-motion and action-to-motion generation benchmarks.
- Score: 75.13665467944314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human motion synthesis is a fundamental task in computer animation. Recent
methods based on diffusion models or GPT-style architectures demonstrate commendable
performance but suffer from slow sampling and error accumulation. In this paper,
we propose Motion Flow Matching, a novel generative model designed for human
motion generation featuring efficient sampling and effectiveness in motion
editing applications. Our method reduces the sampling complexity from a
thousand steps in previous diffusion models to just ten, while achieving
comparable performance on text-to-motion and action-to-motion generation
benchmarks. Notably, our approach establishes a new state-of-the-art Fréchet
Inception Distance on the KIT-ML dataset. Moreover, we tailor a straightforward
motion editing paradigm named sampling trajectory rewriting, which leverages
the ODE-style generative model and applies to various editing scenarios
including motion prediction, motion in-betweening, motion interpolation, and
upper-body editing. Our code will be released.
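Since the abstract only names the ingredients (a flow-matching generator, a ten-step ODE sampler, and the sampling-trajectory-rewriting editor) without code, the following is a minimal PyTorch sketch of how such a pipeline could look. The names `velocity_model`, `cond`, and `known_mask`, as well as the masking-based editing loop, are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of conditional flow matching for motion sequences.
# All module names and the editing loop are assumptions for illustration.
import torch


def flow_matching_loss(velocity_model, x1, cond):
    """Conditional flow matching loss on the straight interpolation path
    x_t = (1 - t) * x0 + t * x1, whose target velocity is simply x1 - x0."""
    x0 = torch.randn_like(x1)                      # noise endpoint of the path
    t = torch.rand(x1.shape[0], device=x1.device)  # one time per sample
    t_ = t.view(-1, *([1] * (x1.dim() - 1)))       # broadcast over motion dims
    xt = (1 - t_) * x0 + t_ * x1                   # point on the straight path
    target = x1 - x0                               # constant velocity target
    return ((velocity_model(xt, t, cond) - target) ** 2).mean()


@torch.no_grad()
def sample_motion(velocity_model, cond, shape, num_steps=10, device="cpu"):
    """Generate a motion by Euler-integrating dx/dt = v_theta(x, t, cond)
    from t=0 (noise) to t=1 (data); ten steps mirrors the abstract."""
    x = torch.randn(shape, device=device)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((shape[0],), i * dt, device=device)
        x = x + velocity_model(x, t, cond) * dt
    return x


@torch.no_grad()
def edit_motion(velocity_model, cond, x_source, known_mask, num_steps=10):
    """One plausible reading of sampling trajectory rewriting: after each
    Euler step, frames flagged in `known_mask` are rewritten onto the straight
    noise-to-source path, so only the unmasked frames are freely generated
    (e.g. future frames for prediction, or upper-body joints for editing)."""
    z0 = torch.randn_like(x_source)                # fixed noise endpoint
    x = z0.clone()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + velocity_model(x, t, cond) * dt
        t_next = (i + 1) * dt
        anchor = (1 - t_next) * z0 + t_next * x_source
        x = torch.where(known_mask, anchor, x)     # pin the known frames
    return x
```

In this reading, generation and editing share the same ten-step Euler integrator; editing merely rewrites the trajectory on the constrained frames at every step, which would let motion prediction, in-betweening, interpolation, and upper-body editing fall out of a single mechanism.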
Related papers
- MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion [8.94802080815133]
MoRAG is a novel multi-part, fusion-based retrieval-augmented generation strategy for text-based human motion generation.
We create diverse samples through the spatial composition of the retrieved motions.
Our framework can serve as a plug-and-play module, improving the performance of motion diffusion models.
arXiv Detail & Related papers (2024-09-18T17:03:30Z)
- MotionFix: Text-Driven 3D Human Motion Editing [52.11745508960547]
Given a 3D human motion, our goal is to generate an edited motion as described by the text.
The challenges include the lack of training data and the design of a model that faithfully edits the source motion.
We build a methodology to semi-automatically collect a dataset of triplets in the form of (i) a source motion, (ii) a target motion, and (iii) an edit text, and create the new MotionFix dataset.
arXiv Detail & Related papers (2024-08-01T16:58:50Z)
- MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion [94.66090422753126]
MotionFollower is a lightweight score-guided diffusion model for video motion editing.
It delivers superior motion editing performance and exclusively supports large camera movements and actions.
Compared with MotionEditor, the most advanced motion editing model, MotionFollower achieves an approximately 80% reduction in GPU memory.
arXiv Detail & Related papers (2024-05-30T17:57:30Z)
- Shape Conditioned Human Motion Generation with Diffusion Model [0.0]
We propose a Shape-conditioned Motion Diffusion model (SMD), which enables the generation of motion sequences directly in mesh format.
We also propose a Spectral-Temporal Autoencoder (STAE) to leverage cross-temporal dependencies within the spectral domain.
arXiv Detail & Related papers (2024-05-10T19:06:41Z)
- MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation [19.999239668765885]
MotionMix is a weakly-supervised diffusion model that leverages both noisy and unannotated motion sequences.
Our framework consistently achieves state-of-the-art performances on text-to-motion, action-to-motion, and music-to-dance tasks.
arXiv Detail & Related papers (2024-01-20T04:58:06Z)
- Synthesizing Long-Term Human Motions with Diffusion Models via Coherent Sampling [74.62570964142063]
Text-to-motion generation has gained increasing attention, but most existing methods are limited to generating short-term motions.
We propose a novel approach that utilizes a past-conditioned diffusion model with two optional coherent sampling methods.
Our proposed method is capable of generating compositional and coherent long-term 3D human motions controlled by a user-instructed long text stream.
arXiv Detail & Related papers (2023-08-03T16:18:32Z)
- MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis [73.52948992990191]
MoFusion is a new denoising-diffusion-based framework for high-quality conditional human motion synthesis.
We present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework.
We demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature.
arXiv Detail & Related papers (2022-12-08T18:59:48Z)
- Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over the state-of-the-art methods among extensive human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
- FLAME: Free-form Language-based Motion Synthesis & Editing [17.70085940884357]
We propose a diffusion-based motion synthesis and editing model named FLAME.
FLAME can generate high-fidelity motions well aligned with the given text.
It can edit the parts of the motion, both frame-wise and joint-wise, without any fine-tuning.
arXiv Detail & Related papers (2022-09-01T10:34:57Z)
- MotionAug: Augmentation with Physical Correction for Human Motion Prediction [19.240717471864723]
This paper presents a motion data augmentation scheme that incorporates motion synthesis to encourage diversity and motion correction to impose physical plausibility.
Our method outperforms previous noise-based motion augmentation methods by a large margin on both Recurrent Neural Network-based and Graph Convolutional Network-based human motion prediction models.
arXiv Detail & Related papers (2022-03-17T06:53:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.