Related papers: Motion Flow Matching for Human Motion Synthesis and Editing

Motion Flow Matching for Human Motion Synthesis and Editing

URL: http://arxiv.org/abs/2312.08895v1
Date: Thu, 14 Dec 2023 12:57:35 GMT
Title: Motion Flow Matching for Human Motion Synthesis and Editing
Authors: Vincent Tao Hu, Wenzhe Yin, Pingchuan Ma, Yunlu Chen, Basura Fernando, Yuki M Asano, Efstratios Gavves, Pascal Mettes, Bjorn Ommer, Cees G. M. Snoek
Abstract summary: We propose emphMotion Flow Matching, a novel generative model for human motion generation featuring efficient sampling and effectiveness in motion editing applications. Our method reduces the sampling complexity from thousand steps in previous diffusion models to just ten steps, while achieving comparable performance in text-to-motion and action-to-motion generation benchmarks.
Score: 75.13665467944314
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Human motion synthesis is a fundamental task in computer animation. Recent methods based on diffusion models or GPT structure demonstrate commendable performance but exhibit drawbacks in terms of slow sampling speeds and error accumulation. In this paper, we propose \emph{Motion Flow Matching}, a novel generative model designed for human motion generation featuring efficient sampling and effectiveness in motion editing applications. Our method reduces the sampling complexity from thousand steps in previous diffusion models to just ten steps, while achieving comparable performance in text-to-motion and action-to-motion generation benchmarks. Noticeably, our approach establishes a new state-of-the-art Fr\'echet Inception Distance on the KIT-ML dataset. What is more, we tailor a straightforward motion editing paradigm named \emph{sampling trajectory rewriting} leveraging the ODE-style generative models and apply it to various editing scenarios including motion prediction, motion in-between prediction, motion interpolation, and upper-body editing. Our code will be released.

Related papers

SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation [56.90807453045657]
SynMotion is a motion-customized video generation model that jointly leverages semantic guidance and visual adaptation.<n>At the semantic level, we introduce the dual-em semantic comprehension mechanism which disentangles subject and motion representations.<n>At the visual level, we integrate efficient motion adapters into a pre-trained video generation model to enhance motion fidelity and temporal coherence.
arXiv Detail & Related papers (2025-06-30T10:09:32Z)
Absolute Coordinates Make Motion Generation Easy [8.153961351540834]
State-of-the-art text-to-motion generation models rely on the kinematic-aware, local-relative motion representation popularized by HumanML3D.<n>We propose a radically simplified and long-abandoned alternative for text-to-motion generation: absolute joint coordinates in global space.
arXiv Detail & Related papers (2025-05-26T00:36:00Z)
GENMO: A GENeralist Model for Human MOtion [64.16188966024542]
We present GENMO, a unified Generalist Model for Human Motion that bridges motion estimation and generation in a single framework.<n>Our key insight is to reformulate motion estimation as constrained motion generation, where the output motion must precisely satisfy observed conditioning signals.<n>Our novel architecture handles variable-length motions and mixed multimodal conditions (text, audio, video) at different time intervals, offering flexible control.
arXiv Detail & Related papers (2025-05-02T17:59:55Z)
EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models [73.96414072072048]
Existing motion transfer methods explored the motion representations of reference videos to guide generation. We propose EfficientMT, a novel and efficient end-to-end framework for video motion transfer. Our experiments demonstrate that our EfficientMT outperforms existing methods in efficiency while maintaining flexible motion controllability.
arXiv Detail & Related papers (2025-03-25T05:51:14Z)
SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction [20.89960239295474]
We introduce a related task, motion similarity prediction, and propose a multi-task training paradigm. We train the model jointly on motion editing and motion similarity prediction to foster the learning of semantically meaningful representations. Experiments demonstrate the state-of-the-art performance of our approach in both editing alignment and fidelity.
arXiv Detail & Related papers (2025-03-23T21:29:37Z)
MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion [8.94802080815133]
MoRAG is a novel multi-part fusion based retrieval-augmented generation strategy for text-based human motion generation. We create diverse samples through the spatial composition of the retrieved motions. Our framework can serve as a plug-and-play module, improving the performance of motion diffusion models.
arXiv Detail & Related papers (2024-09-18T17:03:30Z)
MotionFix: Text-Driven 3D Human Motion Editing [52.11745508960547]
Key challenges include the scarcity of training data and the need to design a model that accurately edits the source motion. We propose a methodology to semi-automatically collect a dataset of triplets comprising (i) a source motion, (ii) a target motion, and (iii) an edit text. Access to this data allows us to train a conditional diffusion model, TMED, that takes both the source motion and the edit text as input.
arXiv Detail & Related papers (2024-08-01T16:58:50Z)
MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion [94.66090422753126]
MotionFollower is a lightweight score-guided diffusion model for video motion editing. It delivers superior motion editing performance and exclusively supports large camera movements and actions. Compared with MotionEditor, the most advanced motion editing model, MotionFollower achieves an approximately 80% reduction in GPU memory.
arXiv Detail & Related papers (2024-05-30T17:57:30Z)
Shape Conditioned Human Motion Generation with Diffusion Model [0.0]
We propose a Shape-conditioned Motion Diffusion model (SMD), which enables the generation of motion sequences directly in mesh format. We also propose a Spectral-Temporal Autoencoder (STAE) to leverage cross-temporal dependencies within the spectral domain.
arXiv Detail & Related papers (2024-05-10T19:06:41Z)
MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation [19.999239668765885]
MotionMix is a weakly-supervised diffusion model that leverages both noisy and unannotated motion sequences. Our framework consistently achieves state-of-the-art performances on text-to-motion, action-to-motion, and music-to-dance tasks.
arXiv Detail & Related papers (2024-01-20T04:58:06Z)
Synthesizing Long-Term Human Motions with Diffusion Models via Coherent Sampling [74.62570964142063]
Text-to-motion generation has gained increasing attention, but most existing methods are limited to generating short-term motions. We propose a novel approach that utilizes a past-conditioned diffusion model with two optional coherent sampling methods. Our proposed method is capable of generating compositional and coherent long-term 3D human motions controlled by a user-instructed long text stream.
arXiv Detail & Related papers (2023-08-03T16:18:32Z)
MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis [73.52948992990191]
MoFusion is a new denoising-diffusion-based framework for high-quality conditional human motion synthesis. We present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework. We demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature.
arXiv Detail & Related papers (2022-12-08T18:59:48Z)
Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs. Our MLD achieves significant improvements over the state-of-the-art methods among extensive human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
FLAME: Free-form Language-based Motion Synthesis & Editing [17.70085940884357]
We propose a diffusion-based motion synthesis and editing model named FLAME. FLAME can generate high-fidelity motions well aligned with the given text. It can edit the parts of the motion, both frame-wise and joint-wise, without any fine-tuning.
arXiv Detail & Related papers (2022-09-01T10:34:57Z)
MotionAug: Augmentation with Physical Correction for Human Motion Prediction [19.240717471864723]
This paper presents a motion data augmentation scheme incorporating motion synthesis encouraging diversity and motion correction imposing physical plausibility. Our method outperforms previous noise-based motion augmentation methods by a large margin on both Recurrent Neural Network-based and Graph Convolutional Network-based human motion prediction models.
arXiv Detail & Related papers (2022-03-17T06:53:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.