MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training
- URL: http://arxiv.org/abs/2406.01867v2
- Date: Thu, 18 Jul 2024 07:45:54 GMT
- Title: MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training
- Authors: Kengo Uchida, Takashi Shibuya, Yuhta Takida, Naoki Murata, Shusuke Takahashi, Yuki Mitsufuji,
- Abstract summary: We propose MoLA, which provides fast and high-quality motion generation and can deal with multiple editing tasks in a single framework.
For high-quality and fast generation, we employ a variational autoencoder and latent diffusion model, and improve the performance with adversarial training.
We quantitatively show the effectiveness of adversarial learning in text-to-motion generation, and demonstrate the applicability of our editing framework to multiple editing tasks in the motion domain.
- Score: 19.619603595952512
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In motion generation, controllability as well as generation quality and speed is becoming more and more important. There are various motion editing tasks, such as in-betweening, upper body editing, and path-following, but existing methods perform motion editing with a data-space diffusion model, which is slow in inference compared to a latent diffusion model. In this paper, we propose MoLA, which provides fast and high-quality motion generation and also can deal with multiple editing tasks in a single framework. For high-quality and fast generation, we employ a variational autoencoder and latent diffusion model, and improve the performance with adversarial training. In addition, we apply a training-free guided generation framework to achieve various editing tasks with motion control inputs. We quantitatively show the effectiveness of adversarial learning in text-to-motion generation, and demonstrate the applicability of our editing framework to multiple editing tasks in the motion domain.
Related papers
- Dynamic Motion Blending for Versatile Motion Editing [43.10279926787476]
We introduce MotionMixCut, an online data augmentation technique that generates training triplets by blending body part motions based on input text.
We present MotionReFit, an auto-regressive diffusion model with a motion coordinator.
Our method handles both spatial and temporal motion edits directly from high-level human instructions.
arXiv Detail & Related papers (2025-03-26T17:07:24Z) - Leader and Follower: Interactive Motion Generation under Trajectory Constraints [42.90788442575116]
This paper explores the motion range refinement process in interactive motion generation.
It proposes a training-free approach, integrating a Pace Controller and a Kinematic Synchronization Adapter.
Experimental results show that the proposed approach, by better leveraging trajectory information, outperforms existing methods in both realism and accuracy.
arXiv Detail & Related papers (2025-02-17T08:52:45Z) - MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent [58.09607975296408]
We propose MotionAgent, enabling fine-grained motion control for text-guided image-to-video generation.
The key technique is the motion field agent that converts motion information in text prompts into explicit motion fields.
We construct a subset of VBench to evaluate the alignment of motion information in the text and the generated video, outperforming other advanced models on motion generation accuracy.
arXiv Detail & Related papers (2025-02-05T14:26:07Z) - MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm [6.920041357348772]
Human motion generation and editing are key components of computer graphics and vision.
We introduce a novel paradigm: Motion-Condition-Motion, which enables the unified formulation of diverse tasks.
Based on this paradigm, we propose a unified framework, MotionLab, which incorporates rectified flows to learn the mapping from source motion to target motion.
arXiv Detail & Related papers (2025-02-04T14:43:26Z) - CigTime: Corrective Instruction Generation Through Inverse Motion Editing [12.947526481961516]
Given a user's current motion (source) and the desired motion (target), we generate text instructions to guide the user towards achieving the target motion.
We leverage large language models to generate corrective texts and utilize existing motion generation and editing frameworks.
Our approach demonstrates its effectiveness in instructional scenarios, offering text-based guidance to correct and enhance user performance.
arXiv Detail & Related papers (2024-12-06T22:57:36Z) - MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding [76.30210465222218]
MotionGPT-2 is a unified Large Motion-Language Model (LMLMLM)
It supports multimodal control conditions through pre-trained Large Language Models (LLMs)
It is highly adaptable to the challenging 3D holistic motion generation task.
arXiv Detail & Related papers (2024-10-29T05:25:34Z) - Infinite Motion: Extended Motion Generation via Long Text Instructions [51.61117351997808]
"Infinite Motion" is a novel approach that leverages long text to extended motion generation.
Key innovation of our model is its ability to accept arbitrary lengths of text as input.
We incorporate the timestamp design for text which allows precise editing of local segments within the generated sequences.
arXiv Detail & Related papers (2024-07-11T12:33:56Z) - Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer [55.109778609058154]
Existing diffusion-based motion editing methods overlook the profound potential of the prior embedded within the weights of pre-trained models.
We uncover the roles and interactions of attention elements in capturing and representing motion patterns.
We integrate these elements to transfer a leader motion to a follower one while maintaining the nuanced characteristics of the follower, resulting in zero-shot motion transfer.
arXiv Detail & Related papers (2024-06-10T17:47:14Z) - MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion [94.66090422753126]
MotionFollower is a lightweight score-guided diffusion model for video motion editing.
It delivers superior motion editing performance and exclusively supports large camera movements and actions.
Compared with MotionEditor, the most advanced motion editing model, MotionFollower achieves an approximately 80% reduction in GPU memory.
arXiv Detail & Related papers (2024-05-30T17:57:30Z) - Video Diffusion Models are Training-free Motion Interpreter and Controller [20.361790608772157]
This paper introduces a novel perspective to understand, localize, and manipulate motion-aware features in video diffusion models.
We present a new MOtion FeaTure (MOFT) by eliminating content correlation information and filtering motion channels.
arXiv Detail & Related papers (2024-05-23T17:59:40Z) - Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms.
SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics.
Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
arXiv Detail & Related papers (2024-03-22T14:47:18Z) - CoMo: Controllable Motion Generation through Language Guided Pose Code Editing [57.882299081820626]
We introduce CoMo, a Controllable Motion generation model, adept at accurately generating and editing motions.
CoMo decomposes motions into discrete and semantically meaningful pose codes.
It autoregressively generates sequences of pose codes, which are then decoded into 3D motions.
arXiv Detail & Related papers (2024-03-20T18:11:10Z) - Motion Flow Matching for Human Motion Synthesis and Editing [75.13665467944314]
We propose emphMotion Flow Matching, a novel generative model for human motion generation featuring efficient sampling and effectiveness in motion editing applications.
Our method reduces the sampling complexity from thousand steps in previous diffusion models to just ten steps, while achieving comparable performance in text-to-motion and action-to-motion generation benchmarks.
arXiv Detail & Related papers (2023-12-14T12:57:35Z) - MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis [73.52948992990191]
MoFusion is a new denoising-diffusion-based framework for high-quality conditional human motion synthesis.
We present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework.
We demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature.
arXiv Detail & Related papers (2022-12-08T18:59:48Z) - FLAME: Free-form Language-based Motion Synthesis & Editing [17.70085940884357]
We propose a diffusion-based motion synthesis and editing model named FLAME.
FLAME can generate high-fidelity motions well aligned with the given text.
It can edit the parts of the motion, both frame-wise and joint-wise, without any fine-tuning.
arXiv Detail & Related papers (2022-09-01T10:34:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.