MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm
- URL: http://arxiv.org/abs/2502.02358v3
- Date: Thu, 06 Feb 2025 15:03:54 GMT
- Title: MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm
- Authors: Ziyan Guo, Zeyu Hu, Na Zhao, De Wen Soh,
- Abstract summary: Human motion generation and editing are key components of computer graphics and vision.
We introduce a novel paradigm: Motion-Condition-Motion, which enables the unified formulation of diverse tasks.
Based on this paradigm, we propose a unified framework, MotionLab, which incorporates rectified flows to learn the mapping from source motion to target motion.
- Score: 6.920041357348772
- License:
- Abstract: Human motion generation and editing are key components of computer graphics and vision. However, current approaches in this field tend to offer isolated solutions tailored to specific tasks, which can be inefficient and impractical for real-world applications. While some efforts have aimed to unify motion-related tasks, these methods simply use different modalities as conditions to guide motion generation. Consequently, they lack editing capabilities, fine-grained control, and fail to facilitate knowledge sharing across tasks. To address these limitations and provide a versatile, unified framework capable of handling both human motion generation and editing, we introduce a novel paradigm: Motion-Condition-Motion, which enables the unified formulation of diverse tasks with three concepts: source motion, condition, and target motion. Based on this paradigm, we propose a unified framework, MotionLab, which incorporates rectified flows to learn the mapping from source motion to target motion, guided by the specified conditions. In MotionLab, we introduce the 1) MotionFlow Transformer to enhance conditional generation and editing without task-specific modules; 2) Aligned Rotational Position Encoding} to guarantee the time synchronization between source motion and target motion; 3) Task Specified Instruction Modulation; and 4) Motion Curriculum Learning for effective multi-task learning and knowledge sharing across tasks. Notably, our MotionLab demonstrates promising generalization capabilities and inference efficiency across multiple benchmarks for human motion. Our code and additional video results are available at: https://diouo.github.io/motionlab.github.io/.
Related papers
- Leader and Follower: Interactive Motion Generation under Trajectory Constraints [42.90788442575116]
This paper explores the motion range refinement process in interactive motion generation.
It proposes a training-free approach, integrating a Pace Controller and a Kinematic Synchronization Adapter.
Experimental results show that the proposed approach, by better leveraging trajectory information, outperforms existing methods in both realism and accuracy.
arXiv Detail & Related papers (2025-02-17T08:52:45Z) - Motion Prompting: Controlling Video Generation with Motion Trajectories [57.049252242807874]
We train a video generation model conditioned on sparse or dense video trajectories.
We translate high-level user requests into detailed, semi-dense motion prompts.
We demonstrate our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing.
arXiv Detail & Related papers (2024-12-03T18:59:56Z) - MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding [76.30210465222218]
MotionGPT-2 is a unified Large Motion-Language Model (LMLMLM)
It supports multimodal control conditions through pre-trained Large Language Models (LLMs)
It is highly adaptable to the challenging 3D holistic motion generation task.
arXiv Detail & Related papers (2024-10-29T05:25:34Z) - MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls [30.487510829107908]
We propose MotionCraft, a unified diffusion transformer that crafts whole-body motion with plug-and-play multimodal control.
Our framework employs a coarse-to-fine training strategy, starting with the first stage of text-to-motion semantic pre-training.
We introduce MC-Bench, the first available multimodal whole-body motion generation benchmark based on the unified SMPL-X format.
arXiv Detail & Related papers (2024-07-30T18:57:06Z) - MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training [19.550281954226445]
In text-to-motion generation, controllability as well as generation quality and speed has become increasingly critical.
We propose MoLA, which provides fast, high-quality, variable-length motion generation and can also deal with multiple editing tasks in a single framework.
arXiv Detail & Related papers (2024-06-04T00:38:44Z) - Programmable Motion Generation for Open-Set Motion Control Tasks [51.73738359209987]
We introduce a new paradigm, programmable motion generation.
In this paradigm, any given motion control task is broken down into a combination of atomic constraints.
These constraints are then programmed into an error function that quantifies the degree to which a motion sequence adheres to them.
arXiv Detail & Related papers (2024-05-29T17:14:55Z) - FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis [65.85686550683806]
This paper reconsiders motion generation and proposes to unify the single and multi-person motion by the conditional motion distribution.
Based on our framework, the current single-person motion spatial control method could be seamlessly integrated, achieving precise control of multi-person motion.
arXiv Detail & Related papers (2024-05-24T17:57:57Z) - CoMo: Controllable Motion Generation through Language Guided Pose Code Editing [57.882299081820626]
We introduce CoMo, a Controllable Motion generation model, adept at accurately generating and editing motions.
CoMo decomposes motions into discrete and semantically meaningful pose codes.
It autoregressively generates sequences of pose codes, which are then decoded into 3D motions.
arXiv Detail & Related papers (2024-03-20T18:11:10Z) - Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.