SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction
- URL: http://arxiv.org/abs/2503.18211v2
- Date: Tue, 25 Mar 2025 20:31:03 GMT
- Title: SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction
- Authors: Zhengyuan Li, Kai Cheng, Anindita Ghosh, Uttaran Bhattacharya, Liangyan Gui, Aniket Bera
- Abstract summary: We introduce a related task, motion similarity prediction, and propose a multi-task training paradigm. We train the model jointly on motion editing and motion similarity prediction to foster the learning of semantically meaningful representations. Experiments demonstrate the state-of-the-art performance of our approach in both editing alignment and fidelity.
- Score: 20.89960239295474
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-based 3D human motion editing is a critical yet challenging task in computer vision and graphics. While training-free approaches have been explored, the recent release of the MotionFix dataset, which includes source-text-motion triplets, has opened new avenues for training, yielding promising results. However, existing methods struggle with precise control, often leading to misalignment between motion semantics and language instructions. In this paper, we introduce a related task, motion similarity prediction, and propose a multi-task training paradigm, where we train the model jointly on motion editing and motion similarity prediction to foster the learning of semantically meaningful representations. To complement this task, we design an advanced Diffusion-Transformer-based architecture that separately handles motion similarity prediction and motion editing. Extensive experiments demonstrate the state-of-the-art performance of our approach in both editing alignment and fidelity.
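The abstract describes joint training on motion editing and motion similarity prediction but gives no equations here. Below is a minimal sketch of what such a multi-task objective could look like; everything in it (the `denoiser` and `sim_head` callables, the toy noise schedule, the weight `lambda_sim`) is an illustrative assumption, not the paper's actual implementation.

```python
# Hypothetical sketch of a multi-task objective in the spirit of the
# abstract: a shared backbone trained jointly on (i) diffusion-based
# motion editing and (ii) motion similarity prediction. All names and
# the noise schedule are illustrative assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def multitask_loss(denoiser, sim_head, src_motion, tgt_motion,
                   edit_text_emb, sim_label, lambda_sim=0.5):
    # Diffusion-style editing loss: predict the noise added to the target
    # motion, conditioned on the source motion and the edit text.
    # Motions are assumed to be (batch, frames, features) tensors.
    t = torch.rand(tgt_motion.shape[0], device=tgt_motion.device)
    noise = torch.randn_like(tgt_motion)
    alpha = (1.0 - t).view(-1, 1, 1)              # toy linear noise schedule
    noisy_tgt = alpha * tgt_motion + (1.0 - alpha) * noise
    pred_noise, features = denoiser(noisy_tgt, t, src_motion, edit_text_emb)
    edit_loss = F.mse_loss(pred_noise, noise)

    # Auxiliary task: regress a source-target similarity score from the
    # shared features, pushing them toward semantically meaningful space.
    sim_pred = sim_head(features)
    sim_loss = F.mse_loss(sim_pred, sim_label)

    return edit_loss + lambda_sim * sim_loss
```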
Related papers
- Dynamic Motion Blending for Versatile Motion Editing [43.10279926787476]
We introduce MotionMixCut, an online data augmentation technique that generates training triplets by blending body part motions based on input text.
We present MotionReFit, an auto-regressive diffusion model with a motion coordinator.
Our method handles both spatial and temporal motion edits directly from high-level human instructions.
arXiv Detail & Related papers (2025-03-26T17:07:24Z)
- Leader and Follower: Interactive Motion Generation under Trajectory Constraints [42.90788442575116]
This paper explores the motion range refinement process in interactive motion generation.
It proposes a training-free approach, integrating a Pace Controller and a Kinematic Synchronization Adapter.
Experimental results show that the proposed approach, by better leveraging trajectory information, outperforms existing methods in both realism and accuracy.
arXiv Detail & Related papers (2025-02-17T08:52:45Z)
- MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm [6.920041357348772]
Human motion generation and editing are key components of computer graphics and vision.
We introduce a novel paradigm: Motion-Condition-Motion, which enables the unified formulation of diverse tasks.
Based on this paradigm, we propose a unified framework, MotionLab, which incorporates rectified flows to learn the mapping from source motion to target motion.
arXiv Detail & Related papers (2025-02-04T14:43:26Z)
- MotionFix: Text-Driven 3D Human Motion Editing [52.11745508960547]
Key challenges include the scarcity of training data and the need to design a model that accurately edits the source motion.
We propose a methodology to semi-automatically collect a dataset of triplets comprising (i) a source motion, (ii) a target motion, and (iii) an edit text.
Access to this data allows us to train a conditional diffusion model, TMED, that takes both the source motion and the edit text as input.
arXiv Detail & Related papers (2024-08-01T16:58:50Z)
- MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion [94.66090422753126]
MotionFollower is a lightweight score-guided diffusion model for video motion editing.
It delivers superior motion editing performance and exclusively supports large camera movements and actions.
Compared with MotionEditor, the most advanced motion editing model, MotionFollower achieves an approximately 80% reduction in GPU memory.
arXiv Detail & Related papers (2024-05-30T17:57:30Z)
- Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z)
- Motion Flow Matching for Human Motion Synthesis and Editing [75.13665467944314]
We propose Motion Flow Matching, a novel generative model for human motion generation featuring efficient sampling and effectiveness in motion editing applications.
Our method reduces the sampling complexity from a thousand steps in previous diffusion models to just ten steps (see the sampler sketch after this list), while achieving comparable performance on text-to-motion and action-to-motion generation benchmarks.
arXiv Detail & Related papers (2023-12-14T12:57:35Z)
- SemanticBoost: Elevating Motion Generation with Augmented Textual Cues [73.83255805408126]
Our framework comprises a Semantic Enhancement module and a Context-Attuned Motion Denoiser (CAMD).
The CAMD approach provides an all-encompassing solution for generating high-quality, semantically consistent motion sequences.
Our experimental results demonstrate that SemanticBoost, as a diffusion-based method, outperforms auto-regressive-based techniques.
arXiv Detail & Related papers (2023-10-31T09:58:11Z)
- MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis [73.52948992990191]
MoFusion is a new denoising-diffusion-based framework for high-quality conditional human motion synthesis.
We present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework.
We demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature.
arXiv Detail & Related papers (2022-12-08T18:59:48Z)
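The Motion Flow Matching entry above attributes its speed to few-step sampling. As a rough illustration of why flow-matching samplers can be this cheap, here is a generic Euler integration of a learned velocity field; the `velocity_field` network, conditioning, and step count are assumptions for illustration, not that paper's actual sampler.

```python
# Generic flow-matching sampling sketch (not any paper's code): starting
# from Gaussian noise, integrate a learned velocity field v(x, t, cond)
# from t=0 to t=1 with a handful of Euler steps, instead of the ~1000
# denoising steps of a standard diffusion sampler.
import torch

@torch.no_grad()
def sample_motion(velocity_field, cond, shape, n_steps=10, device="cpu"):
    x = torch.randn(shape, device=device)          # x_0 ~ N(0, I)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((shape[0],), i * dt, device=device)
        x = x + dt * velocity_field(x, t, cond)    # one Euler step along the flow
    return x                                       # approximate target motion sample
```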