Semantics-aware Motion Retargeting with Vision-Language Models
- URL: http://arxiv.org/abs/2312.01964v3
- Date: Mon, 15 Apr 2024 15:00:49 GMT
- Title: Semantics-aware Motion Retargeting with Vision-Language Models
- Authors: Haodong Zhang, ZhiKe Chen, Haocheng Xu, Lei Hao, Xiaofei Wu, Songcen Xu, Zhensong Zhang, Yue Wang, Rong Xiong
- Abstract summary: We present a novel Semantics-aware Motion reTargeting (SMT) method that leverages vision-language models to extract and maintain meaningful motion semantics.
We utilize a differentiable module to render 3D motions, and high-level motion semantics are incorporated into the retargeting process by feeding the rendered images to the vision-language model and aligning the extracted semantic embeddings.
To ensure the preservation of fine-grained motion details and high-level semantics, we adopt a two-stage pipeline consisting of skeleton-aware pre-training and fine-tuning with semantics and geometry constraints.
- Score: 19.53696208117539
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Capturing and preserving motion semantics is essential to motion retargeting between animation characters. However, most previous works neglect the semantic information or rely on human-designed joint-level representations. Here, we present a novel Semantics-aware Motion reTargeting (SMT) method that leverages vision-language models to extract and maintain meaningful motion semantics. We utilize a differentiable module to render 3D motions. Then the high-level motion semantics are incorporated into the motion retargeting process by feeding the vision-language model with the rendered images and aligning the extracted semantic embeddings. To ensure the preservation of fine-grained motion details and high-level semantics, we adopt a two-stage pipeline consisting of skeleton-aware pre-training and fine-tuning with semantics and geometry constraints. Experimental results show the effectiveness of the proposed method in producing high-quality motion retargeting results while accurately preserving motion semantics.
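A minimal sketch of the semantic-alignment idea the abstract describes, assuming a CLIP-style image encoder as the vision-language model and a differentiable renderer; the function names and tensor layout below are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn.functional as F

def semantic_alignment_loss(src_motion, tgt_motion, render, vlm_encode):
    """Align VLM embeddings of the source and retargeted motions.

    src_motion, tgt_motion: (T, J, 3) joint positions (assumed layout).
    render: differentiable renderer, (T, J, 3) -> (T, C, H, W) images.
    vlm_encode: frozen vision-language image encoder, images -> (T, D).
    """
    src_emb = vlm_encode(render(src_motion))  # semantics of the source clip
    tgt_emb = vlm_encode(render(tgt_motion))  # semantics of the retargeted clip
    # Penalize semantic drift with a per-frame cosine distance; gradients
    # flow back to the retargeted motion through the differentiable renderer.
    return (1.0 - F.cosine_similarity(src_emb, tgt_emb, dim=-1)).mean()
```

Per the two-stage pipeline, a loss of this kind would apply only in the fine-tuning stage, on top of the skeleton-aware pre-training that handles fine-grained motion details.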
Related papers
- KinMo: Kinematic-aware Human Motion Understanding and Generation [6.962697597686156]
Controlling human motion based on text presents an important challenge in computer vision.
Traditional approaches often rely on holistic action descriptions for motion synthesis.
We propose a novel motion representation that decomposes motion into distinct body joint group movements.
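A possible reading of this joint-group decomposition is sketched below; the five groups and their joint indices are assumptions based on a generic SMPL-like 22-joint skeleton, not KinMo's published grouping.

```python
import torch

# Hypothetical grouping of a 22-joint SMPL-like skeleton into body parts.
JOINT_GROUPS = {
    "torso":     [0, 3, 6, 9, 12, 15],
    "left_leg":  [1, 4, 7, 10],
    "right_leg": [2, 5, 8, 11],
    "left_arm":  [13, 16, 18, 20],
    "right_arm": [14, 17, 19, 21],
}

def decompose_motion(motion: torch.Tensor) -> dict[str, torch.Tensor]:
    """Split a (T, 22, 3) motion tensor into per-group sub-motions,
    so each body-part movement can be described and controlled separately."""
    return {name: motion[:, idx, :] for name, idx in JOINT_GROUPS.items()}
```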
arXiv Detail & Related papers (2024-11-23T06:50:11Z)
- Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z)
- Act As You Wish: Fine-Grained Control of Motion Diffusion Model with Hierarchical Semantic Graphs [31.244039305932287]
We propose hierarchical semantic graphs for fine-grained control over motion generation.
We disentangle motion descriptions into hierarchical semantic graphs with three levels: motions, actions, and specifics.
Our method can continuously refine the generated motion, which may have a far-reaching impact on the community.
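One way such a three-level graph could be represented as a plain data structure; the node fields and the example sentence are invented for illustration and do not come from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    """Mid-level node: a verb phrase plus its fine-grained specifics."""
    text: str
    specifics: list[str] = field(default_factory=list)

@dataclass
class Motion:
    """Top-level node: the overall motion, decomposed into actions."""
    text: str
    actions: list[Action] = field(default_factory=list)

# Hypothetical decomposition of one description into the three levels.
graph = Motion(
    text="a person slowly raises the left arm while walking forward",
    actions=[
        Action("raises the left arm", specifics=["slowly"]),
        Action("walks", specifics=["forward"]),
    ],
)
```

Editing a single node (e.g. swapping "slowly" for "quickly") only touches one part of the condition, which is what makes the control fine-grained.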
arXiv Detail & Related papers (2023-11-02T06:20:23Z)
- SemanticBoost: Elevating Motion Generation with Augmented Textual Cues [73.83255805408126]
Our framework comprises a Semantic Enhancement module and a Context-Attuned Motion Denoiser (CAMD).
The CAMD approach provides an all-encompassing solution for generating high-quality, semantically consistent motion sequences.
Our experimental results demonstrate that SemanticBoost, as a diffusion-based method, outperforms auto-regressive-based techniques.
arXiv Detail & Related papers (2023-10-31T09:58:11Z)
- DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion [70.33381660741861]
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions.
We show that our DiverseMotion achieves state-of-the-art motion quality and competitive motion diversity.
arXiv Detail & Related papers (2023-09-04T05:43:48Z)
- Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
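A hedged sketch of what a significance-driven corruption schedule could look like for discrete motion tokens; the importance scores are taken as given (the paper derives its own), and this ranking rule is only an illustration, not M2DM's exact schedule.

```python
import torch

def priority_corruption_mask(importance: torch.Tensor, t: float) -> torch.Tensor:
    """Choose which motion tokens to corrupt at diffusion time t in [0, 1].

    importance: (N,) per-token significance scores, higher = more important.
    Low-priority tokens are corrupted first, so tokens carrying the most
    motion semantics survive the longest into the forward process.
    """
    n = importance.numel()
    k = int(round(t * n))                    # number of tokens corrupted by time t
    mask = torch.zeros(n, dtype=torch.bool)
    if k > 0:
        order = torch.argsort(importance)    # ascending: least important first
        mask[order[:k]] = True
    return mask
```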
arXiv Detail & Related papers (2023-08-28T10:40:16Z)
- MoCaNet: Motion Retargeting in-the-wild via Canonicalization Networks [77.56526918859345]
We present a novel framework that brings the 3D motion retargeting task from controlled environments to in-the-wild scenarios.
It is capable of retargeting body motion from a character in a 2D monocular video to a 3D character without using any motion capture system or 3D reconstruction procedure.
arXiv Detail & Related papers (2021-12-19T07:52:05Z)
- Hierarchical Contrastive Motion Learning for Video Action Recognition [100.9807616796383]
We present hierarchical contrastive motion learning, a new self-supervised learning framework to extract effective motion representations from raw video frames.
Our approach progressively learns a hierarchy of motion features that correspond to different abstraction levels in a network.
Our motion learning module is lightweight and can be flexibly embedded into various backbone networks.
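As a rough illustration of the contrastive objective such a framework could optimize at each level of the hierarchy, here is a standard InfoNCE loss; treating features of the same clip from adjacent levels as the positive pair is an assumption about the setup, not a detail taken from the paper.

```python
import torch
import torch.nn.functional as F

def info_nce(query: torch.Tensor, positive: torch.Tensor,
             negatives: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """InfoNCE: pull the positive motion feature toward the query,
    push the negatives (features from other clips) away.

    query, positive: (D,) motion features; negatives: (K, D).
    """
    query = F.normalize(query, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos_logit = (query * positive).sum(-1, keepdim=True) / tau   # (1,)
    neg_logits = negatives @ query / tau                         # (K,)
    logits = torch.cat([pos_logit, neg_logits]).unsqueeze(0)     # (1, 1 + K)
    target = torch.zeros(1, dtype=torch.long)                    # positive = class 0
    return F.cross_entropy(logits, target)
```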
arXiv Detail & Related papers (2020-07-20T17:59:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.