Related papers: IM-Animation: An Implicit Motion Representation for Identity-decoupled Character Animation

IM-Animation: An Implicit Motion Representation for Identity-decoupled Character Animation

URL: http://arxiv.org/abs/2602.07498v1
Date: Sat, 07 Feb 2026 11:17:20 GMT
Title: IM-Animation: An Implicit Motion Representation for Identity-decoupled Character Animation
Authors: Zhufeng Xu, Xuan Gao, Feng-Lin Liu, Haoxian Zhang, Zhixue Fang, Yu-Kun Lai, Xiaoqiang Liu, Pengfei Wan, Lin Gao,
Abstract summary: Implicit methods capture motion semantics directly from driving video, but suffer from identity leakage and entanglement between motion and appearance.<n>We propose a novel implicit motion representation that compresses per-frame motion into compact 1D motion tokens.<n>Our methodology employs a three-stage training strategy to enhance the training efficiency and ensure high fidelity.
Score: 58.297199313494
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent progress in video diffusion models has markedly advanced character animation, which synthesizes motioned videos by animating a static identity image according to a driving video. Explicit methods represent motion using skeleton, DWPose or other explicit structured signals, but struggle to handle spatial mismatches and varying body scales. %proportions. Implicit methods, on the other hand, capture high-level implicit motion semantics directly from the driving video, but suffer from identity leakage and entanglement between motion and appearance. To address the above challenges, we propose a novel implicit motion representation that compresses per-frame motion into compact 1D motion tokens. This design relaxes strict spatial constraints inherent in 2D representations and effectively prevents identity information leakage from the motion video. Furthermore, we design a temporally consistent mask token-based retargeting module that enforces a temporal training bottleneck, mitigating interference from the source images' motion and improving retargeting consistency. Our methodology employs a three-stage training strategy to enhance the training efficiency and ensure high fidelity. Extensive experiments demonstrate that our implicit motion representation and the propose IM-Animation's generative capabilities are achieve superior or competitive performance compared with state-of-the-art methods.

Related papers

MotionAdapter: Video Motion Transfer via Content-Aware Attention Customization [73.07309070257162]
MotionAdapter is a content-aware motion transfer framework that enables robust and semantically aligned motion transfer.<n>Our key insight is that effective motion transfer requires explicit disentanglement of motion from appearance.<n> MotionAdapter naturally supports complex motion transfer and motion editing tasks such as zooming.
arXiv Detail & Related papers (2026-01-05T10:01:27Z)
DisMo: Disentangled Motion Representations for Open-World Motion Transfer [21.557843791867906]
DisMo is a novel paradigm for learning abstract motion representations directly from raw video data.<n>Our representation is generic and independent of static information such as appearance, object identity, or pose.<n>We show that the learned representations are well-suited for downstream motion understanding tasks.
arXiv Detail & Related papers (2025-11-28T18:25:54Z)
Consistent and Controllable Image Animation with Motion Linear Diffusion Transformers [23.176184261595747]
We propose MiraMo, a framework designed to enhance efficiency, appearance consistency, and motion smoothness in image animation.<n>Specifically, MiraMo introduces three key elements: (1) A foundational text-to-video architecture replacing vanilla self-attention with efficient linear attention to reduce computational overhead while preserving generation quality; (2) A novel motion residual learning paradigm that focuses on modeling motion dynamics rather than directly predicting frames, improving temporal consistency; and (3) A DCT-based noise refinement strategy during inference to suppress sudden motion artifacts, complemented by a dynamics control module to balance motion smoothness and expressiveness.
arXiv Detail & Related papers (2025-08-10T08:59:32Z)
X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention [52.94097577075215]
X-NeMo is a zero-shot diffusion-based portrait animation pipeline.<n>It animates a static portrait using facial movements from a driving video of a different individual.
arXiv Detail & Related papers (2025-07-30T22:46:52Z)
M2DAO-Talker: Harmonizing Multi-granular Motion Decoupling and Alternating Optimization for Talking-head Generation [65.48046909056468]
We reformulate talking head generation into a unified framework comprising video preprocessing, motion representation, and rendering reconstruction.<n>M2DAO-Talker achieves state-of-the-art performance, with the 2.43 dB PSNR improvement in generation quality and 0.64 gain in user-evaluated video realness.
arXiv Detail & Related papers (2025-07-11T04:48:12Z)
A Self-supervised Motion Representation for Portrait Video Generation [19.56640370303683]
We propose Semantic Latent Motion (SeMo), a compact and expressive motion representation.<n>Our approach achieves both high-quality visual results and efficient inference.<n>Our approach surpasses state-of-the-art models with an 81% win rate in realism.
arXiv Detail & Related papers (2025-03-13T06:43:21Z)
Motion Inversion for Video Customization [31.607669029754874]
We present a novel approach for motion in generation, addressing the widespread gap in the exploration of motion representation within video models. We introduce Motion Embeddings, a set of explicit, temporally coherent embeddings derived from given video. Our contributions include a tailored motion embedding for customization tasks and a demonstration of the practical advantages and effectiveness of our method.
arXiv Detail & Related papers (2024-03-29T14:14:22Z)
Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms.<n> SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics.<n>Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
arXiv Detail & Related papers (2024-03-22T14:47:18Z)
Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs. SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions. Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z)
AnaMoDiff: 2D Analogical Motion Diffusion via Disentangled Denoising [25.839194626743126]
AnaMoDiff is a novel diffusion-based method for 2D motion analogies. Our goal is to accurately transfer motions from a 2D driving video onto a source character, with its identity, in terms of appearance and natural movement.
arXiv Detail & Related papers (2024-02-05T22:10:54Z)
Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning [16.094271750354835]
Motion information is critical to a robust and generalized video representation. Recent works have adopted frame difference as the source of motion information in video contrastive learning. We present a framework capable of introducing well-aligned and significant motion information.
arXiv Detail & Related papers (2023-09-01T07:03:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.