SMF: Template-free and Rig-free Animation Transfer using Kinetic Codes
- URL: http://arxiv.org/abs/2504.04831v2
- Date: Fri, 10 Oct 2025 15:19:38 GMT
- Title: SMF: Template-free and Rig-free Animation Transfer using Kinetic Codes
- Authors: Sanjeev Muralikrishnan, Niladri Shekhar Dutt, Niloy J. Mitra
- Abstract summary: Animation retargeting applies a sparse motion description to a character mesh to produce a semantically plausible and temporally coherent full-body sequence. We propose Self-supervised Motion Fields (SMF), a self-supervised framework that is trained with only sparse motion representations. Our architecture comprises dedicated spatial and temporal gradient predictors, which are jointly trained in an end-to-end fashion.
- Score: 32.324844649352166
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Animation retargeting applies a sparse motion description (e.g., keypoint sequences) to a character mesh to produce a semantically plausible and temporally coherent full-body mesh sequence. Existing approaches come with restrictions -- they require access to template-based shape priors or artist-designed deformation rigs, suffer from limited generalization to unseen motion and/or shapes, or exhibit motion jitter. We propose Self-supervised Motion Fields (SMF), a self-supervised framework that is trained with only sparse motion representations, without requiring dataset-specific annotations, templates, or rigs. At the heart of our method are Kinetic Codes, a novel autoencoder-based sparse motion encoding that exposes a semantically rich latent space, simplifying large-scale training. Our architecture comprises dedicated spatial and temporal gradient predictors, which are jointly trained in an end-to-end fashion. The combined network, regularized by the Kinetic Codes' latent space, generalizes well across both unseen shapes and new motions. We evaluated our method on unseen motions sampled from AMASS, D4D, Mixamo, and raw monocular video for animation transfer on various characters with varying shapes and topology. We report a new SoTA on the AMASS dataset in the context of generalization to unseen motion. Code, weights, and supplementary material are available on the project webpage at https://motionfields.github.io/
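To make the pipeline described in the abstract concrete, here is a minimal PyTorch sketch of the two pieces it names: an autoencoder over sparse keypoints standing in for Kinetic Codes, and joint spatial/temporal gradient heads conditioned on the shared latent code. All class names, layer sizes, and tensor shapes below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class KineticCodeAutoencoder(nn.Module):
    """Hypothetical stand-in for Kinetic Codes: encode a sparse keypoint
    frame into a latent code and decode it back (self-supervised)."""
    def __init__(self, n_keypoints: int = 24, latent_dim: int = 128):
        super().__init__()
        d = n_keypoints * 3
        self.encoder = nn.Sequential(nn.Linear(d, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, d))

    def forward(self, keypoints):                  # (B, n_keypoints * 3)
        z = self.encoder(keypoints)                # the "Kinetic Code"
        return z, self.decoder(z)

class GradientPredictors(nn.Module):
    """Joint heads regularized by the shared latent space: a spatial head
    predicting a per-vertex 3x3 deformation Jacobian and a temporal head
    predicting a per-vertex velocity. Shapes are purely illustrative."""
    def __init__(self, latent_dim: int = 128, vert_feat_dim: int = 64):
        super().__init__()
        self.spatial = nn.Linear(latent_dim + vert_feat_dim, 9)
        self.temporal = nn.Linear(latent_dim + vert_feat_dim, 3)

    def forward(self, z, vert_feats):              # z: (B, L); feats: (B, V, F)
        h = torch.cat([z.unsqueeze(1).expand(-1, vert_feats.size(1), -1),
                       vert_feats], dim=-1)
        return self.spatial(h), self.temporal(h)

# Self-supervised training signal: reconstruct the sparse motion itself.
ae = KineticCodeAutoencoder()
kp = torch.randn(8, 24 * 3)                        # a batch of keypoint frames
z, recon = ae(kp)
loss = nn.functional.mse_loss(recon, kp)
```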
Related papers
- Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising [23.044483059783143]
Diffusion-based video generation can create realistic videos, yet existing image- and text-based conditioning fails to offer precise motion control. We introduce Time-to-Move (TTM), a training-free, plug-and-play framework for motion- and appearance-controlled video generation.
arXiv Detail & Related papers (2025-11-09T22:47:50Z)
- SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation [56.90807453045657]
SynMotion is a motion-customized video generation model that jointly leverages semantic guidance and visual adaptation. At the semantic level, we introduce a dual-embedding semantic comprehension mechanism which disentangles subject and motion representations. At the visual level, we integrate efficient motion adapters into a pre-trained video generation model to enhance motion fidelity and temporal coherence.
arXiv Detail & Related papers (2025-06-30T10:09:32Z)
- MotionDreamer: One-to-Many Motion Synthesis with Localized Generative Masked Transformer [36.43680216948212]
We present MotionDreamer, a localized masked modeling paradigm designed to learn internal motion patterns from a given motion. MotionDreamer constructs a robust and informative codebook for local motion patterns using a novel distribution regularization method. As demonstrated through comprehensive experiments, MotionDreamer outperforms state-of-the-art methods, which are typically GAN- or diffusion-based, in both faithfulness and diversity.
arXiv Detail & Related papers (2025-04-11T20:27:22Z)
- Segment Any Motion in Videos [80.72424676419755]
We propose a novel approach for moving object segmentation that combines long-range trajectory motion cues with DINO-based semantic features. Our model employs Spatio-Temporal Trajectory Attention and Motion-Semantic Decoupled Embedding to prioritize motion while integrating semantic support.
arXiv Detail & Related papers (2025-03-28T09:34:11Z)
- AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models [5.224806515926022]
We introduce AnyMoLe, a novel method to generate motion in-between frames for arbitrary characters without external data.
Our approach employs a two-stage frame generation process to enhance contextual understanding.
arXiv Detail & Related papers (2025-03-11T13:28:59Z)
- Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics [67.97235923372035]
We present Puppet-Master, an interactive video generative model that can serve as a motion prior for part-level dynamics.
At test time, given a single image and a sparse set of motion trajectories, Puppet-Master can synthesize a video depicting realistic part-level motion faithful to the given drag interactions.
arXiv Detail & Related papers (2024-08-08T17:59:38Z)
- Temporal Residual Jacobians For Rig-free Motion Transfer [45.640576754352104]
We introduce Temporal Residual Jacobians as a novel representation to enable data-driven motion transfer.
Our approach does not assume access to any rigging or intermediate shapes, produces geometrically and temporally consistent motions, and can be used to transfer long motion sequences.
arXiv Detail & Related papers (2024-07-20T18:29:22Z)
- MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
We introduce MotionCrafter, a one-shot instance-guided motion customization method.
MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model.
During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
arXiv Detail & Related papers (2023-12-08T16:31:04Z)
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference.
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
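As a rough illustration of a residual-based motion reference, the sketch below assumes that "residual vectors between consecutive frames" means per-step differences of frame latents, and pairs them with a cosine alignment loss; both the shapes and the loss form are guesses, not VMC's actual objective.

```python
import torch
import torch.nn.functional as F

def motion_residuals(latents: torch.Tensor) -> torch.Tensor:
    """Per-step residual vectors between consecutive frame latents.
    latents: (B, T, D) -> residuals: (B, T-1, D)."""
    return latents[:, 1:] - latents[:, :-1]

# Illustrative distillation-style loss: align the directions of the
# generated residuals with those of the reference motion.
ref = torch.randn(2, 16, 320)       # reference video latents (assumed shape)
gen = torch.randn(2, 16, 320)       # generated video latents
loss = 1 - F.cosine_similarity(motion_residuals(gen),
                               motion_residuals(ref), dim=-1).mean()
```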
arXiv Detail & Related papers (2023-12-01T06:50:11Z)
- MotionBERT: A Unified Perspective on Learning Human Motion Representations [46.67364057245364]
We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources.
We propose a pretraining stage in which a motion encoder is trained to recover the underlying 3D motion from noisy partial 2D observations.
We implement the motion encoder with a Dual-stream Spatio-temporal Transformer (DSTformer) neural network.
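The pretraining recipe described above, recovering 3D motion from noisy partial 2D observations, can be sketched as a simple corrupt-and-lift objective. Everything below (shapes, the corruption scheme, and the plain MLP standing in for DSTformer) is an illustrative assumption.

```python
import torch
import torch.nn as nn

J, T = 17, 81                           # joints, frames (assumed)
encoder = nn.Sequential(                # toy stand-in for DSTformer
    nn.Linear(J * 2, 512), nn.ReLU(), nn.Linear(512, J * 3))

pose3d = torch.randn(4, T, J, 3)        # ground-truth 3D motion
pose2d = pose3d[..., :2].clone()        # trivial "projection" for the sketch

# Corrupt the 2D input: additive noise plus randomly masked joints.
noisy = pose2d + 0.02 * torch.randn_like(pose2d)
mask = (torch.rand(4, T, J, 1) > 0.15).float()
noisy = noisy * mask

pred = encoder(noisy.flatten(2)).view(4, T, J, 3)
loss = nn.functional.mse_loss(pred, pose3d)   # recover 3D from partial 2D
```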
arXiv Detail & Related papers (2022-10-12T19:46:25Z)
- Masked Motion Encoding for Self-Supervised Video Representation Learning [84.24773072241945]
We present Masked Motion Encoding (MME), a new pre-training paradigm that reconstructs both appearance and motion information to explore temporal clues.
Motivated by the fact that humans are able to recognize an action by tracking objects' position changes and shape changes, we propose to reconstruct a motion trajectory that represents these two kinds of change in the masked regions.
Pre-trained with our MME paradigm, the model is able to anticipate long-term and fine-grained motion details.
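As a toy rendering of the idea rather than the paper's architecture, masked motion encoding can be phrased as: mask a high ratio of patch tokens and supervise the model to regress each masked patch's motion trajectory, i.e., its position change over the next few frames. Shapes and the trajectory target below are assumptions.

```python
import torch
import torch.nn as nn

N, D, K = 196, 768, 4                 # tokens per frame, feature dim, horizon
predictor = nn.Linear(D, K * 2)       # regress a K-step 2D trajectory per token

tokens = torch.randn(2, N, D)         # patch features (assumed shape)
traj_target = torch.randn(2, N, K, 2) # per-patch position changes over K steps
masked = torch.rand(2, N) < 0.75      # MAE-style high mask ratio

pred = predictor(tokens).view(2, N, K, 2)
# Loss is computed only on masked positions, as in masked autoencoding.
loss = ((pred - traj_target)[masked] ** 2).mean()
```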
arXiv Detail & Related papers (2022-10-12T11:19:55Z)
- MoDi: Unconditional Motion Synthesis from Diverse Data [51.676055380546494]
We present MoDi, an unconditional generative model that synthesizes diverse motions.
Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset.
We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered.
arXiv Detail & Related papers (2022-06-16T09:06:25Z)
- NeMF: Neural Motion Fields for Kinematic Animation [6.570955948572252]
We express the vast motion space as a continuous function over time, hence the name Neural Motion Fields (NeMF).
We use a neural network to learn this function for miscellaneous sets of motions.
We train our model on diverse human motion and quadruped datasets to prove its versatility.
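The core idea, motion as a continuous function over time, admits a very small sketch: an MLP maps a timestamp plus a per-sequence latent code to a pose vector and can be queried at arbitrary, even non-integer, frame times. Dimensions and names are assumptions.

```python
import torch
import torch.nn as nn

class NeuralMotionField(nn.Module):
    """Illustrative: pose = f(t, z), with t a normalized timestamp and z a
    per-sequence latent code, so one network covers many motions."""
    def __init__(self, latent_dim=256, pose_dim=72, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, pose_dim))

    def forward(self, t, z):          # t: (B, 1) in [0, 1]; z: (B, latent_dim)
        return self.mlp(torch.cat([t, z], dim=-1))

field = NeuralMotionField()
z = torch.randn(1, 256)                     # latent code for one sequence
pose = field(torch.tensor([[0.37]]), z)     # query pose at continuous t = 0.37
```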
arXiv Detail & Related papers (2022-06-04T05:53:27Z)
- Unsupervised Motion Representation Learning with Capsule Autoencoders [54.81628825371412]
Motion Capsule Autoencoder (MCAE) models motion in a two-level hierarchy.
MCAE is evaluated on a novel Trajectory20 motion dataset and various real-world skeleton-based human action datasets.
arXiv Detail & Related papers (2021-10-01T16:52:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.