Training-Free Motion Customization for Distilled Video Generators with Adaptive Test-Time Distillation
- URL: http://arxiv.org/abs/2506.19348v1
- Date: Tue, 24 Jun 2025 06:20:15 GMT
- Title: Training-Free Motion Customization for Distilled Video Generators with Adaptive Test-Time Distillation
- Authors: Jintao Rong, Xin Xie, Xinyi Yu, Linlin Ou, Xinyu Zhang, Chunhua Shen, Dong Gong
- Abstract summary: Distilled video generation models offer fast and efficient synthesis but struggle with motion customization when guided by reference videos. We propose MotionEcho, a training-free test-time distillation framework that enables motion customization by leveraging diffusion teacher forcing.
- Score: 53.877572078307935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distilled video generation models offer fast and efficient synthesis but struggle with motion customization when guided by reference videos, especially under training-free settings. Existing training-free methods, originally designed for standard diffusion models, fail to generalize due to the accelerated generative process and large denoising steps in distilled models. To address this, we propose MotionEcho, a novel training-free test-time distillation framework that enables motion customization by leveraging diffusion teacher forcing. Our approach uses high-quality, slow teacher models to guide the inference of fast student models through endpoint prediction and interpolation. To maintain efficiency, we dynamically allocate computation across timesteps according to guidance needs. Extensive experiments across various distilled video generation models and benchmark datasets demonstrate that our method significantly improves motion fidelity and generation quality while preserving high efficiency. Project page: https://euminds.github.io/motionecho/
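The core ideas in the abstract, endpoint prediction by the fast student, refinement by a slow teacher conditioned on the reference motion, interpolation of the two endpoints, and spending teacher compute only at the steps that need it, can be illustrated with a short sketch. Everything below (the student/teacher interfaces, the guidance-need proxy, the toy noise schedule, and the interpolation weight) is a hypothetical illustration under stated assumptions, not the authors' implementation.

```python
# Minimal sketch of test-time teacher-guided few-step sampling in the spirit of
# MotionEcho. The model interfaces, guidance-need proxy, and noise schedule are
# placeholders for illustration only.
import torch

@torch.no_grad()
def guided_few_step_sampling(
    student,            # fast distilled model: (x_t, t, prompt) -> predicted clean video x0
    teacher,            # slow diffusion model: (x_t, t, prompt, ref_motion) -> refined x0
    prompt,
    ref_motion,         # reference video supplying the target motion
    timesteps,          # a few large denoising steps, e.g. [999, 749, 499, 249]
    noise_shape,        # (frames, channels, height, width)
    guidance_budget=2,  # teacher calls allowed (dynamic compute allocation)
    lam=0.5,            # interpolation weight between teacher and student endpoints
):
    x_t = torch.randn(noise_shape)

    for i, t in enumerate(timesteps):
        # The student proposes an endpoint (clean-video) prediction in one shot.
        x0_student = student(x_t, t, prompt)

        # Estimate how much guidance this step needs; here a crude proxy:
        # disagreement between the student endpoint and the reference motion.
        need = (x0_student - ref_motion).abs().mean().item()

        if guidance_budget > 0 and need > 0.1:  # spend teacher compute only where needed
            x0_teacher = teacher(x_t, t, prompt, ref_motion)
            x0 = lam * x0_teacher + (1.0 - lam) * x0_student  # endpoint interpolation
            guidance_budget -= 1
        else:
            x0 = x0_student

        # Re-noise the guided endpoint to the next (smaller) timestep, DDIM-style.
        if i + 1 < len(timesteps):
            t_next = timesteps[i + 1]
            alpha_next = 1.0 - t_next / 1000.0  # toy noise schedule for illustration
            noise = torch.randn_like(x0)
            x_t = alpha_next**0.5 * x0 + (1.0 - alpha_next)**0.5 * noise
        else:
            x_t = x0
    return x_t
```

A note on the design this sketch mimics: because the teacher is only invoked when the estimated guidance need is high and a budget remains, the student's few-step sampling cost is largely preserved, mirroring the dynamic allocation of computation across timesteps described in the abstract.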
Related papers
- Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion [70.4360995984905]
We introduce Self Forcing, a novel training paradigm for autoregressive video diffusion models. It addresses the longstanding issue of exposure bias, where models trained on ground-truth context must, at inference, generate sequences conditioned on their own imperfect outputs.
arXiv Detail & Related papers (2025-06-09T17:59:55Z) - AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset [55.82208863521353]
We propose AccVideo, which reduces the number of inference steps to accelerate video diffusion models using a synthetic dataset. Our model achieves an 8.5x improvement in generation speed compared to the teacher model. Compared to previous acceleration methods, our approach generates videos with higher quality and resolution.
arXiv Detail & Related papers (2025-03-25T08:52:07Z) - Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss [35.69606926024434]
We propose a simple yet effective solution that combines an initial-noise-based approach with a novel motion consistency loss. The motion consistency loss encourages the generated video to maintain feature-correlation patterns similar to those of the reference. This approach improves temporal consistency across various motion control tasks while preserving the benefits of a training-free setup (a minimal sketch of such a correlation-matching loss appears after this list).
arXiv Detail & Related papers (2025-01-13T18:53:08Z) - DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization [50.30051934609654]
We introduce a distillation method that combines variational score distillation and consistency distillation to achieve few-step video generation. Our method demonstrates state-of-the-art performance in few-step generation for 10-second videos (128 frames at 12 FPS). One-step distillation accelerates the teacher model's diffusion sampling by up to 278.6 times, enabling near real-time generation.
arXiv Detail & Related papers (2024-12-20T09:07:36Z) - From Slow Bidirectional to Fast Autoregressive Video Diffusion Models [52.32078428442281]
Current video diffusion models achieve impressive generation quality but struggle in interactive applications due to bidirectional attention dependencies. We address this limitation by adapting a pretrained bidirectional diffusion transformer into an autoregressive transformer that generates frames on the fly. Our model achieves a total score of 84.27 on the VBench-Long benchmark, surpassing all previous video generation models.
arXiv Detail & Related papers (2024-12-10T18:59:50Z) - AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data [45.20627288830823]
It reduces the necessary generation time of similarly sized video diffusion models from 25 seconds to around 1 second.
The method's effectiveness lies in its dual-level decoupled learning approach.
arXiv Detail & Related papers (2024-02-01T16:58:11Z) - BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present BOOT, a novel technique that overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
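For the motion consistency loss mentioned in the "Training-Free Motion-Guided Video Generation" entry above, a minimal sketch of the general idea, matching inter-frame feature-correlation patterns between a reference video and a generated one, is given below. The feature layout, the cosine correlation, and the MSE matching term are assumptions chosen for illustration, not that paper's exact formulation.

```python
# Illustrative sketch of a correlation-matching motion consistency loss.
# Feature extractor, layout, and weighting are assumptions, not the paper's design.
import torch
import torch.nn.functional as F

def frame_correlation(features):
    # features: (T, C, H, W) intermediate features, one map per frame.
    num_frames = features.shape[0]
    flat = F.normalize(features.reshape(num_frames, -1), dim=1)
    return flat @ flat.T  # (T, T) cosine correlation between frames

def motion_consistency_loss(gen_features, ref_features):
    # Penalize differences between the frame-to-frame correlation structures,
    # which encode motion, while leaving per-frame appearance unconstrained.
    return F.mse_loss(frame_correlation(gen_features), frame_correlation(ref_features))
```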
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.