DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization
- URL: http://arxiv.org/abs/2412.15689v1
- Date: Fri, 20 Dec 2024 09:07:36 GMT
- Title: DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization
- Authors: Zihan Ding, Chi Jin, Difan Liu, Haitian Zheng, Krishna Kumar Singh, Qiang Zhang, Yan Kang, Zhe Lin, Yuchen Liu
- Abstract summary: We introduce a distillation method that combines variational score distillation and consistency distillation to achieve few-step video generation.
Our method demonstrates state-of-the-art performance in few-step generation for 10-second videos (128 frames at 12 FPS).
One-step distillation accelerates the teacher model's diffusion sampling by up to 278.6 times, enabling near real-time generation.
- Score: 50.30051934609654
- License:
- Abstract: Diffusion probabilistic models have shown significant progress in video generation; however, their computational efficiency is limited by the large number of sampling steps required. Reducing sampling steps often compromises video quality or generation diversity. In this work, we introduce a distillation method that combines variational score distillation and consistency distillation to achieve few-step video generation, maintaining both high quality and diversity. We also propose a latent reward model fine-tuning approach to further enhance video generation performance according to any specified reward metric. This approach reduces memory usage and does not require the reward to be differentiable. Our method demonstrates state-of-the-art performance in few-step generation for 10-second videos (128 frames at 12 FPS). The distilled student model achieves a score of 82.57 on VBench, surpassing the teacher model as well as baseline models Gen-3, T2V-Turbo, and Kling. One-step distillation accelerates the teacher model's diffusion sampling by up to 278.6 times, enabling near real-time generation. Human evaluations further validate the superior performance of our 4-step student models compared to the teacher model using 50-step DDIM sampling.
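The core recipe in the abstract, pairing a consistency-distillation term with a variational-score-distillation (VSD) style term, can be illustrated with a short PyTorch sketch. Everything below (the placeholder networks, the Euler solver step, the noise schedule, and the loss weighting) is an assumption for illustration, not the authors' implementation.

```python
# Minimal, hedged sketch of combining consistency distillation with a
# VSD-style distribution-matching term. All modules and schedules here are
# illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyVideoNet(nn.Module):
    """Placeholder for a video network over latents shaped (B, C, T, H, W)."""
    def __init__(self, channels: int = 4):
        super().__init__()
        self.conv = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x, t):
        # Broadcast the timestep as a crude conditioning signal.
        return self.conv(x) + t.view(-1, 1, 1, 1, 1)


def teacher_euler_step(teacher, x_t, t, s):
    """One Euler step of an assumed probability-flow ODE from time t to s."""
    with torch.no_grad():
        drift = teacher(x_t, t)
        return x_t + (s - t).view(-1, 1, 1, 1, 1) * drift


def combined_distillation_loss(student, teacher, fake_score, latents, t, s,
                               lambda_vsd: float = 1.0):
    # Consistency distillation: the student's output at time t should match its
    # own (frozen) output at the teacher-solved earlier time s.
    noise = torch.randn_like(latents)
    x_t = latents + t.view(-1, 1, 1, 1, 1) * noise
    x_s = teacher_euler_step(teacher, x_t, t, s)
    with torch.no_grad():
        target = student(x_s, s)
    cd_loss = F.mse_loss(student(x_t, t), target)

    # VSD-style term: nudge student samples toward the teacher's score and away
    # from the score of the current student distribution (the "fake" score).
    x_gen = student(torch.randn_like(latents), torch.ones_like(t))
    t_vsd = torch.rand_like(t)
    x_gen_noisy = x_gen + t_vsd.view(-1, 1, 1, 1, 1) * torch.randn_like(x_gen)
    with torch.no_grad():
        grad = fake_score(x_gen_noisy, t_vsd) - teacher(x_gen_noisy, t_vsd)
    vsd_loss = (grad * x_gen_noisy).mean()  # surrogate whose gradient w.r.t. x is `grad`

    return cd_loss + lambda_vsd * vsd_loss


# Toy usage on random "video latents" (batch=2, channels=4, 8 frames, 16x16).
student, teacher, fake_score = TinyVideoNet(), TinyVideoNet(), TinyVideoNet()
latents = torch.randn(2, 4, 8, 16, 16)
t = torch.full((2,), 0.8)
s = t - 0.1
loss = combined_distillation_loss(student, teacher, fake_score, latents, t, s)
loss.backward()
```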
Related papers
- Diffusion Adversarial Post-Training for One-Step Video Generation [26.14991703029242]
We propose Adversarial Post-Training (APT) against real data following diffusion pre-training for one-step video generation.
Our model is capable of generating 1024px images in a single step, achieving quality comparable to state-of-the-art methods.
arXiv Detail & Related papers (2025-01-14T18:51:48Z) - From Slow Bidirectional to Fast Autoregressive Video Diffusion Models [52.32078428442281]
Current video diffusion models achieve impressive generation quality but struggle in interactive applications due to bidirectional attention dependencies.
We address this limitation by adapting a pretrained bidirectional diffusion transformer to an autoregressive transformer that generates frames on-the-fly.
Our model achieves a total score of 84.27 on the VBench-Long benchmark, surpassing all previous video generation models.
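One way to realize the bidirectional-to-autoregressive adaptation described in this entry is a block-causal attention mask, so tokens of a frame attend only to that frame and earlier ones. A minimal sketch follows; the frame count, tokens per frame, and mask convention are assumptions, not the paper's exact setup.

```python
# Hedged sketch: block-causal attention mask where tokens of frame i may attend
# to all tokens of frames 0..i, but not to future frames.
import torch

def block_causal_mask(num_frames: int, tokens_per_frame: int) -> torch.Tensor:
    """Boolean mask of shape (L, L); True means attention is allowed."""
    frame_ids = torch.arange(num_frames).repeat_interleave(tokens_per_frame)
    # Query token q may attend to key token k iff k's frame <= q's frame.
    return frame_ids.unsqueeze(1) >= frame_ids.unsqueeze(0)

mask = block_causal_mask(num_frames=4, tokens_per_frame=3)   # shape (12, 12)
# The boolean mask can be passed to an attention call, e.g.
# torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```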
arXiv Detail & Related papers (2024-12-10T18:59:50Z) - Accelerating Video Diffusion Models via Distribution Matching [26.475459912686986]
This work introduces a novel framework for diffusion distillation and distribution matching.
Our approach focuses on distilling pre-trained diffusion models into a more efficient few-step generator.
By leveraging a combination of video GAN loss and a novel 2D score distribution matching loss, we demonstrate the potential to generate high-quality video frames.
arXiv Detail & Related papers (2024-12-08T11:36:32Z) - OSV: One Step is Enough for High-Quality Image to Video Generation [29.77646091911169]
We introduce a two-stage training framework that effectively combines consistency distillation and GAN training.
We also propose a novel video discriminator design, which eliminates the need for decoding the video latents.
Our model is capable of producing high-quality videos in merely one step, with the flexibility to perform multi-step refinement.
arXiv Detail & Related papers (2024-09-17T17:16:37Z) - T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback [111.40967379458752]
We introduce T2V-Turbo, which integrates feedback from a mixture of differentiable reward models into the consistency distillation process of a pre-trained T2V model.
Remarkably, the 4-step generations from our T2V-Turbo achieve the highest total score on VBench, even surpassing Gen-2 and Pika.
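The mixed reward feedback described here amounts to adding weighted, differentiable reward terms on the student's generations during consistency distillation. The sketch below illustrates that idea; the reward models, pooling, and weights are placeholders chosen only for illustration.

```python
# Hedged sketch: a mixture of differentiable reward terms added to a
# distillation objective; gradients flow back into the generator.
import torch
import torch.nn as nn

class ToyReward(nn.Module):
    """Stand-in for a differentiable reward model (e.g. a video-text scorer);
    here just a learned scalar head over pooled latents."""
    def __init__(self, channels: int = 4):
        super().__init__()
        self.head = nn.Linear(channels, 1)

    def forward(self, video_latents):                  # (B, C, T, H, W)
        pooled = video_latents.mean(dim=(2, 3, 4))     # (B, C)
        return self.head(pooled).squeeze(-1)           # (B,) reward per sample

def mixed_reward_loss(generated, reward_models, weights):
    # Higher reward is better, so minimize the negative weighted rewards.
    total = 0.0
    for rm, w in zip(reward_models, weights):
        total = total - w * rm(generated).mean()
    return total

rewards = [ToyReward(), ToyReward()]
x_gen = torch.randn(2, 4, 8, 16, 16, requires_grad=True)  # stands in for student output
loss = mixed_reward_loss(x_gen, rewards, weights=[1.0, 0.5])
loss.backward()
```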
arXiv Detail & Related papers (2024-05-29T04:26:17Z) - BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes the limitations of prior distillation approaches with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z) - Consistency Models [89.68380014789861]
We propose a new family of models that generate high quality samples by directly mapping noise to data.
They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality.
They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training.
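The one-step mapping and optional multistep refinement described for Consistency Models can be sketched with the usual skip-connection parameterization and a re-noise-then-denoise loop. The network and the constants below are assumptions for illustration; the coefficient choice follows a common consistency-model formulation.

```python
# Hedged sketch of consistency-model style sampling: f maps a noisy input
# directly to a clean estimate; multistep sampling re-noises the estimate to
# progressively smaller times and applies f again.
import torch
import torch.nn as nn

SIGMA_DATA, EPS = 0.5, 0.002   # assumed EDM-style constants

class ToyNet(nn.Module):
    def __init__(self, channels: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
    def forward(self, x, t):
        return self.conv(x) + t.view(-1, 1, 1, 1)

def consistency_fn(net, x, t):
    """f(x, t) = c_skip(t) * x + c_out(t) * F(x, t), so that f(x, eps) = x."""
    c_skip = SIGMA_DATA**2 / ((t - EPS)**2 + SIGMA_DATA**2)
    c_out = SIGMA_DATA * (t - EPS) / (SIGMA_DATA**2 + t**2).sqrt()
    c_skip, c_out = c_skip.view(-1, 1, 1, 1), c_out.view(-1, 1, 1, 1)
    return c_skip * x + c_out * net(x, t)

@torch.no_grad()
def multistep_sample(net, shape, times=(80.0, 10.0, 1.0)):
    """One network evaluation per time; more times trade compute for quality."""
    t0 = torch.full((shape[0],), times[0])
    x = consistency_fn(net, torch.randn(shape) * times[0], t0)   # one-step sample
    for tau in times[1:]:
        t = torch.full((shape[0],), tau)
        x_tau = x + (tau**2 - EPS**2) ** 0.5 * torch.randn(shape)  # re-noise to tau
        x = consistency_fn(net, x_tau, t)
    return x

sample = multistep_sample(ToyNet(), shape=(2, 3, 16, 16))
```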
arXiv Detail & Related papers (2023-03-02T18:30:16Z) - Reinforced Multi-Teacher Selection for Knowledge Distillation [54.72886763796232]
Knowledge distillation is a popular method for model compression.
Current methods assign a fixed weight to each teacher model throughout the distillation process.
Most existing methods allocate an equal weight to every teacher model.
In this paper, we observe that, due to the complexity of training examples and the differences in student model capability, learning differentially from teacher models can lead to better performance of the distilled student models.
arXiv Detail & Related papers (2020-12-11T08:56:39Z)