Accelerating Video Diffusion Models via Distribution Matching
- URL: http://arxiv.org/abs/2412.05899v1
- Date: Sun, 08 Dec 2024 11:36:32 GMT
- Title: Accelerating Video Diffusion Models via Distribution Matching
- Authors: Yuanzhi Zhu, Hanshu Yan, Huan Yang, Kai Zhang, Junnan Li,
- Abstract summary: This work introduces a novel framework for diffusion distillation and distribution matching.
Our approach focuses on distilling pre-trained diffusion models into a more efficient few-step generator.
By leveraging a combination of video GAN loss and a novel 2D score distribution matching loss, we demonstrate the potential to generate high-quality video frames.
- Abstract: Generative models, particularly diffusion models, have achieved significant success in data synthesis across various modalities, including images, videos, and 3D assets. However, current diffusion models are computationally intensive, often requiring numerous sampling steps that limit their practical application, especially in video generation. This work introduces a novel framework for diffusion distillation and distribution matching that dramatically reduces the number of inference steps while maintaining, and potentially improving, generation quality. Our approach focuses on distilling pre-trained diffusion models into a more efficient few-step generator, specifically targeting video generation. By leveraging a combination of a video GAN loss and a novel 2D score distribution matching loss, we demonstrate the potential to generate high-quality video frames with substantially fewer sampling steps. Specifically, the proposed method incorporates a denoising GAN discriminator to distill from real data and a pre-trained image diffusion model to enhance frame quality and prompt-following capability. Experimental results using AnimateDiff as the teacher model showcase the method's effectiveness, achieving superior performance in just four sampling steps compared to existing techniques.
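The abstract describes the training signal only at a high level. As a rough illustration (not the authors' implementation), the sketch below shows how one generator update might combine a video GAN loss from a denoising discriminator with a per-frame, DMD-style score distribution matching loss supplied by a frozen image diffusion teacher; every interface, name, and weighting here is an assumption based solely on the abstract.

```python
# Hypothetical sketch of a few-step video distillation update, assuming:
#  - `generator(noise, prompt)` maps noise to a video of shape (B, C, T, H, W)
#  - `discriminator(video, t, prompt)` returns one logit per noisy clip
#  - `score_real` / `score_fake` estimate per-frame scores of the teacher
#    (real) and current generator (fake) distributions.
# None of these correspond to the paper's actual code.
import torch
import torch.nn.functional as F

def distillation_step(generator, discriminator, score_real, score_fake,
                      noise, prompt, t=0.5, sigma=0.3, lambda_dmd=1.0):
    fake_video = generator(noise, prompt)                        # (B, C, T, H, W)

    # (1) Video GAN loss: a denoising discriminator scores the noised fake clip.
    noisy_clip = fake_video + sigma * torch.randn_like(fake_video)
    gan_loss = F.softplus(-discriminator(noisy_clip, t, prompt)).mean()

    # (2) 2D score distribution matching: flatten the clip into frames so an
    # image diffusion model can provide the "real" score; the update direction
    # is the fake-score minus real-score difference (DMD-style).
    b, c, f, h, w = fake_video.shape
    frames = fake_video.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
    noisy_frames = frames + sigma * torch.randn_like(frames)
    with torch.no_grad():
        score_diff = score_fake(noisy_frames, t, prompt) - score_real(noisy_frames, t, prompt)
    # Surrogate loss whose gradient w.r.t. the frames equals `score_diff`.
    dmd_loss = (noisy_frames * score_diff).sum() / frames.numel()

    return gan_loss + lambda_dmd * dmd_loss

if __name__ == "__main__":
    # Stand-in callables just to exercise the wiring on random tensors.
    B, C, T, H, W = 2, 3, 8, 32, 32
    gen = lambda z, p: torch.tanh(z)
    disc = lambda v, t, p: v.mean(dim=(1, 2, 3, 4))
    s_real = lambda x, t, p: -x
    s_fake = lambda x, t, p: -0.5 * x
    z = torch.randn(B, C, T, H, W, requires_grad=True)
    loss = distillation_step(gen, disc, s_real, s_fake, z, prompt=None)
    loss.backward()
    print("loss:", float(loss))
```

The split mirrors the abstract's description: the GAN branch sees whole noisy clips (distilling from real video data), while the distribution matching branch operates frame by frame so that a 2D image diffusion model can supply the teacher score.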
Related papers
- Diffusion Models in Low-Level Vision: A Survey [82.77962165415153]
Diffusion model-based solutions are widely acclaimed for their ability to produce samples of superior quality and diversity.
We present three generic diffusion modeling frameworks and explore their correlations with other deep generative models.
We summarize extended diffusion models applied to other tasks, including medical, remote sensing, and video scenarios.
arXiv Detail & Related papers (2024-06-17T01:49:27Z)
- AdaDiff: Adaptive Step Selection for Fast Diffusion Models [82.78899138400435]
We introduce AdaDiff, a lightweight framework designed to learn instance-specific step usage policies.
AdaDiff is optimized with a policy-based method to maximize a carefully designed reward function.
We conduct experiments on three image generation and two video generation benchmarks and demonstrate that our approach achieves visual quality similar to the baseline.
arXiv Detail & Related papers (2023-11-24T11:20:38Z)
- Fast Inference in Denoising Diffusion Models via MMD Finetuning [23.779985842891705]
We present MMD-DDM, a novel method for fast sampling of diffusion models.
Our approach uses the Maximum Mean Discrepancy (MMD) to finetune the learned distribution under a given budget of timesteps.
Our findings show that the proposed method produces high-quality samples in a fraction of the time required by widely used diffusion models.
arXiv Detail & Related papers (2023-01-19T09:48:07Z)
- Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning [52.72369034247396]
We propose the diffusion glancing transformer, which employs a modality diffusion process and residual glancing sampling.
DIFFGLAT achieves better generation accuracy while maintaining fast decoding speed compared with both autoregressive and non-autoregressive models.
arXiv Detail & Related papers (2022-12-20T13:36:25Z)
- VIDM: Video Implicit Diffusion Models [75.90225524502759]
Diffusion models have emerged as a powerful generative method for synthesizing high-quality and diverse images.
We propose a video generation method based on diffusion models, where the effects of motion are modeled in an implicit condition.
We improve the quality of the generated videos with multiple strategies such as sampling space truncation, robustness penalty, and positional group normalization.
arXiv Detail & Related papers (2022-12-01T02:58:46Z)
- Diffusion Models in Vision: A Survey [73.10116197883303]
A diffusion model is a deep generative model based on two stages: a forward diffusion stage and a reverse diffusion stage.
Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burdens.
arXiv Detail & Related papers (2022-09-10T22:00:30Z)
- A Survey on Generative Diffusion Model [75.93774014861978]
Diffusion models are an emerging class of deep generative models.
They have certain limitations, including a time-consuming iterative generation process and confinement to high-dimensional Euclidean space.
This survey presents a plethora of advanced techniques aimed at enhancing diffusion models.
arXiv Detail & Related papers (2022-09-06T16:56:21Z)