GPD: Guided Progressive Distillation for Fast and High-Quality Video Generation
- URL: http://arxiv.org/abs/2602.01814v1
- Date: Mon, 02 Feb 2026 08:47:33 GMT
- Title: GPD: Guided Progressive Distillation for Fast and High-Quality Video Generation
- Authors: Xiao Liang, Yunzhu Zhang, Linchao Zhu
- Abstract summary: We propose Guided Progressive Distillation (GPD), a framework that accelerates the diffusion process for fast and high-quality video generation. GPD reduces the number of sampling steps from 48 to 6 while maintaining competitive visual quality on VBench.
- Score: 48.965157828225074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have achieved remarkable success in video generation; however, the high computational cost of the denoising process remains a major bottleneck. Existing approaches have shown promise in reducing the number of diffusion steps, but they often suffer from significant quality degradation when applied to video generation. We propose Guided Progressive Distillation (GPD), a framework that accelerates the diffusion process for fast and high-quality video generation. GPD introduces a novel training strategy in which a teacher model progressively guides a student model to operate with larger step sizes. The framework consists of two key components: (1) an online-generated training target that reduces optimization difficulty while improving computational efficiency, and (2) frequency-domain constraints in the latent space that promote the preservation of fine-grained details and temporal dynamics. Applied to the Wan2.1 model, GPD reduces the number of sampling steps from 48 to 6 while maintaining competitive visual quality on VBench. Compared with existing distillation methods, GPD demonstrates clear advantages in both pipeline simplicity and quality preservation.
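The two components described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation; it assumes a generic latent-space diffusion interface, and the names `teacher`, `student`, `scheduler_step`, and `gpd_style_loss` are hypothetical placeholders. It shows (1) an online-generated target, where the teacher takes two small denoising steps that the student must match with one large step, and (2) a frequency-domain constraint computed directly on the latents.

```python
# Minimal sketch of the two GPD ingredients, assuming a generic latent-space
# diffusion interface. `teacher`, `student`, and `scheduler_step` are
# hypothetical placeholders, not the paper's code.
import torch
import torch.nn.functional as F


def scheduler_step(model, z_t, t, t_next, cond):
    """One deterministic denoising step from t to t_next (placeholder Euler-style update)."""
    pred = model(z_t, t, cond)            # model output in latent space
    return z_t + (t_next - t) * pred


def frequency_loss(student_latents, target_latents):
    """Compare latents in the frequency domain to preserve fine detail and temporal dynamics."""
    s_freq = torch.fft.fftn(student_latents.float(), dim=(-3, -2, -1))  # over (frames, H, W)
    t_freq = torch.fft.fftn(target_latents.float(), dim=(-3, -2, -1))
    return F.l1_loss(torch.abs(s_freq), torch.abs(t_freq))


def gpd_style_loss(student, teacher, z_t, t, t_mid, t_next, cond, freq_weight=0.1):
    """Online target: the teacher takes two small steps; the student matches it in one large step."""
    with torch.no_grad():                 # the target is generated online, without teacher gradients
        z_mid = scheduler_step(teacher, z_t, t, t_mid, cond)
        z_target = scheduler_step(teacher, z_mid, t_mid, t_next, cond)
    z_student = scheduler_step(student, z_t, t, t_next, cond)
    return F.mse_loss(z_student, z_target) + freq_weight * frequency_loss(z_student, z_target)
```

The paper specifies the actual step schedule, target construction, and loss weighting; this sketch only indicates how an online teacher-generated target and a frequency-domain latent constraint could be combined in a single training step.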
Related papers
- LESA: Learnable Stage-Aware Predictors for Diffusion Model Acceleration [12.183601881545039]
Diffusion models have achieved remarkable success in image and video generation tasks. However, the high computational demands of Diffusion Transformers pose a significant challenge to their practical deployment. We propose a LEarnable Stage-Aware (LESA) predictor framework based on two-stage training.
arXiv Detail & Related papers (2026-02-24T02:53:28Z) - Towards One-step Causal Video Generation via Adversarial Self-Distillation [71.30373662465648]
Recent hybrid video generation models combine autoregressive temporal dynamics with diffusion-based spatial denoising. Our framework produces a single distilled model that flexibly supports multiple inference-step settings.
arXiv Detail & Related papers (2025-11-03T10:12:47Z) - VividFace: High-Quality and Efficient One-Step Diffusion For Video Face Enhancement [51.83206132052461]
Video Face Enhancement (VFE) seeks to reconstruct high-quality facial regions from degraded video sequences. Current methods that rely on video super-resolution and generative frameworks face three fundamental challenges. We propose VividFace, a novel and efficient one-step diffusion framework for video face enhancement.
arXiv Detail & Related papers (2025-09-28T02:39:48Z) - SwiftVideo: A Unified Framework for Few-Step Video Generation through Trajectory-Distribution Alignment [76.60024640625478]
Diffusion-based or flow-based models have achieved significant progress in video synthesis but require multiple iterative sampling steps. We propose a unified and stable distillation framework that combines the advantages of trajectory-preserving and distribution-matching strategies. Our method maintains high-quality video generation while substantially reducing the number of inference steps.
arXiv Detail & Related papers (2025-08-08T07:26:34Z) - PMQ-VE: Progressive Multi-Frame Quantization for Video Enhancement [83.89668902758243]
Multi-frame video enhancement tasks aim to improve the spatial and temporal resolution and quality of video sequences. We propose Progressive Multi-Frame Quantization for Video Enhancement (PMQ-VE). This framework features a coarse-to-fine two-stage process: Backtracking-based Multi-Frame Quantization (BMFQ) and Progressive Multi-Teacher Distillation (PMTD).
arXiv Detail & Related papers (2025-05-18T07:10:40Z) - One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full-reference and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z) - VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide [48.22321420680046]
VideoGuide is a novel framework that enhances the temporal consistency of pretrained text-to-video (T2V) models. It improves temporal quality by interpolating the guiding model's denoised samples into the sampling model's denoising process (see the sketch after this list). The proposed method yields significant improvements in temporal consistency and image fidelity.
arXiv Detail & Related papers (2024-10-06T05:46:17Z) - OSV: One Step is Enough for High-Quality Image to Video Generation [44.09826880566572]
We introduce a two-stage training framework that effectively combines consistency distillation and GAN training. We also propose a novel video discriminator design, which eliminates the need for decoding the video latents. Our model is capable of producing high-quality videos in merely one step, with the flexibility to perform multi-step refinement.
arXiv Detail & Related papers (2024-09-17T17:16:37Z)
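As a rough illustration of the VideoGuide entry above, the following sketch blends a guiding model's denoised estimate into the sampling model's estimate during the early denoising steps. The names `base_model`, `guide_model`, and `predict_x0` are hypothetical stand-ins under an assumed x0-prediction interface, not the paper's API.

```python
# Rough sketch of guiding a sampling model with a second model's denoised
# estimates, as described in the VideoGuide entry above. All names here are
# hypothetical placeholders, not the paper's implementation.
import torch


def predict_x0(model, x_t, t, cond):
    """Placeholder: return the model's denoised (x0) estimate at step t."""
    return model(x_t, t, cond)


@torch.no_grad()
def guided_denoised_estimate(base_model, guide_model, x_t, t, cond,
                             guide_weight=0.5, guide_until_step=5, step_index=0):
    """Interpolate the guide's denoised sample into the base model's estimate."""
    x0_base = predict_x0(base_model, x_t, t, cond)
    if step_index < guide_until_step:     # only guide the first few denoising steps
        x0_guide = predict_x0(guide_model, x_t, t, cond)
        x0_base = (1.0 - guide_weight) * x0_base + guide_weight * x0_guide
    return x0_base
```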