QVD: Post-training Quantization for Video Diffusion Models
- URL: http://arxiv.org/abs/2407.11585v2
- Date: Wed, 17 Jul 2024 05:27:04 GMT
- Title: QVD: Post-training Quantization for Video Diffusion Models
- Authors: Shilong Tian, Hong Chen, Chengtao Lv, Yu Liu, Jinyang Guo, Xianglong Liu, Shengxi Li, Hao Yang, Tao Xie
- Abstract summary: Post-training quantization (PTQ) is an effective technique to reduce memory footprint and improve computational efficiency.
We introduce the first PTQ strategy tailored for video diffusion models, dubbed QVD.
We achieve near-lossless performance on W8A8, outperforming current methods by 205.12 in FVD.
- Score: 33.13078954859106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, video diffusion models (VDMs) have garnered significant attention due to their notable advancements in generating coherent and realistic video content. However, processing multiple frame features concurrently, coupled with the considerable model size, results in high latency and extensive memory consumption, hindering their broader application. Post-training quantization (PTQ) is an effective technique to reduce memory footprint and improve computational efficiency. Unlike in image diffusion, we observe that the temporal features, which are integrated into all frame features, exhibit pronounced skewness. Furthermore, we identify significant inter-channel disparities and asymmetries in the activations of video diffusion models, which leave individual channels covering only a small fraction of the available quantization levels and make quantization more challenging. To address these issues, we introduce the first PTQ strategy tailored for video diffusion models, dubbed QVD. Specifically, we propose the High Temporal Discriminability Quantization (HTDQ) method, designed for temporal features, which retains the high discriminability of quantized features, providing precise temporal guidance for all video frames. In addition, we present the Scattered Channel Range Integration (SCRI) method, which aims to improve the coverage of quantization levels across individual channels. Experimental validation across various models, datasets, and bit-width settings demonstrates the effectiveness of QVD on diverse metrics. In particular, we achieve near-lossless performance on W8A8, outperforming current methods by 205.12 in FVD.
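To make the channel-coverage problem described above concrete, here is a minimal, illustrative sketch contrasting per-tensor and per-channel asymmetric 8-bit quantization. It is not the paper's implementation and all names are hypothetical; it only demonstrates the failure mode SCRI targets: when one channel's range dwarfs another's, a shared per-tensor scale leaves the narrow channel using only a handful of the 256 available levels.

```python
import numpy as np

def quantize(x, scale, zero_point, n_bits=8):
    """Uniform asymmetric quantization: map floats onto integer levels."""
    qmax = 2 ** n_bits - 1
    return np.clip(np.round(x / scale + zero_point), 0, qmax)

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

def per_tensor_params(x, n_bits=8):
    # One (scale, zero_point) pair shared by every channel.
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (2 ** n_bits - 1)
    return scale, -lo / scale

def per_channel_params(x, n_bits=8):
    # Independent (scale, zero_point) per channel (axis 0), so each
    # channel's own range maps onto the full integer grid.
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    scale = (hi - lo) / (2 ** n_bits - 1)
    return scale, -lo / scale

# Two channels with very different ranges, mimicking the inter-channel
# disparities and asymmetries the abstract describes (synthetic data).
rng = np.random.default_rng(0)
acts = np.stack([rng.normal(0, 10.0, 1024),   # wide channel
                 rng.normal(3, 0.1, 1024)])   # narrow, shifted channel

for name, fn in [("per-tensor", per_tensor_params),
                 ("per-channel", per_channel_params)]:
    s, z = fn(acts)
    q = quantize(acts, s, z)
    err = np.abs(acts - dequantize(q, s, z)).mean()
    levels = len(np.unique(q[1]))  # levels used by the narrow channel
    print(f"{name}: narrow channel uses {levels} of 256 levels, "
          f"mean abs error {err:.4f}")
```

Running this shows the narrow channel collapsing onto a few integer levels under the shared per-tensor range, while per-channel ranges restore coverage and shrink the reconstruction error.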
Related papers
- Diffusion-based Perceptual Neural Video Compression with Temporal Diffusion Information Reuse [45.134271969594614]
DiffVC is a diffusion-based perceptual neural video compression framework.
It integrates a foundational diffusion model with the video conditional coding paradigm.
We show that the proposed solution delivers excellent performance in both perceptual metrics and visual quality.
arXiv Detail & Related papers (2025-01-23T10:23:04Z)
- TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models [49.65286242048452]
We propose a novel method dubbed Timestep-Channel Adaptive Quantization for Diffusion Models (TCAQ-DM).
The proposed method substantially outperforms the state-of-the-art approaches in most cases.
arXiv Detail & Related papers (2024-12-21T16:57:54Z)
- VRVVC: Variable-Rate NeRF-Based Volumetric Video Compression [59.14355576912495]
NeRF-based video has revolutionized visual media by delivering photorealistic Free-Viewpoint Video (FVV) experiences.
The substantial data volumes pose significant challenges for storage and transmission.
We propose VRVVC, a novel end-to-end joint variable-rate framework for video compression.
arXiv Detail & Related papers (2024-12-16T01:28:04Z)
- Accelerating Video Diffusion Models via Distribution Matching [26.475459912686986]
This work introduces a novel framework for diffusion distillation and distribution matching.
Our approach focuses on distilling pre-trained diffusion models into a more efficient few-step generator.
By leveraging a combination of video GAN loss and a novel 2D score distribution matching loss, we demonstrate the potential to generate high-quality video frames.
arXiv Detail & Related papers (2024-12-08T11:36:32Z)
- Memory-Efficient Fine-Tuning for Quantized Diffusion Model [12.875837358532422]
We introduce TuneQDM, a memory-efficient fine-tuning method for quantized diffusion models.
Our method consistently outperforms the baseline in both single- and multi-subject generation.
arXiv Detail & Related papers (2024-01-09T03:42:08Z)
- Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution [65.91317390645163]
Upscale-A-Video is a text-guided latent diffusion framework for video upscaling.
Locally, it ensures temporal coherence by integrating temporal layers into the U-Net and VAE-Decoder, maintaining consistency within short sequences; a minimal sketch of such a temporal layer follows this entry.
It also offers greater flexibility by allowing text prompts to guide texture creation and adjustable noise levels to balance restoration and generation.
arXiv Detail & Related papers (2023-12-11T18:54:52Z)
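As a rough illustration of what "integrating temporal layers" can look like, below is a minimal sketch of a temporal self-attention block that attends along the frame axis of a (B, T, C, H, W) latent while leaving the spatial layout untouched. This is an assumed illustration, not code from Upscale-A-Video.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Self-attention across the frame axis of a (B, T, C, H, W) latent.

    Each spatial location attends over time only, which is one way a
    temporal layer inserted into a U-Net can enforce short-range
    consistency between neighbouring frames (hypothetical sketch).
    """
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = x.shape
        # Fold space into the batch so attention runs along T only.
        seq = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        q = self.norm(seq)
        out, _ = self.attn(q, q, q)
        out = (seq + out).reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)
        return out

latent = torch.randn(1, 8, 64, 16, 16)      # 8 frames of a 16x16 latent
print(TemporalAttention(64)(latent).shape)  # torch.Size([1, 8, 64, 16, 16])
```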
- Temporal Dynamic Quantization for Diffusion Models [18.184163233551292]
We introduce a novel quantization method that dynamically adjusts the quantization interval based on timestep information; a sketch of this idea follows the entry.
Unlike conventional dynamic quantization techniques, our approach has no computational overhead during inference.
Our experiments demonstrate substantial improvements in output quality with the quantized diffusion model across various datasets.
arXiv Detail & Related papers (2023-06-04T09:49:43Z)
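One concrete reading of "dynamic intervals with no inference overhead" is that per-timestep quantization parameters are precomputed during calibration and merely looked up while denoising. The sketch below is an assumed illustration of that idea, not the paper's code; all names are hypothetical.

```python
import numpy as np

def calibrate_per_timestep(activations_by_t, n_bits=8):
    """Precompute a (scale, zero_point) pair for every diffusion timestep
    from calibration activations; nothing here runs at inference time."""
    table = {}
    qmax = 2 ** n_bits - 1
    for t, acts in activations_by_t.items():
        lo, hi = acts.min(), acts.max()
        scale = (hi - lo) / qmax
        table[t] = (scale, -lo / scale)
    return table

def quantize_at(t, x, table, n_bits=8):
    # Inference: a dictionary lookup plus ordinary static quantization,
    # so adapting to t adds no computation over static PTQ.
    scale, zp = table[t]
    return np.clip(np.round(x / scale + zp), 0, 2 ** n_bits - 1)

# Synthetic calibration activations whose range grows with the timestep.
rng = np.random.default_rng(1)
calib = {t: rng.normal(0, 1.0 + 0.1 * t, 4096) for t in range(10)}
table = calibrate_per_timestep(calib)
print(quantize_at(0, calib[0][:4], table))
```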
- Video Probabilistic Diffusion Models in Projected Latent Space [75.4253202574722]
We propose a novel generative model for videos, coined projected latent video diffusion models (PVDM).
PVDM learns a video distribution in a low-dimensional latent space and thus can be efficiently trained with high-resolution videos under limited resources.
arXiv Detail & Related papers (2023-02-15T14:22:34Z)
- Q-Diffusion: Quantizing Diffusion Models [52.978047249670276]
Post-training quantization (PTQ) is considered a go-to compression method for other tasks.
We propose a novel PTQ method specifically tailored to the unique multi-timestep pipeline and model architecture of diffusion models.
We show that our proposed method is able to quantize full-precision unconditional diffusion models into 4-bit while maintaining comparable performance.
arXiv Detail & Related papers (2023-02-08T19:38:59Z)
- VIDM: Video Implicit Diffusion Models [75.90225524502759]
Diffusion models have emerged as a powerful generative method for synthesizing high-quality and diverse images.
We propose a video generation method based on diffusion models, where the effects of motion are modeled in an implicit condition.
We improve the quality of the generated videos by proposing multiple strategies such as sampling space truncation, robustness penalty, and positional group normalization.
arXiv Detail & Related papers (2022-12-01T02:58:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.