QuantVSR: Low-Bit Post-Training Quantization for Real-World Video Super-Resolution
- URL: http://arxiv.org/abs/2508.04485v1
- Date: Wed, 06 Aug 2025 14:35:59 GMT
- Title: QuantVSR: Low-Bit Post-Training Quantization for Real-World Video Super-Resolution
- Authors: Bowen Chai, Zheng Chen, Libo Zhu, Wenbo Li, Yong Guo, Yulun Zhang
- Abstract summary: We propose a low-bit quantization model for real-world video super-resolution (VSR). We use a calibration dataset to measure both spatial and temporal complexity for each layer. We refine the FP and low-bit branches to achieve simultaneous optimization.
- Score: 53.13952833016505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have shown superior performance in real-world video super-resolution (VSR). However, the slow processing speeds and heavy resource consumption of diffusion models hinder their practical application and deployment. Quantization offers a potential solution for compressing the VSR model. Nevertheless, quantizing VSR models is challenging due to their temporal characteristics and high fidelity requirements. To address these issues, we propose QuantVSR, a low-bit quantization model for real-world VSR. We propose a spatio-temporal complexity aware (STCA) mechanism, where we first utilize the calibration dataset to measure both spatial and temporal complexities for each layer. Based on these statistics, we allocate layer-specific ranks to the low-rank full-precision (FP) auxiliary branch. Subsequently, we jointly refine the FP and low-bit branches to achieve simultaneous optimization. In addition, we propose a learnable bias alignment (LBA) module to reduce the biased quantization errors. Extensive experiments on synthetic and real-world datasets demonstrate that our method obtains comparable performance with the FP model and significantly outperforms recent leading low-bit quantization methods. Code is available at: https://github.com/bowenchai/QuantVSR.
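As a rough, non-authoritative sketch of the ideas in the abstract (the GitHub link above is the reference implementation), the toy layer below pairs a fake-quantized low-bit weight branch with a low-rank full-precision auxiliary branch and a learnable bias-alignment term. The bit width, the symmetric fake-quantization scheme, and the fixed `rank` argument are assumptions made here for illustration; in QuantVSR the rank would be allocated per layer by the spatio-temporal complexity aware (STCA) mechanism.
```python
# Hypothetical sketch only, not the QuantVSR implementation.
import torch
import torch.nn as nn


def fake_quant(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Symmetric uniform fake quantization with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    return x + (q - x).detach()  # forward uses q, gradients flow through x


class LowBitLinearWithAux(nn.Module):
    """Low-bit linear layer + low-rank FP branch + learnable bias alignment."""

    def __init__(self, in_f: int, out_f: int, bits: int = 4, rank: int = 8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.02)
        self.bits = bits
        # Low-rank FP auxiliary branch; in the paper the rank would be
        # layer-specific, chosen from calibration statistics.
        self.lora_a = nn.Parameter(torch.randn(rank, in_f) * 0.02)
        self.lora_b = nn.Parameter(torch.zeros(out_f, rank))
        # Learnable term intended to absorb biased quantization error.
        self.bias_align = nn.Parameter(torch.zeros(out_f))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_q = fake_quant(self.weight, self.bits)          # low-bit weight branch
        x_q = fake_quant(x, self.bits)                    # activation quantization
        out = x_q @ w_q.t()                               # quantized path
        out = out + x @ (self.lora_b @ self.lora_a).t()   # FP low-rank path
        return out + self.bias_align
```
Jointly optimizing the FP branch, the bias-alignment term, and the quantization parameters on a calibration set is the spirit of the refinement step described above; the exact losses and rank-allocation rule are specified in the paper and repository.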
Related papers
- MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation [74.34220141721231]
We present MPQ-DMv2, an improved Mixed Precision Quantization framework for extremely low-bit Diffusion Models.
arXiv Detail & Related papers (2025-07-06T08:16:50Z) - SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution [55.14432034345353]
We study key design principles for the latter cascaded video super-resolution models, which are currently underexplored.
First, we propose two strategies to generate training pairs that better mimic the output characteristics of the base model, ensuring alignment between the VSR model and its upstream generator.
Second, we provide critical insights into VSR model behavior through systematic analysis of (1) timestep sampling strategies and (2) noise augmentation effects on low-resolution (LR) inputs.
arXiv Detail & Related papers (2025-06-24T17:57:26Z) - Low-Resource Video Super-Resolution using Memory, Wavelets, and Deformable Convolutions [3.018928786249079]
Video super-resolution (VSR) remains a formidable challenge for deployment on resource-constrained edge devices.
We propose a novel lightweight and parameter-efficient neural architecture for VSR that achieves state-of-the-art reconstruction accuracy with just 2.3 million parameters.
arXiv Detail & Related papers (2025-02-03T20:46:15Z) - TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models [49.65286242048452]
We propose a novel method dubbed Timestep-Channel Adaptive Quantization for Diffusion Models (TCAQ-DM).
The proposed method substantially outperforms the state-of-the-art approaches in most cases.
arXiv Detail & Related papers (2024-12-21T16:57:54Z) - Compressing Recurrent Neural Networks for FPGA-accelerated Implementation in Fluorescence Lifetime Imaging [3.502427552446068]
Deep learning models enable real-time inference, but can be computationally demanding due to complex architectures and large matrix operations.
This makes DL models ill-suited for direct implementation on field-programmable gate array (FPGA)-based camera hardware.
In this work, we focus on compressing recurrent neural networks (RNNs), which are well-suited for FLI time-series data processing, to enable deployment on resource-constrained FPGA boards.
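As a loose, hedged illustration of compressing a recurrent model (this is not the paper's FPGA deployment flow, and the toy GRU below is a placeholder rather than the authors' FLI architecture), PyTorch's dynamic quantization converts the recurrent and linear weights to 8-bit integers:
```python
# Illustrative only: 8-bit dynamic weight quantization of a toy recurrent model.
import torch
import torch.nn as nn


class TinyRNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(input_size=16, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, 1)

    def forward(self, x):
        out, _ = self.rnn(x)           # (batch, time, hidden)
        return self.head(out[:, -1])   # regress from the last time step


fp_model = TinyRNN().eval()
q_model = torch.ao.quantization.quantize_dynamic(
    fp_model, {nn.GRU, nn.Linear}, dtype=torch.qint8
)
x = torch.randn(2, 32, 16)                   # (batch, time steps, features)
print(fp_model(x).shape, q_model(x).shape)   # both torch.Size([2, 1])
```
An actual FPGA pipeline would export fixed-point weights and a hardware-friendly dataflow rather than rely on PyTorch kernels; the snippet only demonstrates the weight-compression idea in software.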
arXiv Detail & Related papers (2024-10-01T17:23:26Z) - Universality of Real Minimal Complexity Reservoir [0.358439716487063]
Reservoir Computing (RC) models are distinguished by their fixed, non-trainable input layer and dynamically coupled reservoir.
Simple Cycle Reservoirs (SCR) represent a specialized class of RC models with a highly constrained reservoir architecture.
SCRs operating in the real domain are universal approximators of time-invariant dynamic filters with fading memory.
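For context, a minimal real-valued Simple Cycle Reservoir can be written in a few lines; the cycle weight `r`, input scale `v`, and the sign pattern below are assumed hyperparameters, and only a linear readout fitted on the collected states would be trained.
```python
# Minimal SCR sketch: a fixed cycle reservoir driven by a 1-D input sequence.
import numpy as np


def run_scr(inputs, n_res=100, r=0.9, v=0.5, seed=0):
    """Return the reservoir state trajectory for a 1-D input sequence."""
    rng = np.random.default_rng(seed)
    # Reservoir matrix: a single cycle 1 -> 2 -> ... -> n -> 1, all weights r.
    W = np.zeros((n_res, n_res))
    W[np.arange(1, n_res), np.arange(n_res - 1)] = r
    W[0, n_res - 1] = r
    # Fixed, non-trainable input weights: identical magnitude v, varying signs.
    w_in = v * rng.choice([-1.0, 1.0], size=n_res)
    states = np.zeros((len(inputs), n_res))
    x = np.zeros(n_res)
    for t, u in enumerate(inputs):
        x = np.tanh(W @ x + w_in * u)
        states[t] = x
    return states  # fit a linear readout (e.g., ridge regression) on these
```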
arXiv Detail & Related papers (2024-08-15T10:44:33Z) - Temporal Feature Matters: A Framework for Diffusion Model Quantization [105.3033493564844]
Diffusion models rely on the time-step for multi-round denoising.
We introduce a novel quantization framework that includes three strategies.
This framework preserves most of the temporal information and ensures high-quality end-to-end generation.
arXiv Detail & Related papers (2024-07-28T17:46:15Z) - EDA-DM: Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models [8.742501879586309]
Quantization can effectively reduce model complexity, and post-training quantization (PTQ) is highly promising for compressing and accelerating diffusion models.
Existing PTQ methods suffer from distribution mismatch issues at both the calibration sample level and the reconstruction output level.
We propose EDA-DM, a standardized PTQ method that efficiently addresses the above issues.
arXiv Detail & Related papers (2024-01-09T14:42:49Z) - TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models [52.454274602380124]
Diffusion models heavily depend on the time-step $t$ to achieve satisfactory multi-round denoising.
We propose a Temporal Feature Maintenance Quantization (TFMQ) framework building upon a Temporal Information Block.
Powered by the pioneering block design, we devise temporal information aware reconstruction (TIAR) and finite set calibration (FSC) to align the full-precision temporal features.
arXiv Detail & Related papers (2023-11-27T12:59:52Z) - MLIC++: Linear Complexity Multi-Reference Entropy Modeling for Learned Image Compression [30.71965784982577]
We introduce MEM++, which captures a diverse range of correlations inherent in the latent representation.
MEM++ achieves state-of-the-art performance, reducing BD-rate by 13.39% on the Kodak dataset compared to VTM-17.0 in PSNR.
MLIC++ exhibits linear GPU memory consumption with resolution, making it highly suitable for high-resolution image coding.
arXiv Detail & Related papers (2023-07-28T09:11:37Z) - Investigating Tradeoffs in Real-World Video Super-Resolution [90.81396836308085]
Real-world video super-resolution (VSR) models are often trained with diverse degradations to improve generalizability, which introduces tradeoffs in training and inference cost.
To alleviate one of these tradeoffs, we propose a degradation scheme that reduces training time by up to 40% without sacrificing performance.
To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences.
arXiv Detail & Related papers (2021-11-24T18:58:21Z)