Q-DiT4SR: Exploration of Detail-Preserving Diffusion Transformer Quantization for Real-World Image Super-Resolution
- URL: http://arxiv.org/abs/2602.01273v2
- Date: Tue, 10 Feb 2026 07:16:36 GMT
- Title: Q-DiT4SR: Exploration of Detail-Preserving Diffusion Transformer Quantization for Real-World Image Super-Resolution
- Authors: Xun Zhang, Kaicheng Yang, Hongliang Lu, Haotong Qin, Yong Guo, Yulun Zhang
- Abstract summary: We propose Q-DiT4SR, the first PTQ framework specifically tailored for DiT-based Real-ISR. We propose H-SVD, a hierarchical SVD that integrates a global low-rank branch with a local block-wise rank-1 branch under a matched parameter budget. Experiments on multiple real-world datasets demonstrate that our Q-DiT4SR achieves SOTA performance under both W4A6 and W4A4 settings.
- Score: 47.12618295041499
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, Diffusion Transformers (DiTs) have emerged in Real-World Image Super-Resolution (Real-ISR) to generate high-quality textures, yet their heavy inference burden hinders real-world deployment. While Post-Training Quantization (PTQ) is a promising solution for acceleration, existing methods in super-resolution mostly focus on U-Net architectures, whereas generic DiT quantization is typically designed for text-to-image tasks. Directly applying these methods to DiT-based super-resolution models leads to severe degradation of local textures. Therefore, we propose Q-DiT4SR, the first PTQ framework specifically tailored for DiT-based Real-ISR. We propose H-SVD, a hierarchical SVD that integrates a global low-rank branch with a local block-wise rank-1 branch under a matched parameter budget. We further propose Variance-aware Spatio-Temporal Mixed Precision: VaSMP allocates cross-layer weight bit-widths in a data-free manner based on rate-distortion theory, while VaTMP schedules intra-layer activation precision across diffusion timesteps via dynamic programming (DP) with minimal calibration. Experiments on multiple real-world datasets demonstrate that our Q-DiT4SR achieves SOTA performance under both W4A6 and W4A4 settings. Notably, the W4A4 quantization configuration reduces model size by 5.8$\times$ and computational operations by over 60$\times$. Our code and models will be available at https://github.com/xunzhang1128/Q-DiT4SR.
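The timestep-wise precision scheduling described in the abstract (VaTMP) can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation: the abstract only states that activation precision is scheduled across diffusion timesteps via DP in a variance-aware way. The distortion proxy (variance times 2^(-2b)), the candidate bit-widths, the average-bit budget, and the names `distortion` and `schedule_bits` are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def distortion(variance: float, bits: int) -> float:
    """Assumed proxy: uniform-quantizer error scales as variance * 2^(-2 * bits)."""
    return variance * 2.0 ** (-2 * bits)


def schedule_bits(
    variances: Sequence[float],
    choices: Sequence[int],
    avg_bits: float,
) -> Tuple[List[int], float]:
    """Pick one bit-width per timestep to minimize total proxy distortion.

    The total number of bits may not exceed avg_bits * len(variances).
    This is a multiple-choice knapsack solved by DP over (timestep, bits used).
    """
    budget = int(avg_bits * len(variances))
    # best[used] = (lowest distortion so far, bit-widths chosen so far)
    best: Dict[int, Tuple[float, List[int]]] = {0: (0.0, [])}
    for var in variances:
        nxt: Dict[int, Tuple[float, List[int]]] = {}
        for used, (cost, plan) in best.items():
            for b in choices:
                total = used + b
                if total > budget:
                    continue  # this choice would exceed the bit budget
                new_cost = cost + distortion(var, b)
                if total not in nxt or new_cost < nxt[total][0]:
                    nxt[total] = (new_cost, plan + [b])
        best = nxt
    if not best:
        raise ValueError("no feasible schedule under the given budget")
    cost, plan = min(best.values(), key=lambda entry: entry[0])
    return plan, cost


if __name__ == "__main__":
    # Hypothetical per-timestep activation variances (not measured data).
    variances = [4.0, 1.0, 0.25, 0.0625]
    plan, cost = schedule_bits(variances, choices=(4, 6, 8), avg_bits=6)
    print(plan, cost)
```

Under this assumed proxy, timesteps with larger activation variance receive more bits while the average stays at the target, which is the intuition behind variance-aware mixed precision. A real scheduler would replace the proxy with distortion measured on calibration data.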
Related papers
- TreeQ: Pushing the Quantization Boundary of Diffusion Transformer via Tree-Structured Mixed-Precision Search [35.93578975066986]
Diffusion Transformers (DiTs) have emerged as a highly scalable and effective backbone for image generation.
Mixed-Precision Quantization (MPQ) has demonstrated remarkable success in advancing U-Net quantization to sub-4bit settings.
We propose TreeQ, a unified framework addressing key challenges in DiT quantization.
arXiv Detail & Related papers (2025-12-06T08:59:12Z)
- Post-Training Quantization via Residual Truncation and Zero Suppression for Diffusion Models [10.000323762676633]
Diffusion models achieve high-quality image generation but face deployment challenges due to their high computational requirements.
We propose Quantization via Residual Truncation and Zero Suppression (QuaRTZ), a 4-bit PTQ scheme for diffusion models.
Our approach reduces rounding errors and improves quantization efficiency by balancing outlier preservation and LSB precision.
arXiv Detail & Related papers (2025-09-30T15:55:42Z)
- RobuQ: Pushing DiTs to W1.58A2 via Robust Activation Quantization [33.96616374712551]
Diffusion Transformers (DiTs) have emerged as a powerful backbone for image generation.
Their practical deployment is hindered by substantial computational and memory costs.
We propose a systematic QAT framework for DiTs, named RobuQ.
arXiv Detail & Related papers (2025-09-28T02:35:12Z)
- Quantized Visual Geometry Grounded Transformer [67.15451442018258]
This paper proposes the first Quantization framework for VGGTs, namely QuantVGGT.
We introduce Dual-Smoothed Fine-Grained Quantization, which integrates pre-global Hadamard rotation and post-local channel smoothing.
We also design Noise-Filtered Diverse Sampling, which filters outliers via deep-layer statistics.
arXiv Detail & Related papers (2025-09-25T15:17:11Z)
- HiMat: DiT-based Ultra-High Resolution SVBRDF Generation [26.081964370337943]
HiMat is a diffusion-based framework tailored for efficient and diverse 4K SVBRDF generation.
CrossStitch is a lightweight convolutional module that enforces cross-map consistency without incurring the cost of global attention.
arXiv Detail & Related papers (2025-08-09T15:16:58Z)
- QuantVSR: Low-Bit Post-Training Quantization for Real-World Video Super-Resolution [53.13952833016505]
We propose a low-bit quantization model for real-world video super-resolution (VSR).
We use a calibration dataset to measure both spatial and temporal complexity for each layer.
We refine the FP and low-bit branches to achieve simultaneous optimization.
arXiv Detail & Related papers (2025-08-06T14:35:59Z)
- PTQ4DiT: Post-training Quantization for Diffusion Transformers [52.902071948957186]
Post-training Quantization (PTQ) has emerged as a fast and data-efficient solution that can significantly reduce computation and memory footprint.
We propose PTQ4DiT, a specifically designed PTQ method for DiTs.
We demonstrate that our PTQ4DiT successfully quantizes DiTs to 8-bit precision while preserving comparable generation ability.
arXiv Detail & Related papers (2024-05-25T02:02:08Z)
- Q-DETR: An Efficient Low-Bit Quantized Detection Transformer [50.00784028552792]
We find that the bottlenecks of Q-DETR come from the query information distortion through our empirical analyses.
We formulate our DRD as a bi-level optimization problem, which can be derived by generalizing the information bottleneck (IB) principle to the learning of Q-DETR.
We introduce a new foreground-aware query matching scheme to effectively transfer the teacher information to distillation-desired features to minimize the conditional information entropy.
arXiv Detail & Related papers (2023-04-01T08:05:14Z)
- CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution [55.50793823060282]
We propose a novel Content-Aware Dynamic Quantization (CADyQ) method for image super-resolution (SR) networks.
CADyQ allocates optimal bits to local regions and layers adaptively based on the local contents of an input image.
The pipeline has been tested on various SR networks and evaluated on several standard benchmarks.
arXiv Detail & Related papers (2022-07-21T07:50:50Z)
- FQ-ViT: Fully Quantized Vision Transformer without Retraining [13.82845665713633]
We present a systematic method to reduce the performance degradation and inference complexity of Quantized Transformers.
We are the first to achieve comparable accuracy degradation (1%) on fully quantized Vision Transformers.
arXiv Detail & Related papers (2021-11-27T06:20:53Z)