Related papers: HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations

URL: http://arxiv.org/abs/2506.09932v2
Date: Thu, 10 Jul 2025 10:03:57 GMT
Title: HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations
Authors: Marco Federici, Riccardo Del Chiaro, Boris van Breugel, Paul Whatmough, Markus Nagel,
Abstract summary: Post-Training Quantization (PTQ) offers a promising solution by reducing the bitwidth of matrix operations. We propose HadaNorm, a novel linear transformation that extends existing approaches by both normalizing channel activations and applying Hadamard transforms. We demonstrate that HadaNorm consistently reduces quantization error across the various components of transformer blocks, outperforming state-of-the-art methods.
Score: 17.975720202894905
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion models represent the cutting edge in image generation, but their high memory and computational demands hinder deployment on resource-constrained devices. Post-Training Quantization (PTQ) offers a promising solution by reducing the bitwidth of matrix operations. However, standard PTQ methods struggle with outliers, and achieving higher compression often requires transforming model weights and activations before quantization. In this work, we propose HadaNorm, a novel linear transformation that extends existing approaches by both normalizing channel activations and applying Hadamard transforms to effectively mitigate outliers and enable aggressive activation quantization. We demonstrate that HadaNorm consistently reduces quantization error across the various components of transformer blocks, outperforming state-of-the-art methods.
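The abstract combines channel normalization with a Hadamard transform. The pure-Python sketch below shows one plausible reason the combination helps. It is illustrative only and is not the authors' implementation.

It rests on three assumptions:
- "Normalization" is taken to mean per-channel mean subtraction, as the title's "mean-centered" suggests.
- Activations use 4-bit symmetric per-token round-to-nearest quantization.
- The toy data and the function names `fwht`, `fake_quant` and `quant_error` are hypothetical.

When every channel shares a large offset, an orthonormal Hadamard rotation concentrates that offset into a single coefficient. That coefficient becomes a new outlier. Subtracting the channel mean first removes it.

Both steps can in principle be absorbed into adjacent linear layers:
- Mean-centering: Wx + b = W(x - mu) + (W mu + b).
- Rotation: Wx = (WH)(Hx), because the normalized H is symmetric and orthonormal.

```python
import math
import random

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; it is its own inverse. len(x) must be a power of two."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [v * s for v in y]

def fake_quant(x, bits=4):
    """Symmetric round-to-nearest quantize/dequantize with one scale per token."""
    qmax = 2 ** (bits - 1) - 1
    step = max(abs(v) for v in x) / qmax or 1.0  # guard against an all-zero token
    return [max(-qmax - 1, min(qmax, round(v / step))) * step for v in x]

def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

def quant_error(tokens, mean=None, hadamard=False, bits=4):
    """Average reconstruction error after optional mean-centering and Hadamard rotation.
    Assumption: 'normalization' is per-channel mean subtraction (illustrative only)."""
    n = len(tokens[0])
    mu = mean if mean is not None else [0.0] * n
    total = 0.0
    for x in tokens:
        z = [v - m for v, m in zip(x, mu)]        # mean-centering (foldable into a bias)
        if hadamard:
            z = fwht(z)                            # rotate the channel dimension
        zq = fake_quant(z, bits)
        if hadamard:
            zq = fwht(zq)                          # inverse rotation (H is self-inverse)
        x_hat = [v + m for v, m in zip(zq, mu)]   # add the mean back
        total += mse(x, x_hat)
    return total / len(tokens)

# Hypothetical activations: 64 channels sharing a large offset (3.0) plus unit Gaussian noise.
random.seed(0)
n_channels, n_tokens = 64, 32
tokens = [[3.0 + random.gauss(0.0, 1.0) for _ in range(n_channels)] for _ in range(n_tokens)]
channel_mean = [sum(t[c] for t in tokens) / n_tokens for c in range(n_channels)]

err_raw = quant_error(tokens)
err_had = quant_error(tokens, hadamard=True)
err_norm_had = quant_error(tokens, mean=channel_mean, hadamard=True)
print(f"raw: {err_raw:.4f}  hadamard: {err_had:.4f}  mean-centered + hadamard: {err_norm_had:.4f}")
```

On this toy data, the Hadamard rotation alone increases the error, because the shared offset collapses into one large coefficient. Mean-centering before rotating gives the smallest error of the three settings.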

Related papers

LSGQuant: Layer-Sensitivity Guided Quantization for One-Step Diffusion Real-World Video Super-Resolution [52.627063566555194]
We introduce LSGQuant, a layer-sensitivity-guided quantization approach for one-step diffusion-based real-world VSR. Our method incorporates a Dynamic Range Adaptive Quantizer (DRAQ) to fit video token activations. Our method achieves performance close to that of the full-precision original model and significantly exceeds existing quantization techniques.
arXiv Detail & Related papers (2026-02-03T06:53:19Z)
HQ-DM: Single Hadamard Transformation-Based Quantization-Aware Training for Low-Bit Diffusion Models [5.320690460117234]
HQ-DM is a novel Quantization-Aware Training framework that applies Single Hadamard Transformation to activation matrices. This approach effectively reduces activation outliers while preserving model performance under quantization. For conditional generation on the ImageNet 256x256 dataset using the LDM-4 model, our W4A4 and W4A3 quantization schemes improve the Inception Score by 12.8% and 467.73%, respectively.
arXiv Detail & Related papers (2025-12-05T14:28:40Z)
WUSH: Near-Optimal Adaptive Transforms for LLM Quantization [52.77441224845925]
Quantization to low bitwidth is a standard approach for deploying large language models. A few extreme weights and activations stretch the dynamic range and reduce the effective resolution of the quantizer. We derive, for the first time, closed-form optimal linear blockwise transforms for joint weight-activation quantization.
arXiv Detail & Related papers (2025-11-30T16:17:34Z)
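The dynamic-range point in the WUSH summary can be made concrete with a toy example. This Python sketch uses hypothetical values and 4-bit symmetric round-to-nearest quantization with one scale per tensor. These are illustrative assumptions, not details from the paper. It shows that a single outlier inflates the step size, so the remaining values collapse onto very few quantization levels.

```python
def symmetric_step(values, bits=4):
    """Step size of a symmetric uniform quantizer whose range just covers max |value|."""
    qmax = 2 ** (bits - 1) - 1
    return max(abs(v) for v in values) / qmax

def used_levels(values, bits=4):
    """Distinct integer levels actually used after round-to-nearest."""
    step = symmetric_step(values, bits)
    return {round(v / step) for v in values}

def rtn_mse(values, bits=4):
    """Mean squared error of round-to-nearest with one shared scale."""
    step = symmetric_step(values, bits)
    return sum((v - round(v / step) * step) ** 2 for v in values) / len(values)

# Hypothetical activations in [-0.7, 0.7], then the same tensor with its last value replaced by an outlier.
base = [0.1 * ((i % 15) - 7) for i in range(64)]
with_outlier = base[:-1] + [20.0]

for name, vals in [("no outlier", base), ("one outlier", with_outlier)]:
    print(f"{name}: step={symmetric_step(vals):.3f}, "
          f"levels used={len(used_levels(vals))}, mse={rtn_mse(vals):.4f}")
```

Without the outlier, all 15 levels from -7 to 7 are in use. With it, every ordinary value rounds to zero, which is the loss of effective resolution the summary describes.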
STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization [21.93314755695813]
Quantization is the key method for reducing inference latency, power and memory footprint of generative AI models. We propose Sequence Transformation and Mixed Precision (STaMP) quantization.
arXiv Detail & Related papers (2025-10-30T17:53:42Z)
LRQ-DiT: Log-Rotation Post-Training Quantization of Diffusion Transformers for Text-to-Image Generation [34.14174796390669]
Post-training quantization (PTQ) is a promising solution to reduce memory usage and accelerate inference. Existing PTQ methods suffer from severe performance degradation under extreme low-bit settings. We propose LRQ-DiT, an efficient and accurate PTQ framework.
arXiv Detail & Related papers (2025-08-05T14:16:11Z)
MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation [74.34220141721231]
We present MPQ-DMv2, an improved Mixed Precision Quantization framework for extremely low-bit Diffusion Models.
arXiv Detail & Related papers (2025-07-06T08:16:50Z)
FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation [55.12070409045766]
Post-training quantization (PTQ) has stood out as a cost-effective and promising model compression paradigm in recent years. Current PTQ methods for Vision Transformers (ViTs) still suffer from significant accuracy degradation, especially under low-bit quantization.
arXiv Detail & Related papers (2025-06-13T07:57:38Z)
MAP Image Recovery with Guarantees using Locally Convex Multi-Scale Energy (LC-MUSE) Model [12.218356507147583]
We propose a multi-scale deep energy model that is strongly convex in the local neighbourhood around the data manifold. We use the learned energy model in image-based inverse problems, where the formulation offers several desirable properties. In the context of parallel Magnetic Resonance (MR) image reconstruction, we show that the proposed method performs better than the state-of-the-art convex regularizers.
arXiv Detail & Related papers (2025-02-05T16:00:55Z)
TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models [49.65286242048452]
We propose a novel method dubbed Timestep-Channel Adaptive Quantization for Diffusion Models (TCAQ-DM). The proposed method substantially outperforms the state-of-the-art approaches in most cases.
arXiv Detail & Related papers (2024-12-21T16:57:54Z)
PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution [95.98801201266099]
Diffusion-based image super-resolution (SR) models have shown superior performance at the cost of multiple denoising steps. We propose a novel post-training quantization approach with adaptive scale in one-step diffusion (OSD) image SR, PassionSR. Our PassionSR achieves significant advantages over recent leading low-bit quantization methods for image SR.
arXiv Detail & Related papers (2024-11-26T04:49:42Z)
Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers [45.762142897697366]
Post-Training Quantization (PTQ) emerges as a promising solution, enabling model compression and accelerated inference for pretrained models. Research on DiT quantization remains sparse, and existing PTQ frameworks tend to suffer from biased quantization, leading to notable performance degradation. We propose Q-DiT, a novel approach that seamlessly integrates two key techniques: automatic quantization granularity allocation to handle the significant variance of weights and activations across input channels, and sample-wise dynamic activation quantization to adaptively capture activation changes across both timesteps and samples.
arXiv Detail & Related papers (2024-06-25T07:57:27Z)
An Analysis on Quantizing Diffusion Transformers [19.520194468481655]
Post-Training Quantization (PTQ) offers an immediate remedy for a smaller storage size and more memory-efficient computation during inference. We propose a single-step sampling calibration on activations and adapt group-wise quantization on weights for low-bit quantization.
arXiv Detail & Related papers (2024-06-16T23:18:35Z)
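The group-wise weight quantization mentioned in the entry above can be illustrated with a small sketch. It assumes symmetric 4-bit round-to-nearest quantization and a hypothetical weight row whose two halves differ in magnitude by 100x. Both assumptions are illustrative, not details from the paper. With one scale for the whole row, the small-magnitude weights round to zero. A separate scale per group preserves them.

```python
import math

def quantize_groups(w, group_size, bits=4):
    """Symmetric round-to-nearest quantize/dequantize with one scale per contiguous group."""
    qmax = 2 ** (bits - 1) - 1
    out = []
    for start in range(0, len(w), group_size):
        group = w[start:start + group_size]
        step = max(abs(v) for v in group) / qmax or 1.0  # guard against an all-zero group
        out.extend(max(-qmax - 1, min(qmax, round(v / step))) * step for v in group)
    return out

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Hypothetical weight row: 32 small-magnitude weights followed by 32 large ones.
w = [0.01 * math.sin(i) for i in range(32)] + [math.sin(i) for i in range(32)]

per_tensor = quantize_groups(w, group_size=len(w))   # one scale for the whole row
group_wise = quantize_groups(w, group_size=32)       # one scale per 32 weights

print(f"per-tensor mse: {mse(w, per_tensor):.2e}")
print(f"group-wise mse: {mse(w, group_wise):.2e}")
```

The large-magnitude group gets the same scale in both settings. All of the improvement therefore comes from the small group no longer sharing a scale sized for the large one.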
RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization [8.827794405944637]
Post-training quantization (PTQ) is a promising solution for compressing large transformer models. Existing PTQ methods typically exhibit non-trivial performance loss. We propose RepQuant, a novel PTQ framework with quantization-inference decoupling paradigm.
arXiv Detail & Related papers (2024-02-08T12:35:41Z)
CBQ: Cross-Block Quantization for Large Language Models [66.82132832702895]
Post-training quantization (PTQ) has played a key role in compressing large language models (LLMs) with ultra-low costs. We propose CBQ, a cross-block reconstruction-based PTQ method for LLMs. CBQ employs a cross-block dependency using a reconstruction scheme, establishing long-range dependencies across multiple blocks to minimize error accumulation.
arXiv Detail & Related papers (2023-12-13T07:56:27Z)
Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing [49.800746112114375]
We propose a novel post-training quantization method (Progressive and Relaxing) for text-to-image diffusion models. We are the first to achieve quantization for Stable Diffusion XL while maintaining its performance.
arXiv Detail & Related papers (2023-11-10T09:10:09Z)
LLIC: Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression [27.02281402358164]
We propose Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression. We introduce a few large kernel-based depth-wise convolutions to reduce more redundancy while maintaining modest complexity. Our LLIC models achieve state-of-the-art performance and better trade-offs between performance and complexity.
arXiv Detail & Related papers (2023-04-19T11:19:10Z)
JPEG Artifact Correction using Denoising Diffusion Restoration Models [110.1244240726802]
We build upon Denoising Diffusion Restoration Models (DDRM) and propose a method for solving some non-linear inverse problems. We leverage the pseudo-inverse operator used in DDRM and generalize this concept for other measurement operators.
arXiv Detail & Related papers (2022-09-23T23:47:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.