HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations
- URL: http://arxiv.org/abs/2506.09932v2
- Date: Thu, 10 Jul 2025 10:03:57 GMT
- Title: HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations
- Authors: Marco Federici, Riccardo Del Chiaro, Boris van Breugel, Paul Whatmough, Markus Nagel
- Abstract summary: Post-Training Quantization (PTQ) offers a promising solution by reducing the bitwidth of matrix operations. We propose HadaNorm, a novel linear transformation that extends existing approaches by both normalizing channel activations and applying Hadamard transforms. We demonstrate that HadaNorm consistently reduces quantization error across the various components of transformer blocks, outperforming state-of-the-art methods.
- Score: 17.975720202894905
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models represent the cutting edge in image generation, but their high memory and computational demands hinder deployment on resource-constrained devices. Post-Training Quantization (PTQ) offers a promising solution by reducing the bitwidth of matrix operations. However, standard PTQ methods struggle with outliers, and achieving higher compression often requires transforming model weights and activations before quantization. In this work, we propose HadaNorm, a novel linear transformation that extends existing approaches by both normalizing channel activations and applying Hadamard transforms to effectively mitigate outliers and enable aggressive activation quantization. We demonstrate that HadaNorm consistently reduces quantization error across the various components of transformer blocks, outperforming state-of-the-art methods.
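The abstract's idea, centering channel activations and then applying a Hadamard transform before quantization, can be illustrated with a toy experiment. This is a minimal sketch and not the authors' implementation. It makes several assumptions that do not come from the paper: the data is synthetic (unit Gaussian noise with one channel shifted by a large mean), per-channel means are estimated from the same batch, quantization is symmetric 4-bit with one scale per token, and all function names (`fwht`, `fake_quantize`, `compare_errors`) are hypothetical.

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    out = list(vec)
    n = len(out)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def fake_quantize(vec, bits=4):
    """Symmetric uniform quantize-then-dequantize with one scale per vector."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in vec)
    if peak == 0.0:
        return list(vec)
    step = peak / qmax
    return [round(v / step) * step for v in vec]


def mse(rows_a, rows_b):
    """Mean squared error between two equally shaped lists of rows."""
    total = sum((x - y) ** 2
                for ra, rb in zip(rows_a, rows_b)
                for x, y in zip(ra, rb))
    return total / (len(rows_a) * len(rows_a[0]))


def compare_errors(seed=0, tokens=64, channels=16, offset=40.0):
    rng = random.Random(seed)
    # Synthetic activations: unit Gaussian noise, channel 3 shifted by a large mean.
    acts = [[rng.gauss(0.0, 1.0) + (offset if c == 3 else 0.0)
             for c in range(channels)]
            for _ in range(tokens)]
    # Per-channel means, estimated from this same batch (an assumption of the sketch).
    means = [sum(row[c] for row in acts) / tokens for c in range(channels)]

    # Baseline: quantize each token directly.
    direct = [fake_quantize(row) for row in acts]
    # Hadamard only: rotate, quantize, rotate back.
    # The orthonormal transform above is its own inverse.
    hadamard_only = [fwht(fake_quantize(fwht(row))) for row in acts]
    # Center per channel, rotate, quantize, rotate back, restore the mean.
    centered = []
    for row in acts:
        shifted = [v - m for v, m in zip(row, means)]
        restored = fwht(fake_quantize(fwht(shifted)))
        centered.append([v + m for v, m in zip(restored, means)])

    return {
        "direct": mse(acts, direct),
        "hadamard_only": mse(acts, hadamard_only),
        "centered_hadamard": mse(acts, centered),
    }


if __name__ == "__main__":
    for name, err in compare_errors().items():
        print(f"{name}: {err:.4f}")
```

In this toy setup the shifted channel dominates the quantization range when tokens are quantized directly, and a Hadamard rotation alone spreads that offset over every channel. Removing the per-channel mean first leaves only the noise to be rotated and quantized. The mean is added back explicitly here; how such an offset would be handled inside a real transformer block is outside the scope of the sketch.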
Related papers
- LRQ-DiT: Log-Rotation Post-Training Quantization of Diffusion Transformers for Text-to-Image Generation [34.14174796390669]
Post-training quantization (PTQ) is a promising solution to reduce memory usage and accelerate inference. Existing PTQ methods suffer from severe performance degradation under extreme low-bit settings. We propose LRQ-DiT, an efficient and accurate PTQ framework.
arXiv Detail & Related papers (2025-08-05T14:16:11Z)
- MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation [74.34220141721231]
We present MPQ-DMv2, an improved Mixed Precision Quantization framework for extremely low-bit Diffusion Models.
arXiv Detail & Related papers (2025-07-06T08:16:50Z)
- FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation [55.12070409045766]
Post-training quantization (PTQ) has stood out as a cost-effective and promising model compression paradigm in recent years. Current PTQ methods for Vision Transformers (ViTs) still suffer from significant accuracy degradation, especially under low-bit quantization.
arXiv Detail & Related papers (2025-06-13T07:57:38Z)
- MAP Image Recovery with Guarantees using Locally Convex Multi-Scale Energy (LC-MUSE) Model [12.218356507147583]
We propose a multi-scale deep energy model that is strongly convex in the local neighbourhood around the data manifold. We use the learned energy model in image-based inverse problems, where the formulation offers several desirable properties. In the context of parallel Magnetic Resonance (MR) image reconstruction, we show that the proposed method performs better than the state-of-the-art convex regularizers.
arXiv Detail & Related papers (2025-02-05T16:00:55Z)
- TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models [49.65286242048452]
We propose a novel method dubbed Timestep-Channel Adaptive Quantization for Diffusion Models (TCAQ-DM). The proposed method substantially outperforms the state-of-the-art approaches in most cases.
arXiv Detail & Related papers (2024-12-21T16:57:54Z)
- PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution [95.98801201266099]
Diffusion-based image super-resolution (SR) models have shown superior performance at the cost of multiple denoising steps. We propose a novel post-training quantization approach with adaptive scale in one-step diffusion (OSD) image SR, PassionSR. Our PassionSR achieves significant advantages over recent leading low-bit quantization methods for image SR.
arXiv Detail & Related papers (2024-11-26T04:49:42Z)
- Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers [45.762142897697366]
Post-Training Quantization (PTQ) emerges as a promising solution, enabling model compression and accelerated inference for pretrained models.
Research on DiT quantization remains sparse, and existing PTQ frameworks tend to suffer from biased quantization, leading to notable performance degradation.
We propose Q-DiT, a novel approach that seamlessly integrates two key techniques: automatic quantization granularity allocation to handle the significant variance of weights and activations across input channels, and sample-wise dynamic activation quantization to adaptively capture activation changes across both timesteps and samples.
arXiv Detail & Related papers (2024-06-25T07:57:27Z)
- An Analysis on Quantizing Diffusion Transformers [19.520194468481655]
Post Training Quantization (PTQ) offers an immediate remedy for a smaller storage size and more memory-efficient computation during inferencing.
We propose a single-step sampling calibration on activations and adapt group-wise quantization on weights for low-bit quantization.
arXiv Detail & Related papers (2024-06-16T23:18:35Z)
- RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization [8.827794405944637]
Post-training quantization (PTQ) is a promising solution for compressing large transformer models.
Existing PTQ methods typically exhibit non-trivial performance loss.
We propose RepQuant, a novel PTQ framework with quantization-inference decoupling paradigm.
arXiv Detail & Related papers (2024-02-08T12:35:41Z)
- CBQ: Cross-Block Quantization for Large Language Models [66.82132832702895]
Post-training quantization (PTQ) has played a key role in compressing large language models (LLMs) with ultra-low costs. We propose CBQ, a cross-block reconstruction-based PTQ method for LLMs. CBQ employs a cross-block dependency using a reconstruction scheme, establishing long-range dependencies across multiple blocks to minimize error accumulation.
arXiv Detail & Related papers (2023-12-13T07:56:27Z)
- Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing [49.800746112114375]
We propose a novel post-training quantization method (Progressive and Relaxing) for text-to-image diffusion models.
We are the first to achieve quantization for Stable Diffusion XL while maintaining the performance.
arXiv Detail & Related papers (2023-11-10T09:10:09Z)
- LLIC: Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression [27.02281402358164]
We propose Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression.
We introduce a few large kernel-based depth-wise convolutions to reduce more redundancy while maintaining modest complexity.
Our LLIC models achieve state-of-the-art performances and better trade-offs between performance and complexity.
arXiv Detail & Related papers (2023-04-19T11:19:10Z)
- JPEG Artifact Correction using Denoising Diffusion Restoration Models [110.1244240726802]
We build upon Denoising Diffusion Restoration Models (DDRM) and propose a method for solving some non-linear inverse problems.
We leverage the pseudo-inverse operator used in DDRM and generalize this concept for other measurement operators.
arXiv Detail & Related papers (2022-09-23T23:47:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.