WUSH: Near-Optimal Adaptive Transforms for LLM Quantization
- URL: http://arxiv.org/abs/2512.00956v1
- Date: Sun, 30 Nov 2025 16:17:34 GMT
- Title: WUSH: Near-Optimal Adaptive Transforms for LLM Quantization
- Authors: Jiale Chen, Vage Egiazarian, Torsten Hoefler, Dan Alistarh
- Abstract summary: Quantization to low bitwidth is a standard approach for deploying large language models. A few extreme weights and activations stretch the dynamic range and reduce the effective resolution of the quantizer. We derive, for the first time, closed-form optimal linear blockwise transforms for joint weight-activation quantization.
- Score: 52.77441224845925
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantization to low bitwidth is a standard approach for deploying large language models; however, a few extreme weights and activations stretch the dynamic range and reduce the effective resolution of the quantizer. A common mitigation approach is to apply some fixed orthogonal transforms, such as Hadamard matrices, before quantization, which typically reduces the dynamic range. Yet, these transforms ignore the statistics of the data, and their optimality is currently not understood. In this work, we derive, for the first time, closed-form optimal linear blockwise transforms for joint weight-activation quantization using standard data-free quantizers for common numerical formats. Specifically, we provide derivations of the optimal adaptive (data-aware) transforms for round-to-nearest (RTN), AbsMax-scaled block quantizers for both integer and floating-point formats. The resulting construction, which we call WUSH, combines a Hadamard backbone with a data-dependent component based on second-order moments, yielding a non-orthogonal transform that is provably optimal under mild assumptions and remains structured for efficient implementation. Preliminary experimental results show that our approach consistently improves upon the Hadamard transform for common formats.
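To ground the setup, here is a minimal sketch of the fixed-transform baseline the abstract describes: a blockwise Hadamard rotation applied before an AbsMax-scaled RTN integer quantizer. All function names and the toy heavy-tailed weights are illustrative, and WUSH itself is not reproduced, since the abstract does not give its closed form.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two. Orthonormal scaling.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def absmax_rtn_int(x, bits=4):
    # AbsMax-scaled round-to-nearest signed-integer quantizer (dequantized).
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale) * scale

def blockwise_quantize(w, block=64, bits=4, transform=None):
    # Quantize contiguous blocks, optionally in a transformed basis.
    blocks = w.reshape(-1, block)
    if transform is not None:
        T = transform(block)
        blocks = blocks @ T.T                 # rotate into the new basis
        blocks = np.stack([absmax_rtn_int(b, bits) for b in blocks])
        blocks = blocks @ np.linalg.inv(T).T  # undo; equals T when T is orthonormal
    else:
        blocks = np.stack([absmax_rtn_int(b, bits) for b in blocks])
    return blocks.reshape(-1)

# Heavy-tailed weights: the Hadamard basis spreads outliers across each block,
# tightening the AbsMax scale and reducing the RTN error.
rng = np.random.default_rng(0)
w = rng.standard_t(df=3, size=4096)
for name, t in [("RTN", None), ("Hadamard+RTN", hadamard)]:
    mse = np.mean((w - blockwise_quantize(w, transform=t)) ** 2)
    print(f"{name:>12s} MSE: {mse:.5f}")
```

A data-aware transform in the spirit of WUSH would additionally mix second-order statistics of the weights into the basis, which is why the paper's optimal construction is non-orthogonal.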
Related papers
- Variational Entropic Optimal Transport [67.76725267984578]
We propose Variational Entropic Optimal Transport (VarEOT) for domain translation problems. VarEOT is based on an exact variational reformulation of the log-partition $\log \mathbb{E}[\exp(\cdot)]$ as a tractable optimization over an auxiliary positive normalizer. Experiments on synthetic data and unpaired image-to-image translation demonstrate competitive or improved translation quality.
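The abstract does not spell out which reformulation is used, but a standard exact variational identity of this kind, with an auxiliary positive normalizer $\beta$, is

$$\log \mathbb{E}\!\left[e^{f}\right] \;=\; \min_{\beta > 0} \left\{ \frac{1}{\beta}\,\mathbb{E}\!\left[e^{f}\right] + \log \beta - 1 \right\},$$

with the minimum attained at $\beta^{\star} = \mathbb{E}[e^{f}]$. Replacing the expectation by a sample average turns the left-hand side, which is hard to estimate directly, into an objective amenable to stochastic optimization.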
arXiv Detail & Related papers (2026-02-02T15:48:44Z)
- Tuning-Free Structured Sparse Recovery of Multiple Measurement Vectors using Implicit Regularization [13.378211527081582]
We introduce a tuning-free framework for recovering sparse signals in the multiple measurement vector setting. We show that the optimization dynamics exhibit a "momentum-like" effect, causing the norms of rows in the true support to grow significantly faster than the others.
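As a hedged, single-vector toy illustration of this kind of implicit sparsity bias (not the paper's MMV algorithm), parametrizing $x = u \odot u$ and running plain gradient descent on an unregularized least-squares loss makes entries on the true support grow much faster than the rest:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, k = 100, 40, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)  # random measurement matrix
x_true = np.zeros(n)
x_true[:k] = 1.0                              # nonnegative k-sparse signal
y = A @ x_true

u = np.full(n, 1e-3)          # small initialization drives the sparsity bias
for _ in range(5000):
    r = A @ (u * u) - y
    u -= 0.02 * (2 * u * (A.T @ r))   # gradient of 0.5 * ||A(u*u) - y||^2
x_hat = u * u
print(np.round(x_hat[:6], 3))  # first k entries near 1, the rest near 0
```

No explicit regularizer appears anywhere; the sparsity comes entirely from the parametrization and the small initialization.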
arXiv Detail & Related papers (2025-12-03T02:53:11Z)
- Adaptive Layer-Wise Transformations for Post-Training Quantization of Large Language Models [47.54958360970588]
Large language models require significant computational resources for deployment. The main obstacle to effective quantization lies in systematic outliers in activations and weights. We propose an adaptive transformation selection framework that systematically determines optimal transformations on a per-layer basis.
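The abstract does not define its selection criterion; as a minimal sketch of what per-layer selection can look like, the loop below picks whichever candidate basis minimizes a 4-bit RTN error for each layer. The candidate set, layer names, and error metric are all illustrative, not the paper's actual framework.

```python
import numpy as np

def rtn_absmax_error(w, T, bits=4):
    # AbsMax RTN quantization error of weight matrix w measured in basis T.
    qmax = 2 ** (bits - 1) - 1
    y = w @ T.T
    scale = np.abs(y).max() / qmax
    return float(((y - np.round(y / scale) * scale) ** 2).mean())

def select_per_layer(layers, candidates):
    # For each layer, keep the candidate transform with the lowest error.
    return {name: min(candidates, key=lambda c: rtn_absmax_error(w, candidates[c]))
            for name, w in layers.items()}

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((64, 64)))     # a generic rotation
layers = {"q_proj": rng.standard_t(3, size=(64, 64)),  # outlier-heavy layer
          "up_proj": rng.standard_normal((64, 64))}    # well-behaved layer
candidates = {"identity": np.eye(64), "rotation": Q}
print(select_per_layer(layers, candidates))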
arXiv Detail & Related papers (2025-11-21T22:01:58Z)
- STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization [21.93314755695813]
Quantization is the key method for reducing the inference latency, power, and memory footprint of generative AI models. We propose Sequence Transformation and Mixed Precision (STaMP) quantization.
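The title contrasts sequence-dimension transforms with the usual channel-dimension rotations. Here is a hedged sketch of the sequence side only; the mixed-precision part is omitted, and the function is illustrative rather than STaMP's actual pipeline.

```python
import numpy as np

def quantize_with_sequence_transform(x, H, bits=4):
    # Mix activations along the token axis before a low-bit AbsMax
    # quantizer, then undo the mixing afterwards.
    # x: (tokens, channels); H: (tokens, tokens), invertible.
    qmax = 2 ** (bits - 1) - 1
    y = H @ x
    scale = np.abs(y).max() / qmax
    y_q = np.round(y / scale) * scale
    return np.linalg.inv(H) @ y_q   # H.T suffices when H is orthogonal

rng = np.random.default_rng(6)
x = rng.standard_normal((8, 16))
H, _ = np.linalg.qr(rng.standard_normal((8, 8)))   # orthogonal token mixer
x_hat = quantize_with_sequence_transform(x, H)
print(float(np.abs(x - x_hat).max()))              # small quantization error
```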
arXiv Detail & Related papers (2025-10-30T17:53:42Z)
- Neural Optimal Transport Meets Multivariate Conformal Prediction [58.43397908730771]
We propose a framework for conditional vector quantile regression (CVQR). CVQR combines neural optimal transport with quantile optimization and applies it to conformal prediction.
arXiv Detail & Related papers (2025-09-29T19:50:19Z)
- Numerical Optimization for Tensor Disentanglement [7.88541926763416]
This paper focuses on tensor disentangling, the task of identifying transformations that reduce bond dimensions by exploiting gauge freedom in the network. We formulate this problem as an optimization problem over orthogonal matrices acting on a single tensor's indices, aiming to minimize the rank of its matricized form. To seek the often unknown optimal rank, we introduce a binary search strategy integrated with the disentangling procedure.
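A minimal illustration of the objective (the optimizer and binary search are omitted, and the 4-index setup is an assumption, since the abstract does not fix the tensor shape): an orthogonal gauge acting on one fused index pair changes the rank across the complementary bipartition, which is exactly the quantity a disentangler minimizes.

```python
import numpy as np

def bipartition_rank(T, tol=1e-10):
    # Numerical rank of the (a,c) x (b,d) matricization of a 4-index tensor.
    a, b, c, d = T.shape
    s = np.linalg.svd(T.transpose(0, 2, 1, 3).reshape(a * c, b * d),
                      compute_uv=False)
    return int((s > tol * s[0]).sum())

def apply_gauge(T, Q):
    # Act with an orthogonal Q on the fused (a,b) index pair -- the gauge
    # freedom a disentangler exploits.
    a, b, c, d = T.shape
    return (Q @ T.reshape(a * b, c * d)).reshape(a, b, c, d)

rng = np.random.default_rng(2)
u = rng.standard_normal((2, 2))
v = rng.standard_normal((2, 2))
T = np.einsum("ac,bd->abcd", u, v)   # rank 1 across the (a,c) x (b,d) cut
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
# A generic gauge raises the rank; the paper's optimizer searches for the
# orthogonal matrix that goes the other way.
print(bipartition_rank(T), bipartition_rank(apply_gauge(T, Q)))
```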
arXiv Detail & Related papers (2025-08-26T20:17:48Z)
- HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations [17.975720202894905]
Post-Training Quantization (PTQ) offers a promising solution by reducing the bitwidth of matrix operations. We propose HadaNorm, a novel linear transformation that extends existing approaches by both normalizing channel activations and applying Hadamard transforms. We demonstrate that HadaNorm consistently reduces quantization error across the various components of transformer blocks, outperforming state-of-the-art methods.
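A small sketch of the two operations the abstract names, mean-centering plus a Hadamard rotation. The composition below is illustrative; HadaNorm's exact placement inside the transformer block is not specified here.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction (n a power of two), orthonormal scaling.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def mean_center_then_rotate(x):
    # Remove per-channel means, then rotate channels with a Hadamard matrix
    # before quantization; keep mu to undo the shift afterwards.
    mu = x.mean(axis=0, keepdims=True)       # per-channel mean over tokens
    H = hadamard(x.shape[1])
    return (x - mu) @ H.T, mu

rng = np.random.default_rng(5)
x = rng.standard_normal((128, 64)) + 3.0     # activations with a channel offset
y, mu = mean_center_then_rotate(x)
x_back = y @ hadamard(64) + mu               # exact inverse (H is orthonormal)
print(np.allclose(x, x_back))                # True
```

Centering removes the common offset that would otherwise inflate the AbsMax scale, and the rotation then spreads the remaining outliers across channels.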
arXiv Detail & Related papers (2025-06-11T16:54:34Z)
- Variationally optimizing infinite projected entangled-pair states at large bond dimensions: A split corner transfer matrix renormalization group approach [0.2796197251957244]
We introduce an alternative "split-CTMRG" algorithm, which maintains separate PEPS layers and leverages new environment tensors, reducing computational complexity while preserving accuracy. Benchmarks on quantum lattice models demonstrate substantial speedups for variational energy optimization, rendering this method valuable for large-scale PEPS simulations.
arXiv Detail & Related papers (2025-02-14T16:59:33Z)
- Pushing the Limits of Large Language Model Quantization via the Linearity Theorem [71.3332971315821]
We present a "linearity theorem" establishing a direct relationship between the layer-wise $\ell_2$ reconstruction error and the model perplexity increase due to quantization.
This insight enables two novel applications: (1) a simple data-free LLM quantization method using Hadamard rotations and MSE-optimal grids, dubbed HIGGS, and (2) an optimal solution to the problem of finding non-uniform per-layer quantization levels.
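Schematically (paraphrasing the abstract, not quoting the paper's exact statement), the theorem ties the end-to-end metric to per-layer errors as

$$\Delta\mathrm{PPL} \;\approx\; \sum_{\ell} c_{\ell}\, \varepsilon_{\ell}^{2},$$

where $\varepsilon_{\ell}$ is the layer-wise $\ell_2$ reconstruction error and the constants $c_{\ell}$ depend on the model but not on the quantizer. A relation of this form is what lets both applications decompose into independent per-layer MSE problems.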
arXiv Detail & Related papers (2024-11-26T15:35:44Z)
- Variable-size Symmetry-based Graph Fourier Transforms for image compression [65.7352685872625]
We propose a new family of Symmetry-based Graph Fourier Transforms (SBGFTs) of variable sizes within a coding framework.
Our proposed algorithm generates symmetric graphs on the grid by adding specific symmetrical connections between nodes.
Experiments show that SBGFTs outperform the primary transforms integrated in the explicit Multiple Transform Selection.
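For readers unfamiliar with graph Fourier transforms, here is the generic construction the family builds on: a plain Laplacian eigenbasis. The symmetry-based variants and variable sizes are the paper's contribution and are not reproduced; the graph and signal below are illustrative.

```python
import numpy as np

def graph_fourier_basis(A):
    # Generic graph Fourier transform: eigenvectors of the combinatorial
    # Laplacian L = D - A of a symmetric adjacency matrix A, ordered by
    # graph frequency (eigenvalue).
    L = np.diag(A.sum(axis=1)) - A
    _, U = np.linalg.eigh(L)
    return U

# 4-node path graph plus one symmetrical connection closing the cycle.
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3), (0, 3)]:
    A[i, j] = A[j, i] = 1.0
U = graph_fourier_basis(A)
signal = np.array([1.0, 2.0, 2.0, 1.0])
print(np.round(U.T @ signal, 3))   # transform coefficients of the signal
```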
arXiv Detail & Related papers (2024-11-24T13:00:44Z)
- Exact Backpropagation in Binary Weighted Networks with Group Weight Transformations [0.0]
Quantization-based model compression serves as a high-performing and fast approach to inference.
Models that constrain the weights to binary values enable efficient implementation of the ubiquitous dot product.
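To see why binary weights make the dot product cheap, here is one common binarization scheme (XNOR-Net style; the paper's group weight transformations are not given in this summary and are not reproduced): sign weights plus a single real scale, which is the $L_2$-optimal scale for sign binarization.

```python
import numpy as np

def binarize(w):
    # Sign weights plus one real scale; alpha = mean(|w|) minimizes
    # ||w - alpha * sign(w)||_2 for fixed sign(w).
    alpha = np.abs(w).mean()
    return alpha, np.sign(w)

rng = np.random.default_rng(3)
w = rng.standard_normal(256)
x = rng.standard_normal(256)
alpha, b = binarize(w)
# The dot product reduces to sign flips and additions, scaled once at the end.
print(f"full precision: {np.dot(w, x):+.3f}   binary: {alpha * np.dot(b, x):+.3f}")
```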
arXiv Detail & Related papers (2021-07-03T10:29:34Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)