Fugu-MT Paper Translation (Summary): PolarQuant: Optimal Gaussian Weight Quantization via Hadamard Rotation for LLM Compression

Paper summary: PolarQuant: Optimal Gaussian Weight Quantization via Hadamard Rotation for LLM Compression

arxiv url: http://arxiv.org/abs/2603.29078v1
Date: Mon, 30 Mar 2026 23:33:28 GMT
Status: Translation complete
Last updated (in system): 2026-04-01 15:25:02.947887
Title: PolarQuant: Optimal Gaussian Weight Quantization via Hadamard Rotation for LLM Compression
Authors: Caio Vicentino
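Abstract summary: PolarQuant is a post-training weight quantization method for large language models. It exploits the distributional structure of neural network weights to achieve near-lossless compression.
Reference score (attention score computed in-house): 0.0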
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present PolarQuant, a post-training weight quantization method for large language models (LLMs) that exploits the distributional structure of neural network weights to achieve near-lossless compression. PolarQuant operates in three stages: (1) block-wise normalization to the unit hypersphere, (2) Walsh-Hadamard rotation to transform coordinates into approximately Gaussian random variables, and (3) quantization with centroids matched to the Gaussian distribution. Our ablation reveals that Hadamard rotation alone accounts for 98% of the quality improvement, reducing Qwen3.5-9B perplexity from 6.90 (absmax Q5) to 6.40 (Delta = +0.03 from FP16), making it practically lossless without any calibration data. Furthermore, PolarQuant functions as an effective preprocessing step for downstream INT4 quantizers: PolarQuant Q5 dequantized and re-quantized by torchao INT4 achieves perplexity 6.56 versus 6.68 for direct absmax INT4, while maintaining 43.1 tok/s throughput at 6.5 GB VRAM. Code and models are publicly available.
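The three stages listed in the abstract can be sketched for a single weight block in plain Python. This is a minimal illustration under stated assumptions, not the author's released code: the block size of 64, the use of a Lloyd-Max iteration against a standard normal to obtain the "Gaussian-matched" centroids, the rescaling by sqrt(d) so that rotated coordinates are roughly N(0, 1), keeping the per-block norm in full precision, and all helper names (`fwht`, `gaussian_centroids`, `polar_quantize_block`, `polar_dequantize_block`) are hypothetical choices made here.

```python
import math
import random


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two.

    With the 1/sqrt(n) scaling the matrix is orthogonal and symmetric, so the
    same function is also its own inverse (up to floating-point rounding).
    """
    x = list(x)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return [v / math.sqrt(n) for v in x]


def _pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def _cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def gaussian_centroids(levels, iters=200):
    """Lloyd-Max levels for N(0, 1); an assumed stand-in for the paper's codebook."""
    # Start from evenly spaced levels on [-3, 3] (requires levels >= 2).
    c = [-3.0 + 6.0 * k / (levels - 1) for k in range(levels)]
    for _ in range(iters):
        # Decision boundaries sit halfway between neighbouring levels.
        edges = [-math.inf] + [(a + b) / 2 for a, b in zip(c, c[1:])] + [math.inf]
        # Each level moves to the conditional mean of N(0, 1) on its cell.
        c = [(_pdf(lo) - _pdf(hi)) / (_cdf(hi) - _cdf(lo))
             for lo, hi in zip(edges, edges[1:])]
    return c


def polar_quantize_block(w, centroids):
    """Stages (1)-(3) for one block: returns (block norm, centroid indices)."""
    d = len(w)
    norm = math.sqrt(sum(v * v for v in w))
    # (1) Project the block onto the unit hypersphere (norm stored separately).
    u = [v / norm for v in w] if norm > 0 else [0.0] * d
    # (2) Rotate; coordinates are then roughly Gaussian with variance about 1/d,
    #     so multiply by sqrt(d) to match the N(0, 1) codebook (assumption).
    z = [v * math.sqrt(d) for v in fwht(u)]
    # (3) Snap each coordinate to its nearest Gaussian centroid.
    codes = [min(range(len(centroids)), key=lambda k: abs(centroids[k] - v))
             for v in z]
    return norm, codes


def polar_dequantize_block(norm, codes, centroids):
    """Look up centroids, undo the sqrt(d) scaling and the rotation, rescale."""
    d = len(codes)
    z = [centroids[k] / math.sqrt(d) for k in codes]
    return [norm * v for v in fwht(z)]  # fwht is its own inverse


if __name__ == "__main__":
    rng = random.Random(0)
    codebook = gaussian_centroids(2 ** 5)          # 32 levels, i.e. a 5-bit code
    w = [rng.gauss(0.0, 0.02) for _ in range(64)]  # hypothetical 64-weight block
    norm, codes = polar_quantize_block(w, codebook)
    w_hat = polar_dequantize_block(norm, codes, codebook)
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w_hat))
                    / sum(a * a for a in w))
    print(f"relative L2 reconstruction error: {err:.4f}")
```

Because the orthonormal Walsh-Hadamard matrix is symmetric, one `fwht` routine serves as both the rotation and its inverse, and it costs O(d log d) per block instead of a dense d-by-d matrix multiply. No calibration data appears anywhere in the sketch, which matches the abstract's calibration-free claim.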
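The abstract's ablation attributes 98% of the quality improvement over absmax Q5 to the Hadamard rotation alone. A toy comparison, not a reproduction of the perplexity results, shows the mechanism on one synthetic 64-weight block with a single outlier: plain symmetric absmax must stretch its step size to cover the outlier, whereas an orthonormal Walsh-Hadamard rotation spreads that outlier across every coordinate before quantizing. The block size, the outlier value, the Gaussian synthetic weights, the uniform absmax levels in the rotated basis, and the helper names (`absmax_roundtrip`, `rotated_roundtrip`, `rel_error`) are assumptions for illustration only.

```python
import math
import random


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; it is its own inverse."""
    x = list(x)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return [v / math.sqrt(n) for v in x]


def absmax_roundtrip(w, bits=5):
    """Symmetric absmax quantize-dequantize with integer levels in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(v) for v in w)
    scale = amax / qmax if amax > 0 else 1.0
    return [scale * max(-qmax, min(qmax, round(v / scale))) for v in w]


def rotated_roundtrip(w, bits=5):
    """Rotate, absmax-quantize in the rotated basis, then rotate back."""
    return fwht(absmax_roundtrip(fwht(w), bits))


def rel_error(w, w_hat):
    """Relative L2 reconstruction error ||w - w_hat|| / ||w||."""
    num = sum((a - b) ** 2 for a, b in zip(w, w_hat))
    return math.sqrt(num / sum(a * a for a in w))


if __name__ == "__main__":
    rng = random.Random(0)
    w = [rng.gauss(0.0, 0.02) for _ in range(64)]
    w[3] = 0.5  # a single large outlier stretches the absmax scale
    print("absmax Q5        :", rel_error(w, absmax_roundtrip(w)))
    print("Hadamard + absmax:", rel_error(w, rotated_roundtrip(w)))
```

The rotated variant should report the smaller error, because the outlier no longer forces a coarse step onto the 63 small weights. The sketch uses synthetic weights and uniform levels, so it says nothing quantitative about the perplexity figures reported in the abstract.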


Related papers