Fugu-MT Paper Translation (Abstract): ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms

Paper abstract: ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms

arxiv url: http://arxiv.org/abs/2509.09679v1
Date: Thu, 11 Sep 2025 17:59:51 GMT
Status: Translation complete
System update date: 2025-09-12 16:52:24.51643
Title: ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms
Title (reference translation): ButterflyQuant: Ultra-low-bit LLM quantization through learnable orthogonal butterfly transforms
Authors: Bingxin Xu, Zhen Dong, Oussama Elachqar, Yuzhang Shang,
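Abstract summary: Quantization reduces memory through lower numerical precision, but extreme 2-bit quantization suffers from catastrophic performance loss due to outliers in activations. This work proposes ButterflyQuant, which replaces Hadamard rotations with learnable butterfly transforms. On LLaMA-2-7B with 2-bit quantization, ButterflyQuant achieves a perplexity of 15.4 versus 22.1 for QuaRot.
Reference score (independently computed attention measure): 21.010238822100135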
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models require massive memory footprints, severely limiting deployment on consumer hardware. Quantization reduces memory through lower numerical precision, but extreme 2-bit quantization suffers from catastrophic performance loss due to outliers in activations. Rotation-based methods such as QuIP and QuaRot apply orthogonal transforms to eliminate outliers before quantization, using computational invariance: $\mathbf{y} = \mathbf{Wx} = (\mathbf{WQ}^T)(\mathbf{Qx})$ for orthogonal $\mathbf{Q}$. However, these methods use fixed transforms--Hadamard matrices achieving optimal worst-case coherence $\mu = 1/\sqrt{n}$--that cannot adapt to specific weight distributions. We identify that different transformer layers exhibit distinct outlier patterns, motivating layer-adaptive rotations rather than one-size-fits-all approaches. We propose ButterflyQuant, which replaces Hadamard rotations with learnable butterfly transforms parameterized by continuous Givens rotation angles. Unlike Hadamard's discrete $\{+1, -1\}$ entries that are non-differentiable and prohibit gradient-based learning, butterfly transforms' continuous parameterization enables smooth optimization while guaranteeing orthogonality by construction. This orthogonal constraint ensures theoretical guarantees in outlier suppression while achieving $O(n \log n)$ computational complexity with only $\frac{n \log n}{2}$ learnable parameters. We further introduce a uniformity regularization on post-transformation activations to promote smoother distributions amenable to quantization. Learning requires only 128 calibration samples and converges in minutes on a single GPU--a negligible one-time cost. On LLaMA-2-7B with 2-bit quantization, ButterflyQuant achieves 15.4 perplexity versus 22.1 for QuaRot.
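Worked note (not part of the original abstract): the invariance identity and the parameter count quoted above can be checked in a few lines. This is a sketch under two assumptions that the abstract does not spell out: $n$ is a power of two, and each of the $\log_2 n$ butterfly layers applies $n/2$ disjoint $2 \times 2$ Givens rotations $\mathbf{G}(\theta)$, one angle per rotation.

$$
(\mathbf{WQ}^T)(\mathbf{Qx}) = \mathbf{W}(\mathbf{Q}^T\mathbf{Q})\mathbf{x} = \mathbf{W}\mathbf{I}\mathbf{x} = \mathbf{Wx}
$$

$$
\mathbf{G}(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \qquad \mathbf{G}(\theta)^T \mathbf{G}(\theta) = \begin{pmatrix} \cos^2\theta + \sin^2\theta & 0 \\ 0 & \sin^2\theta + \cos^2\theta \end{pmatrix} = \mathbf{I}_2
$$

$$
\#\text{angles} = \underbrace{\log_2 n}_{\text{layers}} \times \underbrace{\tfrac{n}{2}}_{\text{rotations per layer}} = \frac{n \log_2 n}{2}, \qquad \text{e.g. } n = 4096: \; 12 \times 2048 = 24576
$$

Because every layer is a block arrangement of orthogonal $2 \times 2$ rotations, each layer is orthogonal, and a product of orthogonal matrices is orthogonal for any choice of angles. Under the stated assumptions, this is what "orthogonality by construction" amounts to.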
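Abstract (reference translation): Large language models require massive memory footprints, which severely limits deployment on consumer hardware. Quantization reduces memory through lower numerical precision, but extreme 2-bit quantization suffers from catastrophic performance loss due to outliers in activations. Rotation-based methods such as QuIP and QuaRot apply orthogonal transforms to eliminate outliers before quantization, relying on computational invariance: $\mathbf{y} = \mathbf{Wx} = (\mathbf{WQ}^T)(\mathbf{Qx})$ for orthogonal $\mathbf{Q}$. However, these methods use fixed transforms (Hadamard matrices, which achieve the optimal worst-case coherence $\mu = 1/\sqrt{n}$) that cannot adapt to specific weight distributions. Different transformer layers exhibit distinct outlier patterns, which motivates layer-adaptive rotations rather than a one-size-fits-all approach. This work proposes ButterflyQuant, which replaces Hadamard rotations with learnable butterfly transforms. Unlike Hadamard's discrete $\{+1, -1\}$ entries, which are non-differentiable and prohibit gradient-based learning, the continuous parameterization of butterfly transforms enables smooth optimization while guaranteeing orthogonality by construction. This orthogonality constraint preserves the theoretical guarantees on outlier suppression while achieving $O(n \log n)$ computational complexity with only $\frac{n \log n}{2}$ learnable parameters. In addition, a uniformity regularization on post-transformation activations is introduced to promote smoother distributions that are amenable to quantization. Learning requires only 128 calibration samples and converges in minutes on a single GPU. On LLaMA-2-7B with 2-bit quantization, ButterflyQuant achieves a perplexity of 15.4 versus 22.1 for QuaRot.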
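Illustrative sketch (not from the paper): the NumPy snippet below builds a small orthogonal butterfly matrix from Givens rotation angles and checks the computational invariance $\mathbf{Wx} = (\mathbf{WQ}^T)(\mathbf{Qx})$ numerically. The function name `butterfly_matrix`, the stride-based pairing of coordinates, and the random angles are all assumptions made for illustration. The authors' actual parameterization, training loop, and regularizer are not reproduced here. The dense $n \times n$ construction is used only for readability. A practical implementation would apply the layers directly to a vector to obtain the $O(n \log n)$ cost.

```python
import numpy as np


def butterfly_matrix(angles: np.ndarray) -> np.ndarray:
    """Dense orthogonal matrix from log2(n) layers of Givens rotations.

    angles has shape (log2(n), n // 2): one angle per 2x2 rotation.
    The pairing scheme (index i with i + stride) is a hypothetical choice.
    """
    num_layers, half = angles.shape
    n = 2 * half
    assert n == 2 ** num_layers, "n must be a power of two"
    q = np.eye(n)
    for layer in range(num_layers):
        stride = 2 ** layer
        g = np.zeros((n, n))
        k = 0  # index of the next unused angle in this layer
        for start in range(0, n, 2 * stride):
            for offset in range(stride):
                i = start + offset
                j = i + stride
                c, s = np.cos(angles[layer, k]), np.sin(angles[layer, k])
                # 2x2 Givens rotation acting on coordinates (i, j)
                g[i, i], g[i, j] = c, -s
                g[j, i], g[j, j] = s, c
                k += 1
        q = g @ q  # compose this layer with the previous ones
    return q


rng = np.random.default_rng(0)
n = 8
num_layers = n.bit_length() - 1  # log2(n) for a power of two
angles = rng.uniform(-np.pi, np.pi, size=(num_layers, n // 2))

Q = butterfly_matrix(angles)
W = rng.normal(size=(4, n))  # stand-in weight matrix
x = rng.normal(size=n)       # stand-in activation vector

# Orthogonality holds for any angles, so Q^T Q should be the identity.
orth_error = np.max(np.abs(Q.T @ Q - np.eye(n)))
# Rotating weights and activations together leaves the output unchanged.
inv_error = np.max(np.abs((W @ Q.T) @ (Q @ x) - W @ x))
num_params = angles.size  # n * log2(n) / 2 angles

print("orthogonality error:", orth_error)
print("invariance error:", inv_error)
print("learnable angles:", num_params)
```

In a learnable setting the `angles` array would be the trainable parameter. Since every angle value yields an orthogonal `Q`, gradient steps on the angles never leave the set of orthogonal matrices, which is the property the abstract contrasts with the fixed $\{+1, -1\}$ Hadamard entries.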

Related papers