Fugu-MT paper translation (abstract): SDQ-LLM: Sigma-Delta Quantization for 1-bit LLMs of any size

Paper overview: SDQ-LLM: Sigma-Delta Quantization for 1-bit LLMs of any size

arxiv url: http://arxiv.org/abs/2510.03275v1
Date: Sat, 27 Sep 2025 14:49:58 GMT
Status: translation complete
Last updated in system: 2025-10-07 16:52:58.635251
Title: SDQ-LLM: Sigma-Delta Quantization for 1-bit LLMs of any size
Title (reference translation): SDQ-LLM: Sigma-delta quantization of 1-bit LLMs of arbitrary size
Authors: Junhao Xia, Ming Zhao, Limin Xiao, Xiujun Zhang
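Abstract summary: Large language models (LLMs) face computational and memory challenges. This work introduces SDQ-LLM: Sigma-Delta Quantization for 1-bit LLMs of any size. A distinctive feature of SDQ-LLM is the continuous adjustability of the Over-Sampling Ratio (OSR).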
Reference score (independently computed attention score): 5.229694155440675
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) face significant computational and memory challenges, making extremely low-bit quantization crucial for their efficient deployment. In this work, we introduce SDQ-LLM: Sigma-Delta Quantization for 1-bit LLMs of any size, a novel framework that enables extremely low-bit quantization of LLMs while preserving their linguistic reasoning capabilities. A distinctive feature of SDQ-LLM is the continuous adjustability of the Over-Sampling Ratio (OSR), which enables dynamic adaptation to memory or VRAM constraints by selecting a fractional OSR (e.g., 2.5 times) for an optimal trade-off between model size and accuracy. SDQ-LLM uses upsampling combined with a Sigma-Delta quantizer to binarize or ternarize LLM weights, encoding high-precision parameters into 1-bit or 1.58-bit representations and replacing the multiplication operations within linear layers with addition. This approach significantly enhances inference efficiency under extremely low-bit quantization. To further reduce the loss of quantization precision, we incorporate Hadamard-based weight smoothing prior to quantization, improving the stability and robustness of the weight representations. Furthermore, to fully leverage the continuity of the OSR and reduce precision loss, and recognizing the correlation between quantization sensitivity and weight variance, we propose a fine-grained, layer- and linear-wise OSR allocation strategy, MultiOSR. This strategy distributes OSR both across layers and within each layer, based on weight variance and parameter scale. Finally, extensive experiments on the OPT and LLaMA model families demonstrate that SDQ-LLM achieves more efficient, higher-precision performance even under highly aggressive low-OSR settings. Our code is available at https://github.com/Dreamlittlecat/LLM-Quant-Factory.
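The abstract names upsampling followed by a Sigma-Delta quantizer but gives no equations, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes a first-order modulator, zero-order-hold upsampling (each weight repeated `osr` times), an integer OSR, and a per-row max-abs scale. The function names `sigma_delta_binarize` and `reconstruct` are hypothetical, and the fractional OSR described in the abstract is not handled here.

```python
def sigma_delta_binarize(weights, osr=2):
    """Binarize one weight row with a first-order sigma-delta modulator.

    Assumptions (not taken from the paper): zero-order-hold upsampling,
    integer `osr`, and a single max-abs scale for the whole row.
    """
    # Normalize into [-1, 1] so the modulator state stays bounded.
    scale = max(abs(w) for w in weights) or 1.0
    # Zero-order hold: repeat every normalized weight `osr` times.
    upsampled = [w / scale for w in weights for _ in range(osr)]

    bits = []
    integrator = 0.0  # running sum of (input - emitted bit), i.e. the carried error
    for x in upsampled:
        integrator += x
        bit = 1 if integrator >= 0 else -1  # 1-bit quantizer
        bits.append(bit)
        integrator -= bit  # feed the quantization error forward
    return bits, scale


def reconstruct(bits, scale, osr):
    """Approximate the original row by averaging each group of `osr` bits."""
    return [
        scale * sum(bits[i:i + osr]) / osr
        for i in range(0, len(bits), osr)
    ]
```

Because the carried error stays within [-1, 1] for inputs in [-1, 1], the sum of the emitted bits never drifts more than 1 away from the sum of the upsampled inputs. Raising `osr` therefore spends more bits per weight in exchange for a finer averaged reconstruction, which is the size-versus-accuracy trade-off the abstract describes.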
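Abstract (reference translation): Large language models (LLMs) face major challenges in computation and memory, and extremely low-bit quantization is essential for deploying them efficiently. This work introduces SDQ-LLM: Sigma-Delta Quantization for 1-bit LLMs of any size, a novel framework that enables extremely low-bit quantization of LLMs while preserving their linguistic reasoning capabilities. A notable feature of SDQ-LLM is the continuous adjustability of the Over-Sampling Ratio (OSR), which allows dynamic adaptation to memory or VRAM constraints by selecting a fractional OSR (e.g., 2.5 times) for the best trade-off between model size and accuracy. SDQ-LLM combines upsampling with a Sigma-Delta quantizer to binarize or ternarize LLM weights, encoding high-precision parameters as 1-bit or 1.58-bit representations and replacing the multiplication operations in linear layers with addition. This approach substantially improves inference efficiency under extremely low-bit quantization. To further reduce the loss of quantization precision, Hadamard-based weight smoothing is applied before quantization, improving the stability and robustness of the weight representations. In addition, to make full use of the continuity of the OSR and reduce precision loss, and recognizing the correlation between quantization sensitivity and weight variance, the authors propose MultiOSR, a fine-grained, layer- and linear-wise OSR allocation strategy. This strategy distributes OSR both across layers and within each layer, based on weight variance and parameter scale. Finally, extensive experiments on the OPT and LLaMA model families show that SDQ-LLM achieves more efficient, higher-precision performance even under highly aggressive low-OSR settings. The code is available at https://github.com/Dreamlittlecat/LLM-Quant-Factory.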
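The Hadamard-based weight smoothing mentioned in the abstract can be illustrated with an orthonormal Walsh-Hadamard transform, which spreads an outlier weight across all coordinates before quantization. This is a sketch under assumptions: the abstract does not say how the transform is applied (per row, per block, or with a matching rotation on activations), and the helper name `fwht` is hypothetical.

```python
import math


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform of a list of floats.

    The length must be a power of two. With the 1/sqrt(n) scaling the
    transform is its own inverse, so applying it twice restores the input.
    """
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(vec)
    h = 1
    while h < n:
        # Butterfly step: combine entries that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)
    return [v / norm for v in out]
```

For a row such as `[8.0, 0.1, -0.1, 0.2, 0.0, -0.2, 0.1, -0.1]`, the transformed row has a much smaller peak magnitude than the original outlier of 8.0, so a single max-abs scale wastes less of the quantizer's range. Applying `fwht` again recovers the original row up to floating-point error.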
