Fugu-MT Paper Translation (Abstract): BitsMoE: Efficient Spectral Energy-Guided Bit Allocation for MoE LLM Quantization

Paper overview: BitsMoE: Efficient Spectral Energy-Guided Bit Allocation for MoE LLM Quantization

arxiv url: http://arxiv.org/abs/2606.00079v1
Date: Fri, 22 May 2026 13:05:53 GMT
Status: Translation complete
Last updated in system: 2026-06-07 20:42:22.548081
Title: BitsMoE: Efficient Spectral Energy-Guided Bit Allocation for MoE LLM Quantization
Title (reference translation): BitsMoE: Efficient spectral-energy-guided bit allocation for MoE LLM quantization
Authors: Jiayu Zhao, Zihan Teng, Minhao Fan, Tianrui Ma, Wentao Ren, Song Chen, Weichen Liu
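Abstract summary: Mixture-of-Experts (MoE) large language models reduce per-token computation through sparse expert activation. Existing MoE compression methods struggle in the ultra-low-bit regime. We propose BitsMoE, a spectral-energy-guided bit-allocation framework for MoE LLM quantization.
Reference score (attention score computed independently by this site): 5.878850231726241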
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mixture-of-Experts (MoE) large language models reduce per-token computation through sparse expert activation, but their deployment remains memory-intensive because all expert weights must be kept resident in memory. Existing MoE compression methods struggle in the ultra-low-bit regime: pruning irreversibly removes model capacity, while coarse-grained quantization fails to allocate bits according to heterogeneous expert and weight-direction importance. We propose BitsMoE, a spectral-energy-guided bit-allocation framework for MoE LLM quantization. BitsMoE decomposes each MoE layer by SVD into a shared basis and expert-specific spectral factors, retaining the shared basis without quantization to preserve common cross-expert structure and using the expert-specific factors as fine-grained quantization units. To determine the bit-width of each unit, BitsMoE formulates spectrum-wise mixed-precision quantization as an activation-aware reconstruction surrogate and solves an integer linear program that minimizes estimated reconstruction loss under a fixed bit budget. Experiments across multiple MoE LLMs show that BitsMoE substantially reduces downstream task accuracy degradation in ultra-low-bit regimes. Under 2-bit quantization on Qwen3-30B-A3B-Base, BitsMoE accelerates quantization by 12.3$\times$, improves average accuracy by 27.83 percentage points, and increases decoding speed by 1.76$\times$ over GPTQ. Our model and code are publicly available at https://github.com/zjiayu064/BitsMoE.
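Abstract (reference translation): Mixture-of-Experts (MoE) large language models reduce per-token computation through sparse expert activation, but their deployment remains memory-intensive because all expert weights must be kept resident in memory. Existing MoE compression methods struggle in the ultra-low-bit regime: pruning irreversibly removes model capacity, while coarse-grained quantization fails to allocate bits according to heterogeneous expert and weight-direction importance. We propose BitsMoE, a spectral-energy-guided bit-allocation framework for MoE LLM quantization. BitsMoE decomposes each MoE layer by SVD into a shared basis and expert-specific spectral factors, keeps the shared basis unquantized to preserve the common cross-expert structure, and uses the expert-specific factors as fine-grained quantization units. To determine the bit-width of each unit, BitsMoE formulates spectrum-wise mixed-precision quantization as an activation-aware reconstruction surrogate and solves an integer linear program that minimizes the estimated reconstruction loss under a fixed bit budget. Experiments on multiple MoE LLMs show that BitsMoE substantially reduces downstream task accuracy degradation in ultra-low-bit regimes. Under 2-bit quantization on Qwen3-30B-A3B-Base, BitsMoE accelerates quantization by 12.3$\times$, improves average accuracy by 27.83 percentage points, and increases decoding speed by 1.76$\times$ over GPTQ. The model and code are publicly available at https://github.com/zjiayu064/BitsMoE.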
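
The bit-allocation step described in the abstract (pick one bit-width per quantization unit so that the estimated reconstruction loss is minimal under a fixed bit budget) can be illustrated with a small sketch. This is not the authors' implementation: the function name `allocate_bits`, the unit sizes, and the loss table are hypothetical, and the abstract's activation-aware surrogate is not modelled. To stay dependency-free, the sketch solves the same selection problem exactly with a dynamic program over the used budget rather than with an integer linear programming solver.

```python
from typing import Dict, List, Sequence, Tuple


def allocate_bits(
    losses: Sequence[Dict[int, float]],
    sizes: Sequence[int],
    budget_bits: int,
) -> List[int]:
    """Pick one bit-width per unit, minimising total estimated loss.

    losses[i][b] is a (hypothetical) estimated reconstruction loss of
    unit i when stored with b bits; sizes[i] is its number of weights.
    The constraint is sum(sizes[i] * bits[i]) <= budget_bits.
    """
    # best maps "bits spent so far" -> (lowest total loss, chosen bit-widths)
    best: Dict[int, Tuple[float, List[int]]] = {0: (0.0, [])}
    for unit_loss, size in zip(losses, sizes):
        nxt: Dict[int, Tuple[float, List[int]]] = {}
        for cost, (total, picks) in best.items():
            for bits, loss in unit_loss.items():
                new_cost = cost + size * bits
                if new_cost > budget_bits:
                    continue  # this choice would exceed the budget
                candidate = (total + loss, picks + [bits])
                if new_cost not in nxt or candidate[0] < nxt[new_cost][0]:
                    nxt[new_cost] = candidate
        best = nxt
    if not best:
        raise ValueError("bit budget too small for any assignment")
    # Any spend within the budget is allowed, so take the lowest loss overall.
    return min(best.values(), key=lambda entry: entry[0])[1]


# Toy data (made up): three equally sized units with decreasing importance.
example_losses = [
    {1: 9.0, 2: 4.0, 3: 1.0},  # high-energy unit: very sensitive to low bits
    {1: 3.0, 2: 1.5, 3: 1.0},
    {1: 0.5, 2: 0.3, 3: 0.2},  # low-energy unit: tolerates 1 bit
]
example_sizes = [4, 4, 4]
# Budget equal to an average of 2 bits per weight: 12 weights * 2 bits.
print(allocate_bits(example_losses, example_sizes, budget_bits=24))
```

With this toy table, a uniform 2-bit assignment has total loss 5.8, while the mixed assignment found by the sketch spends the same 24 bits for a total loss of 3.0, which is the kind of gain that non-uniform allocation is meant to capture.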


Related papers