Fugu-MT Paper Translation (Abstract): Initialisation Determines the Basin: Efficient Codebook Optimisation for Extreme LLM Quantization

Paper abstract: Initialisation Determines the Basin: Efficient Codebook Optimisation for Extreme LLM Quantization

arxiv url: http://arxiv.org/abs/2604.08118v1
Date: Thu, 09 Apr 2026 11:38:24 GMT
Status: Translation complete
System update date: 2026-04-10 18:34:05.889119
Title: Initialisation Determines the Basin: Efficient Codebook Optimisation for Extreme LLM Quantization
Title (reference translation): Initialisation Determines the Basin: Efficient Codebook Optimisation for Extreme LLM Quantization
Authors: Ian W. Kennedy, Nafise Sadat Moosavi
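Abstract summary: We show that codebook initialisation is the dominant bottleneck. We propose OA-EM, an output-aware EM initialisation method using Hessian-weighted Mahalanobis distance.
Reference score (independently computed attention score): 11.255860546984069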
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Additive quantization enables extreme LLM compression with O(1) lookup-table dequantization, making it attractive for edge deployment. Yet at 2-bit precision, it often fails catastrophically, even with extensive search and finetuning. We show that the dominant bottleneck is codebook initialisation. Greedy sequential initialisation frequently places the model in poor optimisation regions that subsequent beam search and PV-tuning struggle to overcome. We analyse this behaviour through the representational ratio $\rho = N/KM$, which characterises the relationship between weight groups and codebook capacity, and propose OA-EM, an output-aware EM initialisation method using Hessian-weighted Mahalanobis distance. Across compression rates, search budgets, and three architectures (Llama 3.2 3B, Llama 3.1 8B, Qwen 2.5 3B), OA-EM consistently produces better solutions after PV-tuning and dominates the quality-compute frontier. The severity of the bottleneck scales with $\rho$: moderate at 3 bpp but extreme at 2 bpp, where poor initialisation can degrade perplexity by orders of magnitude. More broadly, our results highlight the importance of optimisation geometry in compressed model spaces, where initialisation can dominate subsequent search and fine-tuning.
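Abstract (reference translation): Additive quantization enables extreme LLM compression with O(1) lookup-table dequantization, making it attractive for edge deployment. However, at 2-bit precision it often fails catastrophically, even with extensive search and fine-tuning. We show that codebook initialisation is the dominant bottleneck. Greedy sequential initialisation often places the model in poor optimisation regions that beam search and PV-tuning struggle to overcome. We analyse this behaviour using the representational ratio $\rho = N/KM$, which characterises the relationship between weight groups and codebook capacity, and propose OA-EM, an output-aware EM initialisation method using Hessian-weighted Mahalanobis distance. Across compression rates, search budgets, and three architectures (Llama 3.2 3B, Llama 3.1 8B, Qwen 2.5 3B), OA-EM consistently produces better solutions after PV-tuning and dominates the quality-compute frontier. The severity of the bottleneck scales with $\rho$: moderate at 3 bpp but extreme at 2 bpp. More broadly, the results highlight the importance of optimisation geometry in compressed model spaces and show that initialisation can dominate subsequent search and fine-tuning.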
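
Illustrative note (not from the paper): the abstract names three ingredients without defining them in detail, so the following is only a minimal sketch under stated assumptions. It assumes N is the number of weight groups, K the number of codewords per codebook, and M the number of codebooks, and it reads N/KM as N/(K*M). It also assumes the Hessian is approximated by an activation second-moment matrix, and that the "Hessian-weighted Mahalanobis distance" has the form (w - c)^T H (w - c). All function names are hypothetical, and this is not the authors' OA-EM procedure.

```python
import numpy as np

def dequantize(codes, codebooks):
    """Additive dequantization: each weight group is a sum of M codewords.
    codes: (N, M) integer indices; codebooks: (M, K, g) float array."""
    n_codebooks = codebooks.shape[0]
    # One table lookup per codebook, then a sum over the M looked-up vectors.
    return sum(codebooks[m, codes[:, m]] for m in range(n_codebooks))

def hessian_weighted_sq_dist(groups, codewords, H):
    """Assumed distance (w - c)^T H (w - c) for every group/codeword pair.
    groups: (N, g); codewords: (K, g); H: (g, g) symmetric PSD. Returns (N, K)."""
    diff = groups[:, None, :] - codewords[None, :, :]  # (N, K, g)
    return np.einsum("nkg,gh,nkh->nk", diff, H, diff)

def representational_ratio(n_groups, k_codes, m_codebooks):
    """Assumed reading of rho = N/KM as N / (K * M)."""
    return n_groups / (k_codes * m_codebooks)

rng = np.random.default_rng(0)
N, M, K, g = 1024, 2, 256, 8            # toy sizes, chosen arbitrarily
codebooks = rng.normal(size=(M, K, g))
codes = rng.integers(0, K, size=(N, M))
weights = dequantize(codes, codebooks)   # (N, g) reconstructed weight groups

X = rng.normal(size=(64, g))             # stand-in calibration activations
H = X.T @ X / X.shape[0]                 # assumed Hessian proxy (PSD)

dists = hessian_weighted_sq_dist(weights, codebooks[0], H)
assignments = dists.argmin(axis=1)       # nearest codeword under the H-weighted metric
rho = representational_ratio(N, K, M)
```

With H set to the identity matrix the distance reduces to the ordinary squared Euclidean distance, which is one way to see what the weighting changes: directions that matter more to the layer output contribute more to the assignment cost.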

Related papers