Fugu-MT Paper Translation (Abstract): Coverage-Based Calibration for Post-Training Quantization via Weighted Set Cover over Outlier Channels

Paper summary: Coverage-Based Calibration for Post-Training Quantization via Weighted Set Cover over Outlier Channels

arxiv url: http://arxiv.org/abs/2604.24008v1
Date: Mon, 27 Apr 2026 03:43:29 GMT
Status: Translation complete
System update date: 2026-04-28 17:12:07.725358
Title: Coverage-Based Calibration for Post-Training Quantization via Weighted Set Cover over Outlier Channels
Title (reference translation): A Coverage-Based Calibration Method for Post-Training Quantization via Weighted Set Cover over Outlier Channels
Authors: Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma
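Abstract summary: Post-Training Quantization (PTQ) compresses large language models to low bit-widths using a small calibration set. We identify a failure mode in which calibration samples fail to activate outlier channels, hidden dimensions with unusually large activations, causing the quantizer to underestimate their dynamic range. Motivated by this observation, we argue that PTQ calibration quality is governed more by weighted outlier-channel coverage than by generic sample representativeness.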
Reference score (independently computed attention score): 6.908972852063454
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Post-Training Quantization (PTQ) compresses large language models to low bit-widths using a small calibration set, and its quality depends strongly on which samples are chosen. We identify a failure mode in which calibration samples fail to activate outlier channels, hidden dimensions with unusually large activations, causing the quantizer to underestimate their dynamic range and producing per-channel reconstruction errors that dominate layer-wise loss. Motivated by this observation, we argue that PTQ calibration quality is governed more by weighted outlier-channel coverage than by generic sample representativeness, and formulate calibration selection as a weighted set cover problem over outlier channels. The objective is monotone submodular, and the greedy algorithm, COVERCAL, operates on pre-computed activation statistics and requires no GPU time at selection. We further show that the weight choice is internally consistent: under a stylized clipping model, missed weighted coverage upper-bounds surrogate loss, justifying the weighted coverage objective as principled rather than purely empirical. Across LLaMA-2, LLaMA-3, and Mistral, under AWQ and GPTQ backends and five downstream evaluations, COVERCAL improves over random, max-perplexity, max-activation-variance, and stratified baselines, with the largest gains at small calibration budgets. At INT4 with 128 samples, COVERCAL improves MMLU by 1.2 to 1.5 points over random calibration and reduces perplexity degradation by 15 to 30%; with 64 samples, it matches or exceeds random calibration at 256. The contribution is not a new PTQ backend but a formulation of calibration selection as weighted outlier coverage, with a simple, efficient algorithm and a surrogate-based justification.
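Abstract (reference translation): Post-Training Quantization (PTQ) compresses large language models to low bit-widths using a small calibration set, and its quality depends strongly on which samples are selected. We identify a failure mode in which calibration samples fail to activate outlier channels (hidden dimensions with unusually large activations), so that the quantizer underestimates their dynamic range and produces per-channel reconstruction errors that dominate the layer-wise loss. Based on this observation, we argue that PTQ calibration quality is governed more by weighted outlier-channel coverage than by generic sample representativeness, and we formulate calibration selection as a weighted set cover problem over outlier channels. The objective is monotone submodular, and the greedy algorithm, COVERCAL, operates on pre-computed activation statistics and needs no GPU time at selection. We further show that the weight choice is internally consistent: under a stylized clipping model, missed weighted coverage upper-bounds the surrogate loss, which justifies the weighted coverage objective as principled rather than purely empirical. Across LLaMA-2, LLaMA-3, and Mistral, under AWQ and GPTQ backends and five downstream evaluations, COVERCAL improves over random, max-perplexity, max-activation-variance, and stratified baselines, with the largest gains at small calibration budgets. At INT4 with 128 samples, COVERCAL improves MMLU by 1.2 to 1.5 points over random calibration and reduces perplexity degradation by 15 to 30%. The contribution is not a new PTQ backend but a formulation of calibration selection as weighted outlier coverage, with a simple, efficient algorithm and a surrogate-based justification.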
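
The selection idea in the abstract (greedily choosing calibration samples to maximize weighted coverage of outlier channels) can be illustrated with a minimal sketch. This is not the authors' COVERCAL implementation: the abstract does not specify how outlier channels are detected or how their weights are defined, so the inputs `activated` (per-sample sets of outlier-channel indices, assumed to come from pre-computed activation statistics) and `weights` (per-channel importance) are hypothetical, as are the function names. The sketch only shows the standard greedy rule for a monotone submodular coverage objective under a sample budget, which needs no GPU once the statistics exist.

```python
from typing import Dict, List, Set


def greedy_weighted_coverage(
    activated: List[Set[int]],   # hypothetical: outlier channels each sample activates
    weights: Dict[int, float],   # hypothetical: importance weight per outlier channel
    budget: int,                 # maximum number of calibration samples to pick
) -> List[int]:
    """Greedily pick sample indices that add the most uncovered channel weight."""
    covered: Set[int] = set()
    selected: List[int] = []
    remaining = set(range(len(activated)))
    for _ in range(budget):
        best_idx, best_gain = None, 0.0
        # Iterate in index order so ties resolve to the lowest index.
        for i in sorted(remaining):
            gain = sum(weights.get(c, 0.0) for c in activated[i] - covered)
            if gain > best_gain:
                best_idx, best_gain = i, gain
        if best_idx is None:
            break  # no remaining sample adds any weighted coverage
        selected.append(best_idx)
        covered |= activated[best_idx]
        remaining.remove(best_idx)
    return selected


def covered_weight(
    selected: List[int],
    activated: List[Set[int]],
    weights: Dict[int, float],
) -> float:
    """Total weight of the outlier channels covered by the selected samples."""
    covered: Set[int] = set()
    for i in selected:
        covered |= activated[i]
    return sum(weights.get(c, 0.0) for c in covered)
```

For example, with `activated = [{0, 1}, {1, 2}, {3}, {0, 3}]`, `weights = {0: 5.0, 1: 1.0, 2: 2.0, 3: 4.0}`, and `budget = 2`, the sketch first picks sample 3 (marginal gain 9.0), then sample 1 (marginal gain 3.0), covering all four channels. For monotone submodular objectives under a cardinality budget, this greedy rule is known to achieve at least a (1 - 1/e) fraction of the optimal coverage.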
