Fugu-MT Paper Translation (Abstract): Grid Games: The Power of Multiple Grids for Quantizing Large Language Models

Paper overview: Grid Games: The Power of Multiple Grids for Quantizing Large Language Models

arxiv url: http://arxiv.org/abs/2605.12327v1
Date: Tue, 12 May 2026 16:09:02 GMT
Status: Translation complete
Last updated in system: 2026-05-13 21:48:56.997792
Title: Grid Games: The Power of Multiple Grids for Quantizing Large Language Models
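Title (reference translation): Grid Games: The Power of Multiple Grids for Quantizing Large Language Models
Authors: Vage Egiazarian, Erik Schultheis, Andrei Panferov, Earl Killian, Torsten Hoefler, Dan Alistarh
Abstract summary: A major recent advance in quantization is given by microscaled 4-bit formats such as NVFP4 and MXFP4, which quantize values into small groups sharing a scale. The paper formalizes the power-of-two-grids (PO2) problem and provides theoretical results showing that practical small-group formats such as MXFP or NVFP can benefit from PO2 grids. Results for post-training quantization of standard open models and pre-training of Llama-like models show that adaptive grids consistently improve accuracy over single-grid FP4 under both weight-only and weight+activation quantization.
Reference score (independently computed attention score): 50.885349461958384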
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A major recent advance in quantization is given by microscaled 4-bit formats such as NVFP4 and MXFP4, quantizing values into small groups sharing a scale, assuming a fixed floating-point grid. In this paper, we study the following natural extension: assume that, for each group of values, we are free to select the "better" among two or more 4-bit grids marked by one or more bits in the scale value. We formalize the power-of-two-grids (PO2) problem, and provide theoretical results showing that practical small-group formats such as MXFP or NVFP can benefit significantly from PO2 grids, while the advantage vanishes for very large groups. On the practical side, we instantiate several grid families, including 1) PO2(NF4), which pairs the standard NF4 normal grid with a learned grid, 2) MPO2, a grid pair that is fully learned over real weights and activations, 3) PO2(Split87), an explicit-zero asymmetric grid and 4) SFP4, a TensorCore-implementable triple which pairs NVFP4 with two shifted variants. Results for post-training quantization of standard open models and pre-training of Llama-like models show that adaptive grids consistently improve accuracy vs single-grid FP4 under both weight-only and weight+activation. Source code is available at https://github.com/IST-DASLab/GridGames.
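Abstract (reference translation): A major recent advance in quantization is given by microscaled 4-bit formats such as NVFP4 and MXFP4, which quantize values into small groups sharing a scale, assuming a fixed floating-point grid. This paper studies a natural extension: for each group of values, one is free to select the "better" among two or more 4-bit grids, with the choice marked by one or more bits in the scale value. The paper formalizes the power-of-two-grids (PO2) problem and provides theoretical results showing that practical small-group formats such as MXFP or NVFP can benefit significantly from PO2 grids, while the advantage vanishes for very large groups. On the practical side, several grid families are instantiated: 1) PO2(NF4), which pairs the standard NF4 normal grid with a learned grid; 2) MPO2, a grid pair that is fully learned over real weights and activations; 3) PO2(Split87), an explicit-zero asymmetric grid; and 4) SFP4, a TensorCore-implementable triple which pairs NVFP4 with two shifted variants. Results for post-training quantization of standard open models and pre-training of Llama-like models show that adaptive grids consistently improve accuracy over single-grid FP4 under both weight-only and weight+activation quantization. Source code is available at https://github.com/IST-DASLab/GridGames.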
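
The per-group grid selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the second grid (ALT_GRID), and the squared-error selection rule are assumptions made here for illustration, and the per-group scale is kept as an unquantized float, whereas real microscaled formats also store the scale in a low-precision format. Only the FP4 (E2M1) grid values are standard.

```python
# Signed FP4 (E2M1) grid: magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6.
FP4_GRID = [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0,
            0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
# Hypothetical second grid with uniform spacing, chosen only for
# illustration; it is NOT one of the grids proposed in the paper.
ALT_GRID = [float(v) for v in range(-7, 8)]


def quantize_group(values, grid):
    """Scale a group onto the grid's range, round to the nearest grid
    point, and return (dequantized values, squared error)."""
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return list(values), 0.0  # all-zero group is represented exactly
    # Assumption: the scale stays an ordinary float (no scale quantization).
    scale = peak / max(abs(g) for g in grid)
    dequant = [min(grid, key=lambda g: abs(v / scale - g)) * scale
               for v in values]
    error = sum((v - q) ** 2 for v, q in zip(values, dequant))
    return dequant, error


def quantize_with_grid_choice(values, grids, group_size=16):
    """For each group, try every candidate grid and keep the one with the
    lowest squared error. Returns the dequantized values and one selector
    index per group (the extra bit(s) that would accompany the scale)."""
    out, selectors = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        results = [quantize_group(group, g) for g in grids]
        best = min(range(len(grids)), key=lambda i: results[i][1])
        selectors.append(best)
        out.extend(results[best][0])
    return out, selectors
```

Calling quantize_with_grid_choice(weights, [FP4_GRID, ALT_GRID]) returns one selector per group. With two candidate grids, each selector fits in a single bit, which corresponds to the abstract's idea of marking the chosen grid with one or more bits in the scale value.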

Related papers