Fugu-MT Paper Translation (Abstract): Holding the FP8 Quality Ceiling at 8-Bit Weights and Activations: INT8 and GGUF Post-Training Quantization of Ideogram 4.0 for Consumer GPUs

Paper overview: Holding the FP8 Quality Ceiling at 8-Bit Weights and Activations: INT8 and GGUF Post-Training Quantization of Ideogram 4.0 for Consumer GPUs

arxiv url: http://arxiv.org/abs/2606.12280v2
Date: Fri, 12 Jun 2026 15:58:33 GMT
Status: Translation complete
System update date: 2026-06-15 13:53:03.584842
Title: Holding the FP8 Quality Ceiling at 8-Bit Weights and Activations: INT8 and GGUF Post-Training Quantization of Ideogram 4.0 for Consumer GPUs
Title (reference translation): Holding the FP8 Quality Ceiling with 8-Bit Weights and Activations: Post-Training Quantization of Ideogram 4.0 for Consumer GPUs Using INT8 and GGUF
Authors: Deep Gandhi, Ali Asaria, Tony Salomone
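Abstract summary: We study post-training quantization of Ideogram 4.0, a 9.3B flow-matching diffusion transformer (DiT). We evaluate every variant on schema-valid prompts generated by an expander built to Ideogram's published caption specification. We release the INT8 W8A8 and GGUF Q4_K quantized weights on Hugging Face under a gated, non-commercial license.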
Reference score (independently calculated attention score): 0.08599681538174887
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We study post-training quantization (PTQ) of Ideogram 4.0, a 9.3B flow-matching diffusion transformer (DiT) that realizes classifier-free guidance with two separate-weight copies of a single-stream backbone and is conditioned by a Qwen3-VL text encoder, targeting Ampere RTX 3090 GPUs, which lack FP8 tensor cores. Because Ideogram 4.0 is trained on structured JSON captions, we evaluate every variant under schema-valid JSON prompts produced by an LLM expander built to Ideogram's published caption specification, and score them with a battery spanning human-preference (HPSv2), CLIP, and PickScore for standalone quality; PP-OCR exact-match and edit distance for text; and PSNR/SSIM/LPIPS for fidelity to the FP8 reference (the highest-precision public checkpoint) output. On a 300-prompt benchmark with paired bootstrap confidence intervals, an INT8 W8A8 recipe (per-channel weights, per-token dynamic activations, SmoothQuant, and bf16 protection of a small high-fragility layer set) is statistically indistinguishable from FP8 on CLIP and PickScore (paired CIs include zero) and within ~0.004 HPSv2, and, at its 8-bit size, is the most faithful reproduction of the FP8 output (LPIPS 0.243 vs 0.277/0.306 for the half-size 4-bit baselines; the INT8-Q4_K gap excludes zero). A GGUF Q4_K quantization reaches the same standalone quality as the published NF4 baseline at the same on-disk size, making it the Pareto choice on the quality-memory frontier. We further show that under JSON prompts all four variants reach parity on standalone quality, the variants separate on fidelity and text rendering, not on aggregate image-quality scores, and that text legibility, near-zero when the model is prompted with raw strings, reaches 55% OCR exact-match under the JSON captions it expects. We release the INT8 W8A8 and GGUF Q4_K quantized weights on Hugging Face under a gated, non-commercial license.
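The W8A8 recipe named in the abstract combines per-channel weight scales with per-token dynamic activation scales. The following is a minimal NumPy sketch of those two pieces only, under stated assumptions: symmetric quantization to the range [-127, 127], a weight matrix laid out as (output channels, input features), and INT32 accumulation. The function names are hypothetical and are not taken from the paper or its released code; SmoothQuant and the bf16 protection of fragile layers are omitted.

```python
import numpy as np

def quantize_weights_per_channel(w):
    """Symmetric INT8 quantization with one scale per output channel (row of w)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # guard against all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_activations_per_token(x):
    """Symmetric INT8 quantization with one scale per token (row of x).

    The scale is computed from the live activations, hence "dynamic".
    """
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # guard against all-zero tokens
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def w8a8_linear(x, w):
    """Approximate x @ w.T with INT8 operands and INT32 accumulation."""
    xq, xs = quantize_activations_per_token(x)  # xs has shape (tokens, 1)
    wq, ws = quantize_weights_per_channel(w)    # ws has shape (out, 1)
    acc = xq.astype(np.int32) @ wq.astype(np.int32).T  # shape (tokens, out)
    # Undo both scalings: one factor per token row, one per output column.
    return acc.astype(np.float32) * xs * ws.T

# Synthetic data for illustration only (not from the paper).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)   # 4 tokens, 64 features
w = rng.standard_normal((16, 64)).astype(np.float32)  # 16 output channels
ref = x @ w.T
out = w8a8_linear(x, w)
rel_err = np.linalg.norm(out - ref) / np.linalg.norm(ref)
print(f"relative error vs float matmul: {rel_err:.4f}")
```

Because each token and each output channel gets its own scale, a single outlier row does not force a coarse quantization step on the rest of the tensor, which is the usual motivation for this granularity.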
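Abstract (reference translation): We study post-training quantization (PTQ) of Ideogram 4.0, a 9.3B flow-matching diffusion transformer (DiT) that realizes classifier-free guidance with two separate-weight copies of a single-stream backbone and uses a Qwen3-VL text encoder, targeting Ampere RTX 3090 GPUs, which lack FP8 tensor cores. Because Ideogram 4.0 is trained on structured JSON captions, we evaluate every variant on schema-valid JSON prompts generated by an LLM expander built to Ideogram's published caption specification. We score the variants with a battery spanning HPSv2, CLIP, and PickScore for standalone quality, PP-OCR exact-match and edit distance for text, and PSNR/SSIM/LPIPS for fidelity to the FP8 reference (the highest-precision public checkpoint). On a 300-prompt benchmark with paired bootstrap confidence intervals, the INT8 W8A8 recipe (per-channel weights, per-token dynamic activations, SmoothQuant, and bf16 protection) is statistically indistinguishable from FP8 on CLIP and PickScore and within about 0.004 HPSv2, and at its 8-bit size it is the most faithful reproduction of the FP8 output (LPIPS 0.243 vs 0.277/0.306; the INT8-Q4_K gap excludes zero). GGUF Q4_K quantization reaches the same standalone quality as the published NF4 baseline at the same on-disk size, making it the Pareto choice on the quality-memory frontier. We further show that under JSON prompts all four variants reach parity on standalone quality, that the variants separate on fidelity and text rendering rather than on aggregate image-quality scores, and that text legibility, near zero when the model is prompted with raw strings, reaches 55% OCR exact-match under the JSON captions the model expects. We release the INT8 W8A8 and GGUF Q4_K quantized weights on Hugging Face under a gated, non-commercial license.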
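The abstract reports paired bootstrap confidence intervals and treats a metric gap as indistinguishable when the interval includes zero. One common way to compute such an interval is a percentile bootstrap over per-prompt score differences, sketched below with the Python standard library. This is an assumption about the general technique, not the authors' exact procedure: the function name, resample count, and the scores are all hypothetical, and the demo data is synthetic.

```python
import random
import statistics

def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-prompt difference a - b.

    Pairing means both score lists are indexed by the same prompts, so
    prompts (not individual scores) are resampled with replacement.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(sample))
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(diffs), lo, hi

# Synthetic per-prompt scores for 300 prompts (illustrative only).
demo_rng = random.Random(1)
fp8_scores = [demo_rng.gauss(0.28, 0.02) for _ in range(300)]
int8_scores = [s + demo_rng.gauss(0.0, 0.005) for s in fp8_scores]

mean_diff, ci_lo, ci_hi = paired_bootstrap_ci(int8_scores, fp8_scores)
includes_zero = ci_lo <= 0.0 <= ci_hi
print(f"mean diff {mean_diff:+.5f}, 95% CI [{ci_lo:+.5f}, {ci_hi:+.5f}], "
      f"includes zero: {includes_zero}")
```

Resampling the differences rather than each variant's scores separately removes the prompt-to-prompt variance shared by both variants, which is why a paired interval can be much tighter than two unpaired ones.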


Related papers