Fugu-MT Paper Translation (Abstract): DiRotQ: Rotation-Aware Quantization for 4-bit Diffusion Transformers

Paper overview: DiRotQ: Rotation-Aware Quantization for 4-bit Diffusion Transformers

arxiv url: http://arxiv.org/abs/2605.16732v1
Date: Sat, 16 May 2026 00:52:00 GMT
Status: Translation complete
Last updated in system: 2026-05-19 23:51:08.334608
Title: DiRotQ: Rotation-Aware Quantization for 4-bit Diffusion Transformers
Title (reference translation): DiRotQ: Rotation-Aware Quantization for 4-bit Diffusion Transformers
Authors: Sayeh Sharify, Mahsa Salmani, Hesham Mostafa,
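Abstract summary: Diffusion Transformers (DiTs) achieve state-of-the-art image generation quality but incur substantial memory and computational costs at inference. Existing approaches, including smoothing-based methods, mixed-precision schemes, rotation techniques, and low-rank residual methods, partially mitigate the quality loss from 4-bit quantization but still leave a noticeable gap to FP16/BF16 performance. This work introduces DiRotQ, a W4A4 PTQ framework that mitigates this degradation through rotation-aware activation quantization.
Reference score (independently computed attention score): 3.0583214514538084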
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Diffusion Transformers (DiTs) achieve state-of-the-art image generation quality but incur substantial memory and computational costs at inference. While aggressive Post-Training Quantization (PTQ) to 4-bit precision offers significant efficiency gains, it typically results in severe quality degradation. Existing approaches, including smoothing-based methods, mixed-precision schemes, rotation techniques, and low-rank residual methods, partially mitigate this issue but still leave a noticeable gap to FP16/BF16 performance. In this work, we introduce DiRotQ, a W4A4 PTQ framework that mitigates this degradation through rotation-aware activation quantization. DiRotQ identifies a low-rank subspace capturing dominant activation variance via Principal Component Analysis (PCA), preserving coefficients in this subspace at higher precision while quantizing the remaining components to 4-bit. Activations are rotated into the PCA basis at inference time using calibration-derived orthogonal transformations, while the inverse rotation is fused into the layer weights offline. Combined with GPTQ-based weight quantization, DiRotQ achieves an FID (lower is better) of 15.9 and PSNR (higher is better) of 19.1 dB on PixArt-Σ over the MJHQ-30K dataset, outperforming the prior state-of-the-art SVDQuant (FID 18.9, PSNR 17.6) under the same INT W4A4 setting. Beyond standard metrics, we introduce a VLM-as-a-Judge evaluation protocol for diffusion model quantization, the first such evaluation in this setting, providing a more holistic assessment of perceptual quality and prompt alignment under aggressive compression. On the systems side, we implement a Triton-based custom kernel to enable efficient end-to-end inference, reducing memory usage of the 12B FLUX.1-dev model by 2.1x and delivering 2.3x speedup over the BF16 baseline, on a 24 GB RTX 4090 GPU.
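Abstract (reference translation): Diffusion Transformers (DiTs) achieve state-of-the-art image generation quality but incur substantial memory and computational costs at inference. Aggressive Post-Training Quantization (PTQ) to 4-bit precision offers significant efficiency gains but typically causes severe quality degradation. Existing approaches, including smoothing-based methods, mixed-precision schemes, rotation techniques, and low-rank residual methods, partially mitigate this issue but still leave a noticeable gap to FP16/BF16 performance. This work introduces DiRotQ, a W4A4 PTQ framework that mitigates this degradation through rotation-aware activation quantization. DiRotQ uses Principal Component Analysis (PCA) to identify a low-rank subspace that captures the dominant activation variance, keeps the coefficients in this subspace at higher precision, and quantizes the remaining components to 4-bit. Activations are rotated into the PCA basis at inference time using calibration-derived orthogonal transformations, while the inverse rotation is fused into the layer weights offline. Combined with GPTQ-based weight quantization, DiRotQ achieves an FID (lower is better) of 15.9 and a PSNR (higher is better) of 19.1 dB on PixArt-Σ over the MJHQ-30K dataset, outperforming the prior state-of-the-art SVDQuant (FID 18.9, PSNR 17.6) under the same INT W4A4 setting. Beyond standard metrics, the paper introduces a VLM-as-a-Judge evaluation protocol for diffusion model quantization, the first such evaluation in this setting, providing a more holistic assessment of perceptual quality and prompt alignment under aggressive compression. On the systems side, a Triton-based custom kernel enables efficient end-to-end inference, reducing the memory usage of the 12B FLUX.1-dev model by 2.1x and delivering a 2.3x speedup over the BF16 baseline on a 24 GB RTX 4090 GPU.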
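
The rotation-aware activation quantization described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`quantize_int4`, `pca_rotation`, `rotated_linear`), the symmetric per-row 4-bit scheme, the centered covariance used for PCA, and the synthetic data are all assumptions made for illustration. Weights are left unquantized here (the GPTQ step is omitted), and quantization is simulated in floating point rather than run through a Triton kernel.

```python
import numpy as np

def quantize_int4(x):
    """Simulated symmetric per-row quantization to the signed 4-bit range [-8, 7]."""
    max_abs = np.max(np.abs(x), axis=-1, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0)  # avoid division by zero
    return np.clip(np.round(x / scale), -8, 7) * scale

def pca_rotation(calib):
    """Orthogonal PCA basis from calibration activations.

    Columns are eigenvectors of the (centered) covariance, sorted by
    decreasing variance. Centering is an assumption of this sketch.
    """
    centered = calib - calib.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(calib) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, np.argsort(eigvals)[::-1]]

def rotated_linear(x, weight, rotation, rank):
    """Approximate x @ weight.T with activations quantized in the PCA basis.

    The first `rank` PCA coefficients stay in full precision; the rest are
    quantized to 4-bit. Because `rotation` is orthogonal,
    x @ weight.T == (x @ rotation) @ (weight @ rotation).T exactly.
    """
    w_rot = weight @ rotation       # offline: inverse rotation fused into the weights
    z = x @ rotation                # inference time: rotate activations
    z_q = z.copy()
    if rank < z.shape[1]:
        z_q[:, rank:] = quantize_int4(z[:, rank:])
    return z_q @ w_rot.T

# Synthetic activations: a few dominant directions plus small noise (hypothetical data).
rng = np.random.default_rng(0)
dim, rank = 64, 4
mix = rng.normal(size=(rank, dim))

def sample(n):
    return 10.0 * rng.normal(size=(n, rank)) @ mix + 0.1 * rng.normal(size=(n, dim))

calib, x = sample(512), sample(128)
weight = rng.normal(size=(32, dim))
rotation = pca_rotation(calib)

reference = x @ weight.T
err_plain = np.linalg.norm(quantize_int4(x) @ weight.T - reference)
err_rot = np.linalg.norm(rotated_linear(x, weight, rotation, rank) - reference)
print(f"plain 4-bit error: {err_plain:.3f}, rotation-aware error: {err_rot:.3f}")
```

On data like this, where most of the variance sits in a few directions, keeping those coefficients at higher precision lets the 4-bit scale for the remaining components shrink to the noise level, which is the intuition behind the approach. How the rank is chosen and how the high-precision path is executed efficiently are not specified in the abstract.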

Related papers