Fugu-MT Paper Translation (Abstract): ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers

Paper summary: ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers

arxiv url: http://arxiv.org/abs/2512.03673v1
Date: Wed, 03 Dec 2025 11:02:16 GMT
Status: Translation complete
Last updated in system: 2025-12-04 20:02:55.258044
Title: ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers
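Title (reference translation): ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers
Authors: Feice Huang, Zuliang Han, Xing Zhou, Yihuang Chen, Lifei Zhu, Haoqian Wang
Abstract summary: As model size increases, the growing memory footprint and inference latency pose significant challenges for practical deployment. Recent studies in large language models (LLMs) show that rotation-based techniques can smooth outliers and enable 4-bit quantization. This paper proposes ConvRot, a group-wise rotation-based quantization method that leverages the regular Hadamard transform (RHT).
Reference score (independently computed attention metric): 21.65616995056907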
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion transformers have demonstrated strong capabilities in generating high-quality images. However, as model size increases, the growing memory footprint and inference latency pose significant challenges for practical deployment. Recent studies in large language models (LLMs) show that rotation-based techniques can smooth outliers and enable 4-bit quantization, but these approaches often incur substantial overhead and struggle with row-wise outliers in diffusion transformers. To address these challenges, we propose ConvRot, a group-wise rotation-based quantization method that leverages regular Hadamard transform (RHT) to suppress both row-wise and column-wise outliers while reducing complexity from quadratic to linear. Building on this, we design ConvLinear4bit, a plug-and-play module that integrates rotation, quantization, GEMM, and dequantization, enabling W4A4 inference without retraining and preserving visual quality. Experiments on FLUX.1-dev demonstrate a 2.26$\times$ speedup and 4.05$\times$ memory reduction while maintaining image fidelity. To our knowledge, this is the first application of rotation-based quantization for plug-and-play W4A4 inference in diffusion transformers.
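Abstract (reference translation): Diffusion transformers have shown a strong ability to generate high-quality images. However, as model size grows, the increasing memory footprint and inference latency become major obstacles to practical deployment. Recent work on large language models (LLMs) shows that rotation-based techniques can smooth outliers and enable 4-bit quantization, but these techniques often incur substantial overhead and struggle with row-wise outliers in diffusion transformers. To address these challenges, we propose ConvRot, a group-wise rotation-based quantization method that uses the regular Hadamard transform (RHT) to suppress both row-wise and column-wise outliers while reducing complexity from quadratic to linear. Building on this, we design ConvLinear4bit, a plug-and-play module that integrates rotation, quantization, GEMM, and dequantization, enabling W4A4 inference without retraining while preserving visual quality. Experiments on FLUX.1-dev show a 2.26$\times$ speedup and a 4.05$\times$ memory reduction while maintaining image fidelity. To our knowledge, this is the first application of rotation-based quantization to plug-and-play W4A4 inference in diffusion transformers.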
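The abstract lists four stages for the 4-bit linear layer: rotation, quantization, GEMM, and dequantization. The NumPy sketch below simulates that pipeline to illustrate why a group-wise Hadamard rotation helps when one activation channel carries outliers. It is not the paper's ConvLinear4bit implementation. The function names, the group size of 16, the Kronecker construction of the regular Hadamard matrix, and the per-row symmetric quantization scheme are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def regular_hadamard(k: int) -> np.ndarray:
    """Orthonormal regular Hadamard matrix of order 4**k (assumed construction)."""
    h4 = np.ones((4, 4)) - 2.0 * np.eye(4)  # J - 2I: +-1 entries, every row sums to 2
    h = h4
    for _ in range(k - 1):
        h = np.kron(h, h4)  # Kronecker products of regular Hadamard matrices stay regular
    return h / np.sqrt(h.shape[0])  # scale so that h @ h.T == I

def rotate_groups(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Multiply each contiguous group of g features (last axis) by h."""
    g = h.shape[0]
    rows, n = x.shape
    assert n % g == 0, "feature dimension must be a multiple of the group size"
    return (x.reshape(rows, n // g, g) @ h).reshape(rows, n)

def quantize_int4(x: np.ndarray):
    """Symmetric per-row quantization to integers in [-7, 7] (stored as int8)."""
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero on all-zero rows
    q = np.clip(np.round(x / scale), -7, 7).astype(np.int8)
    return q, scale

def w4a4_linear(x: np.ndarray, w: np.ndarray, h=None) -> np.ndarray:
    """Approximate x @ w.T with 4-bit activations and 4-bit weights."""
    if h is not None:
        # Rotating both operands with the same orthogonal matrix leaves x @ w.T unchanged.
        x, w = rotate_groups(x, h), rotate_groups(w, h)
    qx, sx = quantize_int4(x)                          # quantize activations per token
    qw, sw = quantize_int4(w)                          # quantize weights per output channel
    acc = qx.astype(np.int32) @ qw.astype(np.int32).T  # integer GEMM
    return acc * sx * sw.T                             # dequantize

def rel_error(approx: np.ndarray, ref: np.ndarray) -> float:
    return float(np.linalg.norm(approx - ref) / np.linalg.norm(ref))

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 256))   # 64 tokens, 256 input features
x[:, 5] *= 40.0                      # synthetic outlier channel
w = rng.standard_normal((32, 256))   # 32 output features
h = regular_hadamard(2)              # 16 x 16 rotation, one per group of 16 features

y_ref = x @ w.T
err_plain = rel_error(w4a4_linear(x, w), y_ref)
err_rot = rel_error(w4a4_linear(x, w, h), y_ref)
print(f"relative error without rotation: {err_plain:.3f}")
print(f"relative error with rotation:    {err_rot:.3f}")
```

Because the rotation is block-diagonal with a fixed group size g, applying it to a vector of n features costs on the order of n·g multiply-adds instead of the n² needed for a dense n × n rotation, which is consistent with the quadratic-to-linear reduction the abstract describes.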

Related papers