Fugu-MT paper translation (summary): Q-ARVD: Quantizing Autoregressive Video Diffusion Models

Paper summary: Q-ARVD: Quantizing Autoregressive Video Diffusion Models

arxiv url: http://arxiv.org/abs/2605.21072v1
Date: Wed, 20 May 2026 11:58:30 GMT
Status: translation complete
Last updated in system: 2026-05-21 19:19:56.654871
Title: Q-ARVD: Quantizing Autoregressive Video Diffusion Models
Title (reference translation): Q-ARVD: Quantization of Autoregressive Video Diffusion Models
Authors: Siao Tang, Xinyin Ma, Gongfan Fang, Xingyi Yang, Xinchao Wang
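Abstract summary: Autoregressive video diffusion models (ARVDs) have emerged as a promising architecture for streaming video generation. However, the substantial inference cost of ARVDs remains a major obstacle to practical deployment. The paper proposes Q-ARVD, a novel framework for accurate ARVD quantization.
Reference score (independently computed attention score): 98.30793646153926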
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autoregressive video diffusion models (ARVDs) have emerged as a promising architecture for streaming video generation, paving the way for real-time interactive video generation and world modeling. Despite their potential, the substantial inference cost of ARVDs remains a major obstacle to practical deployment, making model quantization a natural direction for improving efficiency. However, quantization for ARVDs remains largely unexplored. Our empirical analysis shows that directly applying existing quantization schemes developed for standard diffusion transformers to ARVDs leads to suboptimal performance, revealing quantization behaviors that differ from those observed in bidirectional diffusion models. In this paper, we identify two critical challenges in quantizing ARVDs: (C1) Highly unbalanced frame-wise quantization sensitivity. Error accumulation during autoregressive generation can induce severely skewed quantization sensitivity across frames, following an exponential-like decay pattern. (C2) Prominent and heterogeneous outlier patterns in weights. Weight distributions exhibit pronounced outlier channels, whose patterns vary substantially across layer types and block depths. To address these issues, we propose Q-ARVD, a novel framework for accurate ARVD quantization. (S1) To tackle the highly unbalanced frame-wise sensitivity, Q-ARVD incorporates a final-quality aware frame-weighting mechanism into the quantization objective. (S2) To prevent heterogeneous outliers from degrading performance, Q-ARVD introduces an outlier-aware adaptive dual-scale quantization, which automatically detects the presence and quantity of outlier channels for an arbitrary layer, and isolates them to protect normal channels. Extensive experiments demonstrate the superiority of Q-ARVD.
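Abstract (reference translation): Autoregressive video diffusion models (ARVDs) have emerged as a promising architecture for streaming video generation, opening the way to real-time interactive video generation and world modeling. Despite this potential, the substantial inference cost of ARVDs remains a major obstacle to practical deployment, which makes model quantization a natural direction for improving efficiency. However, quantization of ARVDs remains largely unexplored. Empirical analysis shows that directly applying quantization schemes developed for standard diffusion transformers to ARVDs yields suboptimal performance and reveals quantization behavior that differs from that of bidirectional diffusion models. The paper identifies two key challenges in quantizing ARVDs. (C1) Highly unbalanced frame-wise quantization sensitivity: error accumulation during autoregressive generation can induce severely skewed quantization sensitivity across frames, following an exponential-like decay pattern. (C2) Prominent and heterogeneous outlier patterns in weights: weight distributions exhibit pronounced outlier channels whose patterns vary substantially across layer types and block depths. To address these issues, the authors propose Q-ARVD, a novel framework for accurate ARVD quantization. (S1) To handle the highly unbalanced frame-wise sensitivity, Q-ARVD incorporates a final-quality-aware frame-weighting mechanism into the quantization objective. (S2) To keep heterogeneous outliers from degrading performance, Q-ARVD introduces an outlier-aware adaptive dual-scale quantization that automatically detects the presence and number of outlier channels in an arbitrary layer and isolates them to protect the normal channels. Extensive experiments demonstrate the superiority of Q-ARVD.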
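
The two ideas named in the abstract, (S1) frame-weighted quantization error and (S2) dual-scale quantization with isolated outlier channels, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify the detection rule, the weighting formula, or the bit width, so everything below is an assumption made for illustration: the function names, the median-ratio outlier test (`ratio`), the exponential frame weights (`decay`) that favor earlier frames, and the 4-bit symmetric quantizer.

```python
import numpy as np


def quantize_symmetric(w, n_bits=4):
    """Uniform symmetric fake-quantization of a float array with one shared scale."""
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = np.max(np.abs(w))
    if max_abs == 0:
        return w.copy()
    scale = max_abs / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale


def dual_scale_quantize(w, n_bits=4, ratio=4.0):
    """Quantize a float matrix w of shape (out_features, in_features) with two scales.

    Assumed rule (hypothetical, not from the paper): an input channel is an
    outlier if its abs-max exceeds `ratio` times the median channel abs-max.
    Outlier and normal channels are then quantized with separate scales.
    """
    chan_max = np.max(np.abs(w), axis=0)
    outlier = chan_max > ratio * np.median(chan_max)
    w_q = np.empty_like(w)
    if (~outlier).any():
        w_q[:, ~outlier] = quantize_symmetric(w[:, ~outlier], n_bits)
    if outlier.any():
        w_q[:, outlier] = quantize_symmetric(w[:, outlier], n_bits)
    return w_q, outlier


def frame_weighted_error(ref, quant, decay=0.5):
    """Weighted mean squared error over frames; arrays have shape (frames, ...).

    Assumed weighting (hypothetical): w_f proportional to exp(-decay * f), so
    errors in earlier frames, which propagate through later ones, count more.
    """
    n_frames = ref.shape[0]
    weights = np.exp(-decay * np.arange(n_frames))
    weights /= weights.sum()
    per_frame = np.mean((ref - quant).reshape(n_frames, -1) ** 2, axis=1)
    return float(np.sum(weights * per_frame))
```

With a weight matrix in which one channel is scaled far above the rest, a single shared scale rounds most normal-channel values to zero, while the dual-scale variant keeps a fine step size for them. That is the protection effect the abstract describes, shown here under the assumed detection rule.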

Related papers