Fugu-MT Paper Translation (Summary): Drift-AR: Single-Step Visual Autoregressive Generation via Anti-Symmetric Drifting

Paper Summary: Drift-AR: Single-Step Visual Autoregressive Generation via Anti-Symmetric Drifting

arxiv url: http://arxiv.org/abs/2603.28049v2
Date: Wed, 08 Apr 2026 10:41:16 GMT
Status: Translation completed
Last updated in system: 2026-04-09 14:06:04.87641
Title: Drift-AR: Single-Step Visual Autoregressive Generation via Anti-Symmetric Drifting
Authors: Zhen Zou, Xiaoxiao Ma, Mingde Yao, Jie Huang, LinJiang Huang, Feng Zhao
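Abstract summary: Autoregressive (AR)-diffusion hybrid paradigms combine AR's structured semantic modeling with diffusion's high-fidelity synthesis, but suffer from a dual speed bottleneck. We propose Drift-AR, which leverages an entropy signal to accelerate both stages. Experiments on MAR, TransDiff, and NextStep-1 demonstrate a 3.8-5.5× speedup with genuine 1-NFE decoding, matching or surpassing the original quality.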
Reference score (attention score computed by the site): 25.589468409950484
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autoregressive (AR)-diffusion hybrid paradigms combine AR's structured semantic modeling with diffusion's high-fidelity synthesis, yet suffer from a dual speed bottleneck: the sequential AR stage and the iterative multi-step denoising of the diffusion-based visual decoding stage. Existing methods address each bottleneck in isolation, without a unified, principled design. We observe that the per-position prediction entropy of continuous-space AR models naturally encodes spatially varying generation uncertainty, which simultaneously governs draft prediction quality in the AR stage and reflects the corrective effort required by the visual decoding stage, a connection not fully explored before. Since entropy is inherently tied to both bottlenecks, it serves as a natural unifying signal for joint acceleration. In this work, we propose Drift-AR, which leverages the entropy signal to accelerate both stages: 1) for AR acceleration, we introduce Entropy-Informed Speculative Decoding, which aligns draft and target entropy distributions via a causal-normalized entropy loss, resolving the entropy mismatch that causes excessive draft rejection; 2) for visual decoder acceleration, we reinterpret entropy as the physical variance of the initial state of an anti-symmetric drifting field (high-entropy positions activate stronger drift toward the data manifold, while low-entropy positions yield vanishing drift), enabling single-step (1-NFE) decoding without iterative denoising or distillation. Moreover, both stages share the same entropy signal, which is computed once at no extra cost. Experiments on MAR, TransDiff, and NextStep-1 demonstrate a 3.8-5.5× speedup with genuine 1-NFE decoding, matching or surpassing the original quality. Code will be available at https://github.com/aSleepyTree/Drift-AR.
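
The abstract does not say how the per-position entropy is estimated, how the causal-normalized entropy loss is defined, or what concrete form the anti-symmetric drifting field takes. The Python sketch below only illustrates those ingredients under loudly stated assumptions; none of it is the authors' code. It assumes a diagonal-Gaussian prediction head per position, so entropy has a closed form. `causal_normalize` (subtract the running mean over positions 0..t) and `entropy_alignment_loss` (mean squared gap) are hypothetical stand-ins for the causal-normalized entropy loss. `entropy_to_std` is a hypothetical min-max map from entropy to a per-position initial-state standard deviation. `antisymmetric_drift` is a toy kernel mean-shift field, chosen only because it has the anti-symmetry property V(x; p, q) = -V(x; q, p), so the drift vanishes when the two distributions coincide.

```python
import math


def gaussian_entropy(log_var):
    """Differential entropy (nats) of a diagonal Gaussian given per-dimension log-variances.

    ASSUMPTION: the AR head predicts a diagonal Gaussian per position.
    """
    d = len(log_var)
    return 0.5 * sum(log_var) + 0.5 * d * (1.0 + math.log(2.0 * math.pi))


def causal_normalize(entropies):
    """HYPOTHETICAL: subtract the running mean over positions 0..t from entropy t."""
    out, total = [], 0.0
    for t, h in enumerate(entropies):
        total += h
        out.append(h - total / (t + 1))
    return out


def entropy_alignment_loss(draft_h, target_h):
    """HYPOTHETICAL: mean squared gap between causally normalized draft/target entropies."""
    d, t = causal_normalize(draft_h), causal_normalize(target_h)
    return sum((a - b) ** 2 for a, b in zip(d, t)) / len(d)


def entropy_to_std(entropies, sigma_max=1.0):
    """HYPOTHETICAL: min-max normalize entropies to [0, 1] and scale to a per-position std."""
    lo, hi = min(entropies), max(entropies)
    span = hi - lo
    if span == 0.0:
        # All positions equally uncertain: this sketch adds no extra variance (arbitrary choice).
        return [0.0 for _ in entropies]
    return [sigma_max * (h - lo) / span for h in entropies]


def _mean_shift(x, samples, bandwidth):
    """Gaussian-kernel mean shift of scalar x toward `samples` (log-sum-exp for stability)."""
    logits = [-((x - s) ** 2) / (2.0 * bandwidth ** 2) for s in samples]
    m = max(logits)
    w = [math.exp(l - m) for l in logits]
    return sum(wi * s for wi, s in zip(w, samples)) / sum(w) - x


def antisymmetric_drift(x, p_samples, q_samples, bandwidth=1.0):
    """TOY field: shift toward p minus shift toward q, hence V(x; p, q) = -V(x; q, p)."""
    return _mean_shift(x, p_samples, bandwidth) - _mean_shift(x, q_samples, bandwidth)


if __name__ == "__main__":
    # Log-variances from a hypothetical 2-D Gaussian head at 4 positions.
    log_vars = [[-4.0, -4.0], [0.0, 0.0], [-1.0, -2.0], [1.0, 1.0]]
    h = [gaussian_entropy(lv) for lv in log_vars]  # computed once, reused below
    print("entropy:", [round(v, 3) for v in h])
    print("initial std:", [round(s, 3) for s in entropy_to_std(h)])
    print("alignment loss (draft == target):", entropy_alignment_loss(h, h))
    data, gen = [1.0, 1.2, 0.8], [-1.0, -0.5, 0.0]
    print("V(0; p, q):", antisymmetric_drift(0.0, data, gen))
    print("V(0; q, p):", antisymmetric_drift(0.0, gen, data))
    print("V(0; p, p):", antisymmetric_drift(0.0, data, data))
```

In this toy run the lowest-entropy position receives zero initial-state spread and the highest receives `sigma_max`, echoing the "vanishing drift at low entropy" behaviour described in the abstract, and one entropy list feeds both the alignment loss and the variance map.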

Related Papers