Fugu-MT Paper Translation (Abstract): ExtraVAR: Stage-Aware RoPE Remapping for Resolution Extrapolation in Visual Autoregressive Models

Paper overview: ExtraVAR: Stage-Aware RoPE Remapping for Resolution Extrapolation in Visual Autoregressive Models

arxiv url: http://arxiv.org/abs/2605.10045v1
Date: Mon, 11 May 2026 06:14:38 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:50.567454
Title: ExtraVAR: Stage-Aware RoPE Remapping for Resolution Extrapolation in Visual Autoregressive Models
Title (reference translation): ExtraVAR: Stage-Aware RoPE Remapping for Resolution Extrapolation in Visual Autoregressive Models
Authors: Feihong Yan, Shaoyu Liu, Haixuan Wang, Shuai Lu, Linfeng Zhang, Huiqi Li, Xiangyang Ji
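Abstract summary: We propose Stage-Aware RoPE Remapping to suppress global repetition, local repetition, and detail degradation. We also propose Entropy-Driven Adaptive Attention Calibration, which quantifies attention dispersion via a resolution-invariant normalized entropy. Our method consistently outperforms prior resolution-extrapolation methods in both structural coherence and fine-detail fidelity.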
Reference score (independently computed attention metric): 52.648413887350195
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Visual Autoregressive (VAR) models have emerged as a strong alternative to diffusion for image synthesis, yet their fixed training resolution prevents direct generation at higher resolutions. Naively transferring training-free extrapolation methods from LLMs or diffusion models to VAR yields three characteristic failure modes: global repetition, local repetition, and detail degradation. We trace them to a unified band-stage mismatch: VAR generates images in a coarse-to-fine, scale-wise process where each stage is driven by a distinct dominant RoPE frequency band, and each failure mode emerges when the dominant band of a particular stage is disrupted. Building on this insight, we propose Stage-Aware RoPE Remapping, a training-free strategy that assigns each frequency band a stage-specific remapping rule, jointly suppressing all three failure modes. We further observe that attention becomes systematically dispersed as the image resolution increases. Existing methods typically depend on predefined attention scaling factors, which are neither adaptive to the target resolution nor capable of faithfully capturing the actual extent of attention dispersion. We therefore propose Entropy-Driven Adaptive Attention Calibration, which quantifies dispersion via a resolution-invariant normalized entropy and yields a closed-form per-head scaling factor that realigns the extrapolated-resolution attention entropy with its training-resolution counterpart. Extensive experiments show that our method consistently outperforms prior resolution-extrapolation methods in both structural coherence and fine-detail fidelity. Our code is available at https://github.com/feihongyan1/ExtraVAR.
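Abstract (reference translation): Visual Autoregressive (VAR) models have emerged as a strong alternative to diffusion for image synthesis, but their fixed training resolution prevents direct generation at higher resolutions. Naively transferring training-free extrapolation methods from LLMs or diffusion models to VAR produces three characteristic failure modes: global repetition, local repetition, and detail degradation. We trace these to a unified band-stage mismatch: VAR generates images in a coarse-to-fine, scale-wise process in which each stage is driven by a distinct dominant RoPE frequency band, and each failure mode appears when the dominant band of a particular stage is disrupted. Building on this insight, we propose Stage-Aware RoPE Remapping, a training-free strategy that assigns each frequency band a stage-specific remapping rule and jointly suppresses all three failure modes. We further observe that attention becomes systematically dispersed as the image resolution increases. Existing methods typically rely on predefined attention scaling factors, which neither adapt to the target resolution nor faithfully capture the actual extent of attention dispersion. We therefore propose Entropy-Driven Adaptive Attention Calibration, which quantifies dispersion via a resolution-invariant normalized entropy and derives a closed-form per-head scaling factor that realigns the attention entropy at the extrapolated resolution with its training-resolution counterpart. Extensive experiments show that our method consistently outperforms prior resolution-extrapolation methods in both structural coherence and fine-detail fidelity. Our code is available at https://github.com/feihongyan1/ExtraVAR.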
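
The abstract describes measuring attention dispersion with a resolution-invariant normalized entropy and rescaling attention so that the entropy at the extrapolated resolution matches the training-resolution value. The following is a minimal illustrative sketch of that idea, not the authors' implementation. It assumes the normalized entropy is the Shannon entropy divided by log N (N being the number of keys), which may differ from the paper's definition, and it swaps the paper's closed-form per-head factor for a simple bisection search, since the abstract does not give the closed form. All function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def normalized_entropy(probs, eps=1e-12):
    """Shannon entropy divided by log(N), so uniform attention scores 1.0
    for any number of keys N (assumed definition, not taken from the paper)."""
    n = probs.shape[-1]
    h = -(probs * np.log(probs + eps)).sum(axis=-1)
    return h / np.log(n)

def match_entropy_scale(logits, target, lo=0.1, hi=10.0, iters=50):
    """Bisection for a logit scale s such that the mean normalized entropy of
    softmax(s * logits) is close to `target`. This stands in for the paper's
    closed-form factor. Entropy does not increase as s grows, so bisection
    applies when `target` lies between the entropies at `hi` and `lo`."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if normalized_entropy(softmax(mid * logits)).mean() > target:
            lo = mid  # attention still too dispersed: sharpen further
        else:
            hi = mid  # attention too peaked: relax
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic logits for one head: 256 keys at "training" resolution,
    # 1024 keys at "extrapolated" resolution (illustrative data only).
    train_logits = 2.0 * rng.normal(size=(4, 256))
    big_logits = 2.0 * rng.normal(size=(4, 1024))

    target = normalized_entropy(softmax(train_logits)).mean()
    before = normalized_entropy(softmax(big_logits)).mean()
    scale = match_entropy_scale(big_logits, target)
    after = normalized_entropy(softmax(scale * big_logits)).mean()
    print(f"target={target:.3f} before={before:.3f} scale={scale:.3f} after={after:.3f}")
```

In a real model the search would run per attention head on actual query-key logits rather than synthetic Gaussian data; the sketch only illustrates why a resolution-dependent scale can bring the two entropies back into agreement.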

Related papers