Fugu-MT Paper Translation (Abstract): AdaState: Self-Evolving Anchors for Streaming Video Generation

Paper abstract: AdaState: Self-Evolving Anchors for Streaming Video Generation

arxiv url: http://arxiv.org/abs/2605.30349v1
Date: Thu, 28 May 2026 17:59:53 GMT
Status: Translation complete
System updated: 2026-05-30 02:45:56.761982
Title: AdaState: Self-Evolving Anchors for Streaming Video Generation
Authors: Yusuf Dalva, Pinar Yanardag
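Abstract summary: Autoregressive video diffusion models generate streaming video by producing frames sequentially, conditioning each chunk on previously generated content. The paper replaces the static anchor with an adaptive state, a hidden latent that the model denoises alongside content at every chunk but never renders. Experiments demonstrate that the adaptive state substantially improves video dynamics, enabling richer motion and natural scene progression.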
Reference score (independently calculated attention score): 19.753221929746417
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autoregressive video diffusion models generate streaming video by producing frames sequentially, conditioning each chunk on previously generated content. These models are structurally anchored to the first frame: its key-value representation occupies a privileged position in the attention cache and serves as the primary scene reference throughout generation. As the cleanest and most error-free position in the cache, this anchor draws disproportionate attention, suppressing video dynamics, and locking scene composition to the initial viewpoint even as the scene naturally evolves. The result is a temporally shallow video in which motion, camera movement, and scene progression are dampened in favor of static consistency. To address this, we replace the static anchor with an adaptive state, a hidden latent that the model denoises alongside content at every chunk but never renders. Rather than referencing a frozen first frame, the model generates its own scene anchor at each step by attending to both the previous state and the current content, producing a reference that evolves with the generated content. Unlike standard video generation, which encodes an absolute notion of time, our formulation treats time as relative: every generation step sees the same positional structure regardless of how far generation has progressed, and the state transition is identical at every chunk. Together, these properties introduce a recurrence into the generation process, where denoising serves as the transition function, and the KV cache serves as the carrier, requiring no external module. Experiments demonstrate that the adaptive state substantially improves video dynamics, enabling richer motion and natural scene progression within generated videos.
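As a rough illustration of the recurrence the abstract describes, the toy Python sketch below carries a hidden state from chunk to chunk, refines it jointly with the content, and renders only the content. It is not the authors' implementation: `Cache`, `toy_denoise`, `generate_stream`, and the averaging update are hypothetical stand-ins for a video diffusion model and its KV cache, chosen only to make the control flow concrete.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]


@dataclass
class Cache:
    """Toy stand-in for a KV cache: the hidden state plus the last content chunk."""
    state: Vector
    content: Vector


def toy_denoise(cache: Cache, noisy_state: Vector, noisy_content: Vector,
                steps: int = 4) -> Tuple[Vector, Vector]:
    """Hypothetical transition function (an assumption, not the paper's model).

    Content is pulled toward the previous state and previous content;
    the new state is pulled toward the previous state and current content.
    The function takes no chunk index, so the same transition is applied
    at every step (time is treated as relative).
    """
    state, content = list(noisy_state), list(noisy_content)
    for _ in range(steps):
        content = [0.5 * c + 0.25 * ps + 0.25 * pc
                   for c, ps, pc in zip(content, cache.state, cache.content)]
        state = [0.5 * s + 0.25 * ps + 0.25 * c
                 for s, ps, c in zip(state, cache.state, content)]
    return state, content


def generate_stream(num_chunks: int, dim: int = 4,
                    seed: int = 0) -> Tuple[List[Vector], Cache]:
    """Generate chunks sequentially; the cache carries the evolving state."""
    rng = random.Random(seed)
    cache = Cache(state=[0.0] * dim, content=[0.0] * dim)
    rendered: List[Vector] = []
    for _ in range(num_chunks):
        noisy_state = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        noisy_content = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        state, content = toy_denoise(cache, noisy_state, noisy_content)
        rendered.append(content)                     # only content is rendered
        cache = Cache(state=state, content=content)  # state is carried, never rendered
    return rendered, cache


if __name__ == "__main__":
    frames, final_cache = generate_stream(num_chunks=3)
    print(len(frames), "chunks rendered; hidden state has", len(final_cache.state), "dims")
```

The point of the sketch is structural: the anchor is regenerated at each step instead of being frozen at the first frame, and no module outside the denoising loop is needed to carry it forward.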


Related papers