Fugu-MT Paper Translation (Abstract): CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives

Paper overview: CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives

arxiv url: http://arxiv.org/abs/2605.12496v1
Date: Tue, 12 May 2026 17:59:51 GMT
Status: Translation complete
Last updated in system: 2026-05-13 21:48:57.085264
Title: CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives
Authors: Yihao Meng, Zichen Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Yue Yu, Hanlin Wang, Haobo Li, Jiapeng Zhu, Yanhong Zeng, Xing Zhu, Yujun Shen, Qifeng Chen, Huamin Qu
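Abstract summary: CausalCine is an interactive autoregressive framework that transforms multi-shot video generation into an online directing process. CausalCine generates causally across shot changes, accepts dynamic prompts on the fly, and reuses context without regenerating previous shots.
Reference score (site-computed attention metric): 117.85963914353904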
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Autoregressive video generation aims at real-time, open-ended synthesis. Yet, cinematic storytelling is not merely the endless extension of a single scene; it requires progressing through evolving events, viewpoint shifts, and discrete shot boundaries. Existing autoregressive models often struggle in this setting. Trained primarily for short-horizon continuation, they treat long sequences as extended single shots, inevitably suffering from motion stagnation and semantic drift during long rollouts. To bridge this gap, we introduce CausalCine, an interactive autoregressive framework that transforms multi-shot video generation into an online directing process. CausalCine generates causally across shot changes, accepts dynamic prompts on the fly, and reuses context without regenerating previous shots. To achieve this, we first train a causal base model on native multi-shot sequences to learn complex shot transitions prior to acceleration. We then propose Content-Aware Memory Routing (CAMR), which dynamically retrieves historical KV entries according to attention-based relevance scores rather than temporal proximity, preserving cross-shot coherence under bounded active memory. Finally, we distill the causal base model into a few-step generator for real-time interactive generation. Extensive experiments demonstrate that CausalCine significantly outperforms autoregressive baselines and approaches the capability of bidirectional models while unlocking the streaming interactivity of causal generation. Demo available at https://yihao-meng.github.io/CausalCine/
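Illustrative sketch (not from the paper): the abstract says CAMR retrieves historical KV entries by attention-based relevance rather than temporal proximity, under a bounded active memory. The toy Python below contrasts the two selection rules on a tiny cache. Everything in it is an assumption made for illustration: the function names `relevance_scores`, `route_by_content`, and `route_by_recency`, the use of a softmax over scaled dot products as the relevance score, the top-k selection, and the 2-dimensional keys. The authors' actual scoring and routing procedure is not described in the abstract and may differ.

```python
import math
from typing import List, Sequence


def relevance_scores(query: Sequence[float],
                     keys: Sequence[Sequence[float]]) -> List[float]:
    """Softmax over scaled dot products between the query and each cached key.

    Assumption: relevance is modelled as ordinary attention weights.
    """
    if not keys:
        return []
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_by_content(query: Sequence[float],
                     keys: Sequence[Sequence[float]],
                     budget: int) -> List[int]:
    """Indices of the `budget` most relevant cache entries, in temporal order."""
    scores = relevance_scores(query, keys)
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


def route_by_recency(keys: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Baseline: keep only the most recent `budget` entries."""
    n = len(keys)
    return list(range(max(0, n - budget), n))


# Toy cache (hypothetical data): entries 0-1 come from an early shot of
# character A, entries 2-5 from later shots with unrelated content.
keys = [
    [1.0, 0.0], [0.9, 0.1],    # shot 1: character A
    [0.0, 1.0], [0.1, 0.9],    # shot 2
    [-1.0, 0.0], [0.0, -1.0],  # shot 3
]
query = [1.0, 0.0]  # the new shot returns to character A

selected = route_by_content(query, keys, budget=2)
recent = route_by_recency(keys, budget=2)
print("content-routed:", selected)
print("recency-routed:", recent)
```

With a budget of two entries, the content-based rule keeps the two shot-1 entries that match the query, while the recency rule keeps only the last two entries from shot 3. This is the kind of cross-shot recall the abstract attributes to relevance-based retrieval, shown here only at toy scale.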

Related papers