Fugu-MT Paper Translation (Summary): Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video

Paper Summary: Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video

arxiv url: http://arxiv.org/abs/2605.15182v1
Date: Thu, 14 May 2026 17:58:26 GMT
Status: Translation complete
Last updated in system: 2026-05-15 21:45:35.010666
Title: Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video
Title (reference translation): Warp-as-History: Generating Generalizable Camera-Controlled Video from One Training Video
Authors: Yifan Wang, Tong He
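Abstract summary: This paper proposes a simple interface that turns camera-induced warps into camera-warped pseudo-history. We align the pseudo-history's positional encoding with the target frames and remove warped-history tokens that lack valid source observations. The method improves camera adherence, visual quality, and motion dynamics without test-time optimization or target-video adaptation.
Reference score (independently computed attention score): 19.675672131137382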
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Camera-controlled video generation has made substantial progress, enabling generated videos to follow prescribed viewpoint trajectories. However, existing methods usually learn camera-specific conditioning through camera encoders, control branches, or attention and positional-encoding modifications, which often require post-training on large-scale camera-annotated videos. Training-free alternatives avoid such post-training, but often shift the cost to test-time optimization or extra denoising-time guidance. We propose Warp-as-History, a simple interface that turns camera-induced warps into camera-warped pseudo-history with target-frame positional alignment and visible-token selection. Given a target camera trajectory, we construct camera-warped pseudo-history from past observations and feed it through the model's visual-history pathway. Crucially, we align its positional encoding with the target frames being denoised and remove warped-history tokens without valid source observations. Without any training, architectural modification, or test-time optimization, this interface reveals a non-trivial zero-shot capability of a frozen video generation model to follow camera trajectories. Moreover, lightweight offline LoRA finetuning on only one camera-annotated video further improves this capability and generalizes to unseen videos, improving camera adherence, visual quality, and motion dynamics without test-time optimization or target-video adaptation. Extensive experiments on diverse datasets confirm the effectiveness of our method.
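Abstract (reference translation): Camera-controlled video generation has advanced considerably, allowing generated videos to follow prescribed viewpoint trajectories. However, existing methods usually learn camera-specific conditioning through camera encoders, control branches, or changes to attention and positional encoding, which often require post-training on large-scale camera-annotated videos. Training-free alternatives avoid such post-training, but often shift the cost to test-time optimization or extra denoising-time guidance. We propose Warp-as-History, a simple interface that converts camera-induced warps into camera-warped pseudo-history with target-frame positional alignment and visible-token selection. Given a target camera trajectory, we construct camera-warped pseudo-history from past observations and feed it through the model's visual-history pathway. Importantly, we align its positional encoding with the target frames being denoised and remove warped-history tokens that lack valid source observations. Without any training, architectural change, or test-time optimization, this interface reveals a non-trivial zero-shot ability of a frozen video generation model to follow camera trajectories. Furthermore, lightweight offline LoRA finetuning on just one camera-annotated video further improves this ability and generalizes to unseen videos, improving camera adherence, visual quality, and motion dynamics without test-time optimization or target-video adaptation. Extensive experiments on diverse datasets confirm the effectiveness of the method.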
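The abstract names three steps: warping a past observation into the target camera, dropping warped tokens that have no valid source pixels (visible-token selection), and giving the surviving tokens the positional index of the target frame being denoised. The abstract does not say how the warp is computed, so this Python sketch assumes a pinhole camera, a known per-pixel depth map for the source frame, and a relative pose (R, t) from the source to the target camera. The helpers warp_frame and select_history_tokens, the Intrinsics type, and the min_coverage threshold are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


# Hypothetical pinhole intrinsics; the abstract does not specify a camera model.
@dataclass
class Intrinsics:
    fx: float
    fy: float
    cx: float
    cy: float


def warp_frame(
    image: List[List[float]],
    depth: List[List[float]],
    K: Intrinsics,
    R: Sequence[Sequence[float]],
    t: Sequence[float],
) -> Tuple[List[List[Optional[float]]], List[List[bool]]]:
    """Forward-warp a source frame into the target camera (hypothetical helper).

    Assumes known per-pixel depth and a relative pose (R, t) mapping source-camera
    coordinates to target-camera coordinates. Target pixels that receive no
    source pixel stay None and are marked invalid.
    """
    H, W = len(depth), len(depth[0])
    warped: List[List[Optional[float]]] = [[None] * W for _ in range(H)]
    zbuf = [[float("inf")] * W for _ in range(H)]
    for v in range(H):
        for u in range(W):
            d = depth[v][u]
            if d <= 0:
                continue  # no depth, so this pixel cannot be reprojected
            # Back-project pixel (u, v) to a 3D point in the source camera frame.
            p = ((u - K.cx) / K.fx * d, (v - K.cy) / K.fy * d, d)
            # Rigidly move the point into the target camera frame.
            q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
            if q[2] <= 0:
                continue  # point lies behind the target camera
            # Project and round to the nearest target pixel.
            u2 = round(K.fx * q[0] / q[2] + K.cx)
            v2 = round(K.fy * q[1] / q[2] + K.cy)
            # Z-buffer: keep only the nearest surface at each target pixel.
            if 0 <= u2 < W and 0 <= v2 < H and q[2] < zbuf[v2][u2]:
                zbuf[v2][u2] = q[2]
                warped[v2][u2] = image[v][u]
    valid = [[px is not None for px in row] for row in warped]
    return warped, valid


def select_history_tokens(
    valid: List[List[bool]], patch: int, target_t: int, min_coverage: float = 0.0
) -> List[Tuple[int, int, int]]:
    """Keep patch tokens that have valid source observations (hypothetical helper).

    A patch is dropped unless its fraction of valid warped pixels exceeds
    min_coverage (an assumed threshold). Each kept token gets the position id
    (target_t, row, col): the temporal index of the target frame being
    denoised, not the index of the past frame it was warped from.
    """
    H, W = len(valid), len(valid[0])
    kept = []
    for r in range(H // patch):
        for c in range(W // patch):
            cells = [valid[r * patch + i][c * patch + j]
                     for i in range(patch) for j in range(patch)]
            if sum(cells) / len(cells) > min_coverage:
                kept.append((target_t, r, c))
    return kept


if __name__ == "__main__":
    # Toy 4x4 frame at constant depth; the translation shifts it 2 px to the right.
    image = [[10.0 * v + u for u in range(4)] for v in range(4)]
    depth = [[1.0] * 4 for _ in range(4)]
    K = Intrinsics(fx=1.0, fy=1.0, cx=0.0, cy=0.0)
    R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    warped, valid = warp_frame(image, depth, K, R, t=(2.0, 0.0, 0.0))
    print(valid[0])  # [False, False, True, True]
    print(select_history_tokens(valid, patch=2, target_t=5))  # [(5, 0, 1), (5, 1, 1)]
```

In this toy case the shift leaves the left half of the target view without any source observation, so only the right-hand patches survive, each tagged with the target frame index 5 rather than a past-frame index. Feeding these tokens through a specific model's visual-history pathway is model-dependent and is not sketched here.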


Related Papers