Fugu-MT Paper Translation (Abstract): DrawVideo: Generating Long Video from Storyboard Keyframe Sketches

Paper summary: DrawVideo: Generating Long Video from Storyboard Keyframe Sketches

arxiv url: http://arxiv.org/abs/2605.23508v1
Date: Fri, 22 May 2026 11:16:05 GMT
Status: Translation complete
Last updated in system: 2026-05-25 17:29:20.326938
Title: DrawVideo: Generating Long Video from Storyboard Keyframe Sketches
Title (reference translation): DrawVideo: Generating Long Videos from Storyboard Keyframe Sketches
Authors: Chuanzhi Xu, Huiqi Liang, Bang Shi, Huiming Zhang, Yifan Xiao, Guangcheng Lin, Haodong Chen, Qiang Qu, Zhicheng Lu, Weidong Cai
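Abstract summary: DrawVideo is a sketch-guided, storyboard-driven framework for controllable long-video generation. It decomposes a long video into independently controllable shots, each defined by a black-and-white sketch, an appearance prompt, and a motion prompt. In experiments, DrawVideo achieves strong structural controllability, appearance consistency, visual stability, and coherent long-video generation.
Reference score (independently computed attention score): 14.777037981233079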
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Long video generation requires high-fidelity synthesis, coherent narrative structure, and user control over extended time spans. Existing text-to-video methods often rely on a single long prompt, limiting control over pose, composition, layout, and motion. We propose DrawVideo, a sketch-guided, storyboard-driven framework for controllable long-video generation. DrawVideo decomposes long videos into independently controllable shots, each defined by a black-and-white sketch, an appearance prompt, and a motion prompt. The sketch controls pose and layout, the appearance prompt defines identity, scene, and style, and the motion prompt guides temporal dynamics. DrawVideo follows a hierarchical 'global multi-shot, local single-sketch' strategy: it first generates a structure-aligned reference keyframe, then expands the motion prompt into derivative keyframes representing action states, and finally synthesizes clips between adjacent keyframes to build each shot. We also introduce SketchLongVideo, the first dataset for sketch-guided text-to-long-video generation, constructed from animation videos via shot detection, keyframe extraction, vision-language recognition, prompt decomposition, and sketch conversion. Experiments show that DrawVideo achieves strong structural controllability, appearance consistency, visual stability, and coherent long-video generation.
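Abstract (reference translation): Long video generation requires high-fidelity synthesis, a coherent narrative structure, and user control over long time spans. Existing text-to-video methods often depend on a single long prompt, which limits control over pose, composition, layout, and motion. We therefore propose DrawVideo, a sketch-guided, storyboard-driven framework for controllable long-video generation. DrawVideo decomposes a long video into independently controllable shots, each defined by a black-and-white sketch, an appearance prompt, and a motion prompt. The sketch controls pose and layout, the appearance prompt defines identity, scene, and style, and the motion prompt guides temporal dynamics. DrawVideo follows a hierarchical "global multi-shot, local single-sketch" strategy: it first generates a structure-aligned reference keyframe, then expands the motion prompt into derivative keyframes that represent action states, and finally synthesizes clips between adjacent keyframes to build each shot. We also introduce SketchLongVideo, the first dataset for sketch-guided text-to-long-video generation, constructed from animation videos through shot detection, keyframe extraction, vision-language recognition, prompt decomposition, and sketch conversion. Experiments show that DrawVideo achieves strong structural controllability, appearance consistency, visual stability, and coherent long-video generation.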
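
The shot decomposition and the three-stage "global multi-shot, local single-sketch" strategy described in the abstract can be pictured with a minimal Python sketch. This is not the authors' implementation: the names `Shot`, `build_shot`, and `build_video`, and the three callables passed in, are all hypothetical placeholders, and placing the reference keyframe first in each shot's keyframe sequence is an assumption the abstract does not confirm.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Shot:
    """One storyboard shot, with the three inputs named in the abstract."""
    sketch_path: str        # black-and-white sketch: controls pose and layout
    appearance_prompt: str  # defines identity, scene, and style
    motion_prompt: str      # guides temporal dynamics


def build_shot(
    shot: Shot,
    make_reference: Callable[[str, str], Any],     # hypothetical stage 1 model
    expand_motion: Callable[[Any, str], List[Any]],  # hypothetical stage 2 model
    synthesize_clip: Callable[[Any, Any], Any],    # hypothetical stage 3 model
) -> List[Any]:
    # Stage 1: structure-aligned reference keyframe from sketch + appearance.
    reference = make_reference(shot.sketch_path, shot.appearance_prompt)
    # Stage 2: derivative keyframes representing action states.
    derived = expand_motion(reference, shot.motion_prompt)
    # Assumption: the reference keyframe opens the shot's keyframe sequence.
    keyframes = [reference, *derived]
    # Stage 3: one clip between each pair of adjacent keyframes.
    return [synthesize_clip(a, b) for a, b in zip(keyframes, keyframes[1:])]


def build_video(storyboard, make_reference, expand_motion, synthesize_clip):
    # Shots are processed independently; their clips are concatenated in order.
    clips = []
    for shot in storyboard:
        clips.extend(build_shot(shot, make_reference, expand_motion, synthesize_clip))
    return clips
```

Passing the three stages in as callables keeps the sketch independent of any particular image or video model; a shot with one reference keyframe and two derivative keyframes yields two clips.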

Related papers