Fugu-MT Paper Translation (Abstract): DramaDirector: Geometry-Guided Short Drama Generation

Paper overview: DramaDirector: Geometry-Guided Short Drama Generation

arxiv url: http://arxiv.org/abs/2606.24107v1
Date: Tue, 23 Jun 2026 03:50:28 GMT
Status: Translation complete
Last updated in system: 2026-06-24 22:16:48.757736
Title: DramaDirector: Geometry-Guided Short Drama Generation
Title (reference translation): DramaDirector: Geometry-Guided Short Drama Generation
Authors: Hengji Zhou, Sijie Liu, Jianrun Chen, Xingchen Zou, Lianghao Xia, Liqiang Nie
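Abstract summary: This work studies plot-to-short-drama generation, in which a global plot and local context are transformed into visually grounded multi-shot videos. It proposes DramaDirector, a geometry-grounded framework that can borrow cinematographic geometry from a gallery of shots. It also introduces DramaBoard, a benchmark built from 35 live-action dramas, 2.8K episodes, and 81K shots.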
Reference score (independently computed attention score): 51.430988173490384
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Short dramas, with their rapid shot rhythms, dialogue-driven focus shifts, and demanding cinematographic grounding, pose challenges that prompt-level or text-only video generation pipelines struggle to meet. We study plot-to-short-drama generation, where a global plot and local context are transformed into visually grounded multi-shot videos. We propose DramaDirector, a geometry-grounded framework that lets the planner borrow cinematographic geometry from a gallery of real short-drama shots indexed by depth and pose. DramaDirector decouples each shot into static visual and dynamic narrative conditions, trains the planner with schema-constrained SFT and GRPO under a learned text-visual alignment reward, and retrieves depth-pose references to guide first-frame generation and image-to-video synthesis. We also introduce DramaBoard, a benchmark built from 35 live-action dramas, 2.8K episodes, and 81K shots, with structured storyboards and multi-dimensional evaluation protocols. Experiments show that DramaDirector improves over representative multi-agent and video generation baselines on faithfulness, consistency, and controllability. Our code is released at: https://github.com/iLearn-Lab/DramaDirector
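Abstract (reference translation): Short dramas, with their rapid shot rhythms, dialogue-driven focus shifts, and demanding cinematographic grounding, pose challenges that prompt-level or text-only video generation pipelines struggle to meet. This work studies plot-to-short-drama generation, in which a global plot and local context are transformed into visually grounded multi-shot videos. It proposes DramaDirector, a geometry-grounded framework that lets the planner borrow cinematographic geometry from a gallery of real short-drama shots indexed by depth and pose. DramaDirector decouples each shot into static visual conditions and dynamic narrative conditions, trains the planner with schema-constrained SFT and GRPO under a learned text-visual alignment reward, and retrieves depth-pose references to guide first-frame generation and image-to-video synthesis. The work also introduces DramaBoard, a benchmark built from 35 live-action dramas, 2.8K episodes, and 81K shots, with structured storyboards and multi-dimensional evaluation protocols. Experiments show that DramaDirector improves over representative multi-agent and video generation baselines on faithfulness, consistency, and controllability. The code is released at: https://github.com/iLearn-Lab/DramaDirector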

Related papers