Fugu-MT Paper Translation (Abstract): EduStory: A Unified Framework for Pedagogically-Consistent Multi-Shot STEM Instructional Video Generation

Paper overview: EduStory: A Unified Framework for Pedagogically-Consistent Multi-Shot STEM Instructional Video Generation

arxiv url: http://arxiv.org/abs/2605.09378v1
Date: Sun, 10 May 2026 07:03:37 GMT
Status: Translation complete
System update date: 2026-05-12 23:28:50.220124
Title: EduStory: A Unified Framework for Pedagogically-Consistent Multi-Shot STEM Instructional Video Generation
Title (reference translation): EduStory: A Unified Framework for Pedagogically-Consistent Multi-Shot STEM Instructional Video Generation
Authors: Xinyi Wu, Jayant Teotia, Shuai Zhao, Erik Cambria
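Abstract summary: EduStory is a unified framework for reliable instructional video generation. It integrates pedagogical state modeling to track persistent knowledge states, script-guided structured control to organize multi-shot narratives, and learning-oriented evaluation metrics. EduVideoBench is a diagnostic benchmark with multi-granularity annotations, including pedagogical storyboards, shot-level semantics, and knowledge state transitions.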
Reference score (independently computed attention score): 40.60762124779023
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Long-horizon video generation has advanced in visual quality, yet existing methods still struggle to maintain knowledge consistency and coherent pedagogical narratives across multi-shot instructional videos, especially in STEM domains. To address these challenges, we propose EduStory, a unified framework for reliable instructional video generation. EduStory integrates pedagogical state modeling to track persistent knowledge states, script-guided structured control to organize multi-shot narratives, and learning-oriented evaluation metrics to assess knowledge fidelity and constraint satisfaction. To support rigorous evaluation, we further introduce EduVideoBench, a diagnostic benchmark with multi-granularity annotations, including pedagogical storyboards, shot-level semantics, and knowledge state transitions, together with baseline tasks for controllable instructional video generation. Extensive experiments demonstrate that domain-aware state modeling and structured control substantially reduce narrative breakdown and improve alignment with instructional intent. These results highlight the significance of domain-specific structural constraints and tailored benchmarks for advancing reliable, controllable, and also trustworthy long-horizon video generation.
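Abstract (reference translation): Long-horizon video generation has improved in visual quality, but existing methods struggle to maintain knowledge consistency and a coherent pedagogical narrative in multi-shot instructional videos, particularly in STEM domains. To address these challenges, we propose EduStory, a unified framework for reliable instructional video generation. EduStory integrates pedagogical state modeling to track persistent knowledge states, script-guided structured control to organize multi-shot narratives, and learning-oriented evaluation metrics to assess knowledge fidelity and constraint satisfaction. To support rigorous evaluation, we also introduce EduVideoBench, a diagnostic benchmark with multi-granularity annotations such as pedagogical storyboards, shot-level semantics, and knowledge state transitions, along with baseline tasks for controllable instructional video generation. We show that domain-aware state modeling and structured control substantially reduce narrative breakdown and improve alignment with instructional intent. These results underscore the importance of domain-specific structural constraints and tailored benchmarks for reliable, controllable, and trustworthy long-horizon video generation.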


Related papers