Fugu-MT Paper Translation (Abstract): Code2Video: A Code-centric Paradigm for Educational Video Generation

Paper overview: Code2Video: A Code-centric Paradigm for Educational Video Generation

arxiv url: http://arxiv.org/abs/2510.01174v1
Date: Wed, 01 Oct 2025 17:56:48 GMT
Status: Translation complete
System update date: 2025-10-03 16:59:20.713105
Title: Code2Video: A Code-centric Paradigm for Educational Video Generation
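Title (reference translation): Code2Video: A Code-centric Paradigm for Educational Video Generation
Authors: Yanzhe Chen, Kevin Qinghong Lin, Mike Zheng Shou
Abstract summary: We propose Code2Video, a code-centric agent framework for generating educational videos via Python code. It comprises three collaborative agents: (i) Planner, which structures lecture content into temporally coherent flows; (ii) Coder, which converts structured instructions into executable Python code; and (iii) Critic, which leverages vision-language models (VLM) with visual anchor prompts to refine spatial layout and ensure clarity. Our results demonstrate the potential of Code2Video as a scalable, interpretable, and controllable approach, achieving a 40% improvement over direct code generation.
Reference score (independently computed attention level): 60.03043132859077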
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While recent generative models advance pixel-space video synthesis, they remain limited in producing professional educational videos, which demand disciplinary knowledge, precise visual structures, and coherent transitions, limiting their applicability in educational scenarios. Intuitively, such requirements are better addressed through the manipulation of a renderable environment, which can be explicitly controlled via logical commands (e.g., code). In this work, we propose Code2Video, a code-centric agent framework for generating educational videos via executable Python code. The framework comprises three collaborative agents: (i) Planner, which structures lecture content into temporally coherent flows and prepares corresponding visual assets; (ii) Coder, which converts structured instructions into executable Python codes while incorporating scope-guided auto-fix to enhance efficiency; and (iii) Critic, which leverages vision-language models (VLM) with visual anchor prompts to refine spatial layout and ensure clarity. To support systematic evaluation, we build MMMC, a benchmark of professionally produced, discipline-specific educational videos. We evaluate MMMC across diverse dimensions, including VLM-as-a-Judge aesthetic scores, code efficiency, and particularly, TeachQuiz, a novel end-to-end metric that quantifies how well a VLM, after unlearning, can recover knowledge by watching the generated videos. Our results demonstrate the potential of Code2Video as a scalable, interpretable, and controllable approach, achieving 40% improvement over direct code generation and producing videos comparable to human-crafted tutorials. The code and datasets are available at https://github.com/showlab/Code2Video.
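The three-agent flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Section`, `planner`, `coder`, `runs_ok`, `critic`, `generate`, `max_fixes`) is hypothetical, the agents are stubbed with plain functions instead of model calls, and the per-section retry loop is only a loose stand-in for what the abstract calls scope-guided auto-fix.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    """One step of the lecture flow (hypothetical structure)."""
    title: str
    instructions: str


def planner(topic: str) -> List[Section]:
    # Stub for the Planner: order the lecture into a coherent flow.
    return [
        Section("Introduction", f"Introduce {topic}"),
        Section("Worked example", f"Show one example of {topic}"),
        Section("Summary", f"Recap {topic}"),
    ]


def coder(section: Section) -> str:
    # Stub for the Coder: emit Python source for a single section.
    return f"frames = [{section.title!r}, {section.instructions!r}]"


def runs_ok(source: str) -> bool:
    # Cheap executability check: does the generated source compile?
    try:
        compile(source, "<section>", "exec")
        return True
    except SyntaxError:
        return False


def critic(source: str) -> str:
    # Stub for the Critic: a real system would ask a VLM for layout
    # feedback here; this placeholder returns the source unchanged.
    return source


def generate(topic: str, max_fixes: int = 2) -> List[str]:
    scripts = []
    for section in planner(topic):
        source = coder(section)
        attempts = 0
        # Assumption: a failure is repaired by regenerating only the
        # failing section, not the whole video script.
        while not runs_ok(source) and attempts < max_fixes:
            source = coder(section)
            attempts += 1
        scripts.append(critic(source))
    return scripts
```

Calling `generate("Fourier series")` returns one code string per planned section. Keeping the repair loop inside the per-section scope is the design point of the sketch: a failure in one section does not force the others to be regenerated.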
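Abstract (reference translation): Recent generative models have advanced pixel-space video synthesis, but they remain limited in producing professional educational videos, which require disciplinary knowledge, precise visual structures, and coherent transitions; this limits their applicability in educational scenarios. Intuitively, such requirements are better addressed by manipulating a renderable environment that can be explicitly controlled through logical commands (e.g., code). In this work, we propose Code2Video, a code-centric agent framework for generating educational videos via executable Python code. The framework comprises three collaborative agents: (i) Planner, which structures lecture content into temporally coherent flows and prepares the corresponding visual assets; (ii) Coder, which converts structured instructions into executable Python code while incorporating scope-guided auto-fix to enhance efficiency; and (iii) Critic, which leverages vision-language models (VLM) with visual anchor prompts to refine spatial layout and ensure clarity. To support systematic evaluation, we build MMMC, a benchmark of professionally produced, discipline-specific educational videos. We evaluate MMMC across diverse dimensions, including VLM-as-a-Judge aesthetic scores, code efficiency, and in particular TeachQuiz, a novel end-to-end metric that quantifies how well a VLM, after unlearning, can recover knowledge by watching the generated videos. Our results demonstrate the potential of Code2Video as a scalable, interpretable, and controllable approach, achieving a 40% improvement over direct code generation and producing videos comparable to human-crafted tutorials. The code and datasets are available at https://github.com/showlab/Code2Video.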
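The abstract describes TeachQuiz only as a measure of how much knowledge a VLM recovers after unlearning. One plausible reading, shown here as an assumption and not as the paper's definition, is the gain in quiz accuracy between the unlearned model and the same model after watching the video. The function names and the toy answer lists below are hypothetical.

```python
def accuracy(answers, key):
    # Fraction of quiz questions answered correctly.
    correct = sum(a == k for a, k in zip(answers, key))
    return correct / len(key)


def teachquiz_gain(key, answers_unlearned, answers_after_video):
    # Assumed scoring rule: accuracy after watching the video minus
    # accuracy of the unlearned model on the same questions.
    return accuracy(answers_after_video, key) - accuracy(answers_unlearned, key)


# Toy data, invented for illustration only.
key = ["A", "B", "C", "D"]
unlearned = ["A", "C", "D", "A"]      # 1 of 4 correct
after_video = ["A", "B", "C", "A"]    # 3 of 4 correct
gain = teachquiz_gain(key, unlearned, after_video)
```

Under this reading, a video that teaches nothing scores near zero regardless of how strong the underlying model was before unlearning.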


Related papers