Fugu-MT Paper Translation (Abstract): EduIllustrate: Towards Scalable Automated Generation Of Multimodal Educational Content

Paper overview: EduIllustrate: Towards Scalable Automated Generation Of Multimodal Educational Content

arxiv url: http://arxiv.org/abs/2604.05005v2
Date: Sat, 11 Apr 2026 07:08:48 GMT
Status: Translation complete
Last updated in system: 2026-04-14 14:47:45.512001
Title: EduIllustrate: Towards Scalable Automated Generation Of Multimodal Educational Content
Title (reference translation): EduIllustrate: Toward Scalable Automated Generation of Multimodal Educational Content
Authors: Shuzhen Bi, Mingzi Zhang, Zhuoxuan Li, Xiaolong Wang, Keqian Li, Aimin Zhou
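Abstract summary: The paper proposes EduIllustrate, a benchmark for evaluating large language models. The benchmark comprises 230 problems spanning five subjects and three grade levels. Gemini 3.0 Pro Preview leads at 87.8%, while Kimi-K2.5 achieves the best cost-efficiency.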
Reference score (independently computed attention score): 19.131221541276332
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models are increasingly used as educational assistants, yet evaluation of their educational capabilities remains concentrated on question-answering and tutoring tasks. A critical gap exists for multimedia instructional content generation: the ability to produce coherent, diagram-rich explanations that combine geometrically accurate visuals with step-by-step reasoning. We present EduIllustrate, a benchmark for evaluating LLMs on interleaved text-diagram explanation generation for K-12 STEM problems. The benchmark comprises 230 problems spanning five subjects and three grade levels, a standardized generation protocol with sequential anchoring to enforce cross-diagram visual consistency, and an 8-dimension evaluation rubric grounded in multimedia learning theory covering both text and visual quality. Evaluation of ten LLMs reveals a wide performance spread: Gemini 3.0 Pro Preview leads at 87.8%, while Kimi-K2.5 achieves the best cost-efficiency (80.8% at $0.12 per problem). Workflow ablation confirms sequential anchoring improves Visual Consistency by 13% at 94% lower cost. Human evaluation with 20 expert raters validates LLM-as-judge reliability for objective dimensions (ρ ≥ 0.83) while revealing limitations on subjective visual assessment.
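Abstract (reference translation): Large language models are increasingly used as educational assistants, yet evaluation of their educational capabilities remains concentrated on question-answering and tutoring tasks. A critical gap remains in multimedia instructional content generation, that is, the ability to produce coherent, diagram-rich explanations that combine geometrically accurate visuals with step-by-step reasoning. The authors propose EduIllustrate, a benchmark for evaluating LLMs on interleaved text-diagram explanation generation for K-12 STEM problems. The benchmark comprises 230 problems spanning five subjects and three grade levels, a standardized generation protocol that uses sequential anchoring to enforce cross-diagram visual consistency, and an 8-dimension evaluation rubric grounded in multimedia learning theory that covers both text and visual quality. Gemini 3.0 Pro Preview scores 87.8%, and Kimi-K2.5 scores 80.8% at $0.12 per problem. Workflow ablation shows that sequential anchoring improves Visual Consistency by 13% at 94% lower cost. Human evaluation with 20 expert raters validates LLM-as-judge reliability on objective dimensions (ρ ≥ 0.83) while revealing limitations on subjective visual assessment.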
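
Note on the reported ρ: the abstract does not say which correlation statistic is used. Assuming it is Spearman's rank correlation between human ratings and LLM-judge ratings, the check can be sketched as below. The function names and the two score lists are hypothetical and purely illustrative; they are not taken from the paper or its data.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sample is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-problem scores on one rubric dimension (made-up numbers).
human_scores = [4.5, 3.0, 2.0, 5.0, 3.5, 1.5]
judge_scores = [4.0, 3.5, 2.5, 5.0, 3.0, 1.0]

rho = spearman_rho(human_scores, judge_scores)
print(f"rho = {rho:.3f}")  # prints "rho = 0.943"
```

Under this reading, a dimension would count as reliably judged when its ρ meets the reported threshold of 0.83; the toy lists above happen to clear it, which says nothing about the actual benchmark results.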


Related papers