Fugu-MT Paper Translation (Abstract): MAGMA-Edu: Multi-Agent Generative Multimodal Framework for Text-Diagram Educational Question Generation

Paper summary: MAGMA-Edu: Multi-Agent Generative Multimodal Framework for Text-Diagram Educational Question Generation

arxiv url: http://arxiv.org/abs/2511.18714v1
Date: Mon, 24 Nov 2025 03:13:26 GMT
Status: Translation complete
Last updated in system: 2025-11-25 18:34:24.98897
Title: MAGMA-Edu: Multi-Agent Generative Multimodal Framework for Text-Diagram Educational Question Generation
Authors: Zhenyu Wu, Jian Li, Hua Huang
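Abstract summary: This paper introduces MAGMA-Edu, a self-reflective multi-agent framework that unifies textual reasoning and diagrammatic synthesis. MAGMA-Edu employs a two-stage co-evolutionary pipeline: (1) a generation-verification-reflection loop that iteratively refines question statements and solutions for mathematical accuracy, and (2) a code-based intermediate representation that enforces geometric fidelity and semantic alignment.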
Reference score (independently computed attention score): 24.375206958505427
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Educational illustrations play a central role in communicating abstract concepts, yet current multimodal large language models (MLLMs) remain limited in producing pedagogically coherent and semantically consistent educational visuals. We introduce MAGMA-Edu, a self-reflective multi-agent framework that unifies textual reasoning and diagrammatic synthesis for structured educational problem generation. Unlike existing methods that treat text and image generation independently, MAGMA-Edu employs a two-stage co-evolutionary pipeline: (1) a generation-verification-reflection loop that iteratively refines question statements and solutions for mathematical accuracy, and (2) a code-based intermediate representation that enforces geometric fidelity and semantic alignment during image rendering. Both stages are guided by internal self-reflection modules that evaluate and revise outputs until domain-specific pedagogical constraints are met. Extensive experiments on multimodal educational benchmarks demonstrate the superiority of MAGMA-Edu over state-of-the-art MLLMs. Compared to GPT-4o, MAGMA-Edu improves the average textual metric from 57.01 to 92.31 (+35.3 pp) and boosts image-text consistency (ITC) from 13.20 to 85.24 (+72 pp). Across all model backbones, MAGMA-Edu achieves the highest scores (Avg-Text 96.20, ITC 99.12), establishing a new state of the art for multimodal educational content generation and demonstrating the effectiveness of self-reflective multi-agent collaboration in pedagogically aligned vision-language reasoning.
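The abstract names a generation-verification-reflection loop but gives no implementation details. The following is a minimal Python sketch of that general pattern only, not the authors' method: `Draft`, `refine`, `toy_generate`, `toy_verify`, and the `max_rounds` parameter are all hypothetical names introduced here, and the toy functions stand in for what would presumably be model-backed agents.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Draft:
    """A candidate question with its worked solution (hypothetical structure)."""
    question: str
    solution: str


def refine(
    generate: Callable[[List[str]], Draft],
    verify: Callable[[Draft], List[str]],
    max_rounds: int = 3,
) -> Tuple[Draft, int, bool]:
    """Generate a draft, verify it, and regenerate with feedback until it passes.

    Returns (last draft, rounds used, whether verification passed).
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        draft = generate(feedback)   # generation, conditioned on earlier feedback
        issues = verify(draft)       # verification: an empty list means "accepted"
        if not issues:
            return draft, round_no, True
        feedback = issues            # reflection: carry the issues into the next round
    return draft, max_rounds, False


# Toy stand-ins for the agents (assumptions for illustration only):
# the generator makes an arithmetic slip until it receives feedback.
def toy_generate(feedback: List[str]) -> Draft:
    area = 6 if feedback else 12     # legs 3 and 4: the correct area is 3 * 4 / 2 = 6
    return Draft("A right triangle has legs 3 and 4. What is its area?", f"Area = {area}")


def toy_verify(draft: Draft) -> List[str]:
    expected = 3 * 4 // 2
    return [] if draft.solution == f"Area = {expected}" else ["area should be 3*4/2"]


final, rounds, ok = refine(toy_generate, toy_verify)
print(final.solution, rounds, ok)
```

In this toy run the first draft fails verification, the feedback is passed back to the generator, and the second draft is accepted. Bounding the loop with `max_rounds` is a design choice of this sketch so that a draft that never passes still terminates and is flagged as unverified.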


Related papers