Fugu-MT Paper Translation (Abstract): A Multi-Agent Approach to Validate and Refine LLM-Generated Personalized Math Problems

Paper summary: A Multi-Agent Approach to Validate and Refine LLM-Generated Personalized Math Problems

arxiv url: http://arxiv.org/abs/2604.05160v1
Date: Mon, 06 Apr 2026 20:47:01 GMT
Status: Translation complete
Last updated in system: 2026-04-08 17:42:09.483423
Title: A Multi-Agent Approach to Validate and Refine LLM-Generated Personalized Math Problems
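Title (reference translation): A Multi-Agent Approach to Validating and Refining LLM-Generated Personalized Math Problems
Authors: Fareya Ikram, Nischal Ashok Kumar, Junyang Lu, Hunter McNichols, Candace Walkington, Neil Heffernan, Andrew S. Lan
Abstract summary: This paper proposes a framework that formalizes personalization as an iterative generate-validate-revise process. It uses four specialized validator agents that target the criteria of solvability, realism, readability, and authenticity. The framework is evaluated on 600 problems drawn from ASSISTments, a popular online mathematics homework platform.
Reference score (independently computed attention score): 5.687145473906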
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Students benefit from math problems contextualized to their interests. Large language models (LLMs) offer promise for efficient personalization at scale. However, LLM-generated personalized problems may often have problems such as unrealistic quantities and contexts, poor readability, limited authenticity with respect to students' experiences, and occasional mathematical inconsistencies. To alleviate these problems, we propose a multi-agent framework that formalizes personalization as an iterative generate-validate-revise process; we use four specialized validator agents targeting the criteria of solvability, realism, readability, and authenticity, respectively. We evaluate our framework on 600 problems drawn from a popular online mathematics homework platform, ASSISTments, personalizing each problem to a fixed set of 20 student interest topics. We compare three refinement strategies that differ in how validation feedback is coordinated into revisions. Results show that authenticity and realism are the most frequent failure modes in initial LLM-personalized problems, but that a single refinement iteration substantially reduces these failures. We further find that different refinement strategies have different strengths on different criteria. We also assess validator reliability via human evaluation. Results show that reliability is highest on realism and lowest on authenticity, highlighting the need for better evaluation protocols that consider teachers' and students' personal characteristics.
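Abstract (reference translation): Students benefit from math problems contextualized to their interests. Large language models (LLMs) promise efficient personalization at scale. However, LLM-generated personalized problems can suffer from unrealistic quantities and contexts, poor readability, limited authenticity with respect to students' experiences, and occasional mathematical inconsistencies. To mitigate these problems, we propose a multi-agent framework that formalizes personalization as an iterative generate-validate-revise process, using four specialized validator agents that target the criteria of solvability, realism, readability, and authenticity. We evaluate the framework on 600 problems drawn from ASSISTments, a popular online mathematics homework platform, personalizing each problem to a fixed set of 20 student interest topics. We compare three refinement strategies that differ in how validation feedback is coordinated. Results show that authenticity and realism are the most frequent failure modes in LLM-personalized problems. We further find that different refinement strategies have different strengths on different criteria. We also assess validator reliability through human evaluation. Reliability is highest on realism and lowest on authenticity, highlighting the need for better evaluation protocols that take teachers' and students' personal characteristics into account.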
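
The generate-validate-revise process described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the authors' implementation: the names `Verdict`, `refine`, `toy_realism`, and `toy_revise` are hypothetical, the validators are plain functions standing in for LLM agents, and the reviser receives all failed feedback at once, which may or may not correspond to any of the three refinement strategies the paper compares.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# The four criteria named in the abstract.
CRITERIA = ("solvability", "realism", "readability", "authenticity")


@dataclass
class Verdict:
    criterion: str
    passed: bool
    feedback: str = ""


Validator = Callable[[str], Verdict]
Reviser = Callable[[str, List[Verdict]], str]


def refine(
    problem: str,
    validators: Dict[str, Validator],
    revise: Reviser,
    max_iters: int = 1,
) -> Tuple[str, List[List[Verdict]]]:
    """Validate, revise on failure, and re-validate, up to max_iters revisions."""
    history: List[List[Verdict]] = []
    while True:
        verdicts = [validators[c](problem) for c in CRITERIA]
        history.append(verdicts)
        failures = [v for v in verdicts if not v.passed]
        # Stop when everything passes or the revision budget is spent.
        if not failures or len(history) > max_iters:
            return problem, history
        problem = revise(problem, failures)


def toy_realism(problem: str) -> Verdict:
    # Hypothetical stand-in for an LLM validator: flag any quantity above 100.
    too_big = [n for n in map(int, re.findall(r"\d+", problem)) if n > 100]
    feedback = f"implausible quantities: {too_big}" if too_big else ""
    return Verdict("realism", not too_big, feedback)


def always_pass(criterion: str) -> Validator:
    return lambda problem: Verdict(criterion, True)


def toy_revise(problem: str, failures: List[Verdict]) -> str:
    # Hypothetical stand-in for an LLM revision call: cap every number at 100.
    # A real reviser would also have to keep the underlying math consistent.
    return re.sub(r"\d+", lambda m: str(min(int(m.group()), 100)), problem)


validators: Dict[str, Validator] = {c: always_pass(c) for c in CRITERIA}
validators["realism"] = toy_realism

final, history = refine("Maya eats 900 slices of pizza in 3 days.", validators, toy_revise)
print(final)
```

In this toy run the first validation pass fails on realism, one revision is applied, and the second pass succeeds, mirroring the single refinement iteration mentioned in the abstract.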

Related papers