Fugu-MT Paper Translation (Abstract): Investigating Bias: A Multilingual Pipeline for Generating, Solving, and Evaluating Math Problems with LLMs

Paper summary: Investigating Bias: A Multilingual Pipeline for Generating, Solving, and Evaluating Math Problems with LLMs

arxiv url: http://arxiv.org/abs/2509.17701v1
Date: Mon, 22 Sep 2025 12:38:09 GMT
Status: Translation complete
Last updated in system: 2025-09-23 18:58:16.380869
Title: Investigating Bias: A Multilingual Pipeline for Generating, Solving, and Evaluating Math Problems with LLMs
Title (reference translation): Investigating Bias: A Multilingual Pipeline for Generating, Solving, and Evaluating Math Problems with LLMs
Authors: Mariam Mahran, Katharina Simbeck
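Abstract summary: This paper proposes an automated multilingual pipeline for generating, solving, and evaluating math problems aligned with the German K-10 curriculum. We generated 628 math exercises and translated them into English, German, and Arabic. Three commercial LLMs were prompted to produce step-by-step solutions in each language.
Reference score (independently computed attention measure): 0.0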
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) are increasingly used for educational support, yet their response quality varies depending on the language of interaction. This paper presents an automated multilingual pipeline for generating, solving, and evaluating math problems aligned with the German K-10 curriculum. We generated 628 math exercises and translated them into English, German, and Arabic. Three commercial LLMs (GPT-4o-mini, Gemini 2.5 Flash, and Qwen-plus) were prompted to produce step-by-step solutions in each language. A held-out panel of LLM judges, including Claude 3.5 Haiku, evaluated solution quality using a comparative framework. Results show a consistent gap, with English solutions consistently rated highest, and Arabic often ranked lower. These findings highlight persistent linguistic bias and the need for more equitable multilingual AI systems in education.
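Abstract (reference translation): Large language models (LLMs) are increasingly used for educational support, but their response quality varies with the language of interaction. This paper proposes an automated multilingual pipeline for generating, solving, and evaluating math problems aligned with the German K-10 curriculum. We generated 628 math exercises and translated them into English, German, and Arabic. Three commercial LLMs (GPT-4o-mini, Gemini 2.5 Flash, and Qwen-plus) were prompted to produce step-by-step solutions in each language. A panel of LLM judges, including Claude 3.5 Haiku, evaluated solution quality using a comparative framework. The results show a consistent gap: English solutions were always rated highest, and Arabic solutions were often ranked lower. These findings highlight persistent linguistic bias and the need for more equitable multilingual AI systems in education.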

Related papers