Fugu-MT Paper Translation (Abstract): MARGE: Improving Math Reasoning for LLMs with Guided Exploration

Paper summary: MARGE: Improving Math Reasoning for LLMs with Guided Exploration

arxiv url: http://arxiv.org/abs/2505.12500v1
Date: Sun, 18 May 2025 17:24:16 GMT
Status: Translation complete
Last updated in system: 2025-05-20 14:57:11.268009
Title: MARGE: Improving Math Reasoning for LLMs with Guided Exploration
Title (reference translation): MARGE: Improving Mathematical Reasoning of LLMs through Guided Exploration
Authors: Jingyue Gao, Runji Lin, Keming Lu, Bowen Yu, Junyang Lin, Jianyu Chen
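Abstract summary: Large language models (LLMs) show strong potential in mathematical reasoning, but their effectiveness is often limited by a shortage of high-quality queries. The paper introduces MARGE (Math Reasoning with Guided Exploration). MARGE systematically explores intermediate reasoning states derived from self-generated solutions, enabling adequate exploration and improved credit assignment.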
Reference score (independently computed attention score): 31.311075009100048
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) exhibit strong potential in mathematical reasoning, yet their effectiveness is often limited by a shortage of high-quality queries. This limitation necessitates scaling up computational responses through self-generated data, yet current methods struggle due to spuriously correlated data caused by ineffective exploration across all reasoning stages. To address this challenge, we introduce MARGE: Improving Math Reasoning with Guided Exploration, a novel method that enhances mathematical reasoning through hit-guided exploration. MARGE systematically explores intermediate reasoning states derived from self-generated solutions, enabling adequate exploration and improved credit assignment throughout the reasoning process. Through extensive experiments across multiple backbone models and benchmarks, we demonstrate that MARGE significantly improves reasoning capabilities without requiring external annotations or training additional value models. Notably, MARGE improves both single-shot accuracy and exploration diversity, mitigating a common trade-off in alignment methods. These results demonstrate MARGE's effectiveness in enhancing mathematical reasoning capabilities and unlocking the potential of scaling self-generated training data. Our code and models are available at https://github.com/georgao35/MARGE.
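Abstract (reference translation): Large language models (LLMs) show strong potential in mathematical reasoning, but their effectiveness is often limited by a shortage of high-quality queries. This limitation calls for scaling up computational responses with self-generated data, but current methods struggle with spuriously correlated data caused by ineffective exploration across all reasoning stages. To address this challenge, we introduce MARGE: Improving Math Reasoning with Guided Exploration, a novel method that addresses this problem and enhances mathematical reasoning through hit-guided exploration. MARGE systematically explores intermediate reasoning states derived from self-generated solutions, enabling adequate exploration and improved credit assignment throughout the reasoning process. Through extensive experiments across multiple backbone models and benchmarks, we show that MARGE substantially improves reasoning capabilities without requiring external annotations or training additional value models. In particular, MARGE improves both single-shot accuracy and exploration diversity, mitigating a common trade-off in alignment methods. These results show that MARGE is effective at improving mathematical reasoning capabilities and unlocking the potential of scaling self-generated training data. Our code and models are available at https://github.com/georgao35/MARGE.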

Related papers