Fugu-MT Paper Translation (Abstract): MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy

Paper overview: MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy

arxiv url: http://arxiv.org/abs/2508.05592v1
Date: Thu, 07 Aug 2025 17:32:14 GMT
Status: Translation complete
Last updated in system: 2025-08-08 21:11:55.690981
Title: MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy
Title (reference translation): MathSmith: Toward Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy
Authors: Shaoxiong Zhan, Yanlin Lai, Ziyu Lu, Dahua Lin, Ziqing Yang, Fei Tang
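Abstract summary: MathSmith is a novel framework for synthesizing challenging mathematical problems to enhance LLM reasoning. Rather than modifying existing problems, MathSmith constructs new ones from scratch by randomly sampling concept-explanation pairs from PlanetMath. To increase difficulty, nine predefined strategies are designed as soft constraints during rationales. Experiments show that MathSmith consistently outperforms existing baselines under both short and long CoT settings.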
Reference score (independently computed attention score): 43.86485569038631
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models have achieved substantial progress in mathematical reasoning, yet their advancement is limited by the scarcity of high-quality, high-difficulty training data. Existing synthesis methods largely rely on transforming human-written templates, limiting both diversity and scalability. We propose MathSmith, a novel framework for synthesizing challenging mathematical problems to enhance LLM reasoning. Rather than modifying existing problems, MathSmith constructs new ones from scratch by randomly sampling concept-explanation pairs from PlanetMath, ensuring data independence and avoiding contamination. To increase difficulty, we design nine predefined strategies as soft constraints during rationales. We further adopt reinforcement learning to jointly optimize structural validity, reasoning complexity, and answer consistency. The length of the reasoning trace generated under autoregressive prompting is used to reflect cognitive complexity, encouraging the creation of more demanding problems aligned with long-chain-of-thought reasoning. Experiments across five benchmarks, categorized as easy & medium (GSM8K, MATH-500) and hard (AIME2024, AIME2025, OlympiadBench), show that MathSmith consistently outperforms existing baselines under both short and long CoT settings. Additionally, a weakness-focused variant generation module enables targeted improvement on specific concepts. Overall, MathSmith exhibits strong scalability, generalization, and transferability, highlighting the promise of high-difficulty synthetic data in advancing LLM reasoning capabilities.
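Abstract (reference translation): Large language models have made substantial progress in mathematical reasoning, but that progress is limited by the scarcity of high-quality, high-difficulty training data. Existing synthesis methods rely mainly on transforming human-written templates, which limits both diversity and scalability. We propose MathSmith, a novel framework that synthesizes challenging mathematical problems to enhance LLM reasoning. Rather than modifying existing problems, MathSmith builds new ones from scratch by randomly sampling concept-explanation pairs from PlanetMath, which ensures data independence and avoids contamination. To increase difficulty, we design nine predefined strategies as soft constraints during rationales. In addition, we adopt reinforcement learning to jointly optimize structural validity, reasoning complexity, and answer consistency. The length of the reasoning trace produced under autoregressive prompting is used to reflect cognitive complexity, which encourages the creation of more demanding problems aligned with long chain-of-thought reasoning. In experiments on five benchmarks, grouped as easy & medium (GSM8K, MATH-500) and hard (AIME2024, AIME2025, OlympiadBench), MathSmith consistently outperforms existing baselines under both short and long CoT settings. Furthermore, a weakness-focused variant generation module enables targeted improvement on specific concepts. Overall, MathSmith shows strong scalability, generalization, and transferability, highlighting the promise of high-difficulty synthetic data for advancing LLM reasoning capabilities.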
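
Illustrative sketch (not from the paper): the abstract names three reward signals, namely structural validity, reasoning complexity measured by reasoning-trace length, and answer consistency, but gives no formula for combining them. The Python below shows one plausible way such a reward could be assembled, assuming a gated weighted sum. The `Rollout` fields, the weights, the token cap, and the majority-agreement measure are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    """One generated problem plus the signals used to score it (hypothetical fields)."""
    has_required_sections: bool  # structural validity: output parsed into the expected parts
    trace_tokens: int            # length of the solver's reasoning trace, in tokens
    sampled_answers: List[str]   # final answers from repeated solver runs


def consistency(answers: List[str]) -> float:
    """Share of sampled answers that agree with the most common answer."""
    if not answers:
        return 0.0
    most_common = max(set(answers), key=answers.count)
    return answers.count(most_common) / len(answers)


def reward(r: Rollout, max_tokens: int = 8192,
           w_complexity: float = 0.5, w_consistency: float = 0.5) -> float:
    """Weighted sum of a capped length score and answer agreement, gated on structure."""
    if not r.has_required_sections:
        return 0.0  # assumption: malformed outputs earn nothing
    complexity = min(r.trace_tokens, max_tokens) / max_tokens
    return w_complexity * complexity + w_consistency * consistency(r.sampled_answers)


if __name__ == "__main__":
    sample = Rollout(True, 4096, ["7", "7", "3", "7"])
    print(reward(sample))
```

Capping the length term keeps a runaway trace from dominating the score, and gating on structure means a malformed problem cannot be rewarded for length alone. Both are design assumptions of this sketch, not claims about MathSmith's actual reward.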

Related papers