Fugu-MT Paper Translation (Summary): From Static to Dynamic: Adaptive Monte Carlo Search for Mathematical Process Supervision

Paper summary: From Static to Dynamic: Adaptive Monte Carlo Search for Mathematical Process Supervision

arxiv url: http://arxiv.org/abs/2509.24351v1
Date: Mon, 29 Sep 2025 06:52:35 GMT
Status: Translation complete
Updated in system: 2025-09-30 22:32:19.796935
Title: From Static to Dynamic: Adaptive Monte Carlo Search for Mathematical Process Supervision
Title (reference translation): From Static to Dynamic: Adaptive Monte Carlo Search for Mathematical Process Supervision
Authors: Jie Ma, Shihao Qi, Rui Xing, Ziang Yin, Bifan Wei, Jun Liu, Tongliang Liu
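Abstract summary: Existing methods estimate the quality of reasoning steps using a fixed-budget sampling strategy. This paper proposes Adaptive Monte Carlo Search (AMCS), a framework that transforms data generation from static to adaptive. AMCS adaptively refines its estimates by allocating more samples to uncertain reasoning steps and fewer samples to steps that are easier to estimate.
Reference score (independently computed attention score): 49.59309446816251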
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The quality of process data plays a key role in training a Process Reward Model (PRM), which can enhance the complex mathematical reasoning capability of large language models. Existing methods estimate the quality of reasoning steps based on a fixed-budget sampling strategy and navigate a vast search space to perform path expansion during the automated data generation process, resulting in their inefficiency and inflexibility. To address these issues, we propose Adaptive Monte Carlo Search (AMCS), a framework that transforms data generation from fixed, static to adaptive, dynamic search at the level of node value estimation and path expansion. On one hand, AMCS adaptively refines estimation by allocating more samples to uncertain reasoning steps while using fewer samples for those that are easier to estimate. On the other hand, it enhances the path expansion through a Monte Carlo algorithm with a temporally adaptive policy that begins with broad exploration and gradually shifts toward exploiting the most promising directions. With AMCS, we construct a large-scale dataset MathSearch-200K of about 200K process supervision examples for training PRMs. To verify the effectiveness of our method, we conduct extensive experiments on four mathematical reasoning benchmarks. Experimental results show that Qwen2.5-Math-7B-PRM-AMCS achieves up to 76.2% accuracy on MATH500 with GLM-4-9B, outperforming all baseline PRMs. Notably, a 7B model supervised by Qwen2.5-Math-7B-PRM-AMCS surpasses a 72B model with weaker supervision. Moreover, Qwen2.5-Math-7B-PRM-AMCS maintains consistent advantages on out-of-distribution problems, demonstrating strong generalization capability. Our code is available at https://github.com/reml-group/AMCS.
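Abstract (reference translation): The quality of process data plays a key role in training a Process Reward Model (PRM), which enhances the complex mathematical reasoning capability of large language models. Existing methods estimate the quality of reasoning steps with a fixed-budget sampling strategy and navigate a vast search space to perform path expansion during automated data generation, which makes them inefficient and inflexible. To address these issues, we propose Adaptive Monte Carlo Search (AMCS), a framework that transforms data generation from fixed, static search to adaptive, dynamic search at the level of node value estimation and path expansion. On one hand, AMCS adaptively refines estimation by allocating more samples to uncertain reasoning steps and fewer samples to steps that are easier to estimate. On the other hand, it enhances path expansion through a Monte Carlo algorithm with a temporally adaptive policy that begins with broad exploration and gradually shifts toward exploiting the most promising directions. With AMCS, we construct MathSearch-200K, a large-scale dataset of about 200K process supervision examples for training PRMs. To verify the effectiveness of our method, we conduct extensive experiments on four mathematical reasoning benchmarks. Experimental results show that Qwen2.5-Math-7B-PRM-AMCS achieves up to 76.2% accuracy on MATH500 with GLM-4-9B, outperforming all baseline PRMs. Notably, a 7B model supervised by Qwen2.5-Math-7B-PRM-AMCS surpasses a 72B model with weaker supervision. Moreover, Qwen2.5-Math-7B-PRM-AMCS maintains consistent advantages on out-of-distribution problems, demonstrating strong generalization capability. Our code is available at https://github.com/reml-group/AMCS.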
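The two ideas named in the abstract, adaptive sample allocation and a policy that shifts from exploration to exploitation over time, can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the repository linked above for that). The stopping rule (a normal-approximation confidence interval), the linear decay schedule, the epsilon-greedy selection, and every function name and default value below are assumptions chosen only for illustration.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def adaptive_value_estimate(
    rollout: Callable[[], bool],
    min_samples: int = 4,
    max_samples: int = 32,
    target_half_width: float = 0.2,
    z: float = 1.96,
) -> Tuple[float, int]:
    """Estimate a step's success rate with a variable number of rollouts.

    Hypothetical stopping rule: keep sampling until the normal-approximation
    confidence interval is narrow enough or the sample budget is spent.
    Returns (estimated value, number of rollouts used).
    """
    successes = 0
    n = 0
    while n < max_samples:
        successes += int(rollout())
        n += 1
        if n < min_samples:
            continue
        p = successes / n
        half_width = z * math.sqrt(p * (1.0 - p) / n)
        if half_width <= target_half_width:
            break  # estimate is stable: stop spending samples here
    return successes / n, n


def exploration_rate(step: int, total_steps: int,
                     start: float = 1.0, end: float = 0.1) -> float:
    """Linearly decay the exploration probability from `start` to `end`."""
    frac = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return start + (end - start) * frac


def select_child(values: Sequence[float], step: int, total_steps: int,
                 rng: random.Random) -> int:
    """Epsilon-greedy choice: explore early, exploit the best value later."""
    if rng.random() < exploration_rate(step, total_steps):
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda i: values[i])


if __name__ == "__main__":
    rng = random.Random(0)
    # A step whose rollouts always succeed needs few samples ...
    print(adaptive_value_estimate(lambda: True))
    # ... while a coin-flip step consumes more of the budget.
    print(adaptive_value_estimate(lambda: rng.random() < 0.5))
    print(select_child([0.1, 0.9, 0.3], step=9, total_steps=10, rng=rng))
```

In this sketch a step whose rollouts all agree stops at `min_samples`, while a step with mixed outcomes keeps sampling until its interval narrows or `max_samples` is reached, which mirrors the "more samples for uncertain steps" idea only in spirit.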

Related papers