Fugu-MT Paper Translation (Summary): rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper summary: rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

arxiv url: http://arxiv.org/abs/2501.04519v1
Date: Wed, 08 Jan 2025 14:12:57 GMT
Status: Translation complete
Last updated in system: 2025-01-09 16:10:19.65252
Title: rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
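Title (reference translation): rStar-Math: Small LLMs can master math through self-evolved deep thinking
Authors: Xinyu Guan, Li Lyna Zhang, Yifei Liu, Ning Shang, Youran Sun, Yi Zhu, Fan Yang, Mao Yang
Abstract summary: This paper presents rStar-Math to show that small language models (SLMs) can rival or even surpass the math reasoning capability of OpenAI o1. It exercises "deep thinking" through Monte Carlo Tree Search (MCTS), performing test-time search guided by an SLM-based process reward model.
Reference score (attention score computed by this site): 15.38166914134102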
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We present rStar-Math to demonstrate that small language models (SLMs) can rival or even surpass the math reasoning capability of OpenAI o1, without distillation from superior models. rStar-Math achieves this by exercising "deep thinking" through Monte Carlo Tree Search (MCTS), where a math policy SLM performs test-time search guided by an SLM-based process reward model. rStar-Math introduces three innovations to tackle the challenges in training the two SLMs: (1) a novel code-augmented CoT data synthesis method, which performs extensive MCTS rollouts to generate step-by-step verified reasoning trajectories used to train the policy SLM; (2) a novel process reward model training method that avoids naïve step-level score annotation, yielding a more effective process preference model (PPM); (3) a self-evolution recipe in which the policy SLM and PPM are built from scratch and iteratively evolved to improve reasoning capabilities. Through 4 rounds of self-evolution with millions of synthesized solutions for 747k math problems, rStar-Math boosts SLMs' math reasoning to state-of-the-art levels. On the MATH benchmark, it improves Qwen2.5-Math-7B from 58.8% to 90.0% and Phi3-mini-3.8B from 41.4% to 86.4%, surpassing o1-preview by +4.5% and +0.9%. On the USA Math Olympiad (AIME), rStar-Math solves an average of 53.3% (8/15) of problems, ranking among the top 20% of the brightest high school math students. Code and data will be available at https://github.com/microsoft/rStar.
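Abstract (reference translation): This paper presents rStar-Math to show that small language models (SLMs) can rival or even surpass the math reasoning capability of OpenAI o1 without distillation from superior models. rStar-Math exercises "deep thinking" through Monte Carlo Tree Search (MCTS), performing test-time search guided by an SLM-based process reward model. rStar-Math introduces three innovations to address the challenges of training the two SLMs: (1) a novel code-augmented CoT data synthesis method that uses MCTS rollouts to generate step-by-step verified reasoning trajectories for training the policy SLM; (2) a novel process reward model training method that yields a more effective process preference model (PPM); (3) a self-evolution recipe in which the policy SLM and PPM are built from scratch and iteratively evolved to improve reasoning capabilities. Through 4 rounds of self-evolution with millions of synthesized solutions for 747k math problems, rStar-Math raises SLMs' math reasoning to state-of-the-art levels. On the MATH benchmark, it improves Qwen2.5-Math-7B from 58.8% to 90.0% and Phi3-mini-3.8B from 41.4% to 86.4%, surpassing o1-preview by +4.5% and +0.9%. On the USA Math Olympiad (AIME), rStar-Math solves 53.3% (8/15) of problems, ranking among the top 20% of the brightest high school math students. Code and data will be available at https://github.com/microsoft/rStar.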
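The abstract describes test-time search via MCTS guided by a process reward model. As a rough illustration only, the sketch below shows the selection and backpropagation steps of a generic MCTS using the standard UCT rule. It is not the authors' implementation: the `Node` class, the exploration constant `c`, and the scalar `reward` (standing in for a score that a process reward model might provide) are all assumptions made for this example.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One candidate reasoning step in the search tree (hypothetical structure)."""
    step: str
    visits: int = 0
    total_reward: float = 0.0  # sum of rewards backed up through this node
    children: list[Node] = field(default_factory=list)

    def mean_value(self) -> float:
        # Average backed-up reward; 0.0 for a node that was never visited.
        return self.total_reward / self.visits if self.visits else 0.0


def uct_score(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """Standard UCT: exploitation term plus exploration bonus (c is an assumed constant)."""
    if child.visits == 0:
        return math.inf  # always try unvisited steps first
    return child.mean_value() + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(parent: Node, c: float = 1.4) -> Node:
    """Pick the child step with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))


def backpropagate(path: list[Node], reward: float) -> None:
    """Update visit counts and reward sums along a root-to-leaf path."""
    for node in path:
        node.visits += 1
        node.total_reward += reward


# Usage: two candidate steps under a root; the reward values are made up.
root, a, b = Node("root"), Node("step A"), Node("step B")
root.children = [a, b]
backpropagate([root, a], reward=1.0)   # a rollout through A ended well
first_pick = select_child(root)        # B is unvisited, so it is explored next
backpropagate([root, b], reward=0.0)   # a rollout through B ended badly
second_pick = select_child(root)       # A now has the higher UCT score
```

In a full system the reward would come from verifying the final answer or from a learned step-level scorer; here it is just a number passed in by hand.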

Related papers

AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling [46.51639868437127]
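AceMath is a suite of frontier math models that excel at solving complex math problems. We develop the instruction-tuned model AceMath-72B-Instruct and the reward model AceMath-72B-RM. Combining AceMath-72B-Instruct with AceMath-72B-RM yields the highest average rm@8 score across the math reasoning benchmarks.
Paper reference translation (metadata) (2024-12-19T17:29:44Z)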
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On [55.449818944278526]
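We introduce the Skywork-Math model series, obtained by supervised fine-tuning (SFT) of common 7B language models. Skywork-Math 7B achieves 51.2% accuracy on the competition-level MATH benchmark. We provide several practical takeaways, for both research and industry applications, for enhancing the math reasoning abilities of LLMs.
Paper reference translation (metadata) (2024-07-11T09:56:51Z)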
MathScale: Scaling Instruction Tuning for Mathematical Reasoning [70.89605383298331]
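Large language models (LLMs) have demonstrated remarkable capabilities in problem solving. However, their ability to solve mathematical problems remains inadequate. We propose MathScale, a simple and scalable method for creating high-quality mathematical reasoning data.
Paper reference translation (metadata) (2024-03-05T11:42:59Z)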
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning [98.53491178426492]
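We open-source InternLM-Math, math reasoning LLMs that are continually pre-trained from InternLM2. We unify chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and a code interpreter in a single seq2seq format. Our pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning.
Paper reference translation (metadata) (2024-02-09T11:22:08Z)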
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning [52.97768001837269]
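This paper presents a method for fine-tuning open-source language models. We propose a method for generating novel, high-quality datasets of math problems together with their code-based solutions. This approach produces the MathCoder models, a family of models capable of generating code-based solutions to problems.
Paper reference translation (metadata) (2023-10-05T17:52:09Z)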
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models [91.66694225955872]
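We propose MetaMath, a fine-tuned language model that specializes in mathematical reasoning. Specifically, we start by bootstrapping mathematical questions, rewriting each question from multiple perspectives without extra knowledge. We release the full MetaMathQA dataset, MetaMath models of different sizes, and the training code for public use.
Paper reference translation (metadata) (2023-09-21T17:45:42Z)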
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct [130.37945867605302]
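This paper presents WizardMath, which enhances the mathematical CoT reasoning abilities of large language models (LLMs). Notably, WizardMath-Mistral 7B surpasses top-tier open-source LLMs by a substantial margin while being more data-efficient. Our preliminary exploration highlights the pivotal role of instruction evolution and process supervision in achieving exceptional math performance.
Paper reference translation (metadata) (2023-08-18T14:23:21Z)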

The related papers list is generated automatically from the titles and abstracts of papers on this site.

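This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from the use of this site (including all information and translations).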