Fugu-MT Paper Translation (Abstract): SAGE: Multi-Agent Self-Evolution for LLM Reasoning

Paper overview: SAGE: Multi-Agent Self-Evolution for LLM Reasoning

arxiv url: http://arxiv.org/abs/2603.15255v2
Date: Tue, 17 Mar 2026 07:31:05 GMT
Status: translation complete
Last updated in system: 2026-03-18 13:19:44.046347
Title: SAGE: Multi-Agent Self-Evolution for LLM Reasoning
Title (reference translation): SAGE: Multi-Agent Self-Evolution for LLM Reasoning
Authors: Yulin Peng, Xinxin Zhu, Chenxing Wei, Nianbo Zeng, Leilei Wang, Ying Tiffany He, F. Richard Yu
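Abstract summary: Reinforcement learning with verifiable rewards improves reasoning in large language models (LLMs). SAGE is a closed-loop framework in which four agents (Challenger, Planner, Solver, and Critic) co-evolve from a shared LLM backbone using only a small seed set.
Reference score (independently computed attention score): 34.689664313467595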
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reinforcement learning with verifiable rewards improves reasoning in large language models (LLMs), but many methods still rely on large human-labeled datasets. While self-play reduces this dependency, it often lacks explicit planning and strong quality control, limiting stability in long-horizon multi-step reasoning. We present SAGE (Self-evolving Agents for Generalized reasoning Evolution), a closed-loop framework where four agents: Challenger, Planner, Solver, and Critic, co-evolve from a shared LLM backbone using only a small seed set. The Challenger continuously generates increasingly difficult tasks; the Planner converts each task into a structured multi-step plan; and the Solver follows the plan to produce an answer, whose correctness is determined by external verifiers. The Critic scores and filters both generated questions and plans to prevent curriculum drift and maintain training signal quality, enabling stable self-training. Across mathematics and code-generation benchmarks, SAGE delivers consistent gains across model scales, improving the Qwen-2.5-7B model by 8.9% on LiveCodeBench and 10.7% on OlympiadBench.
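Abstract (reference translation): Reinforcement learning with verifiable rewards improves the reasoning of large language models (LLMs), but many methods still depend on large human-labeled datasets. Self-play reduces that dependency, yet it often lacks explicit planning and strong quality control, which limits stability in long-horizon, multi-step reasoning. SAGE (Self-evolving Agents for Generalized reasoning Evolution) is a closed-loop framework in which four agents (Challenger, Planner, Solver, and Critic) co-evolve from a shared LLM backbone using only a small seed set. The Challenger continuously generates increasingly difficult tasks. The Planner converts each task into a structured multi-step plan, and the Solver follows that plan to produce an answer whose correctness is determined by external verifiers. The Critic scores and filters both the generated questions and the plans, preventing curriculum drift and maintaining training-signal quality so that self-training stays stable. Across mathematics and code-generation benchmarks, SAGE delivers consistent gains across model scales, improving the Qwen-2.5-7B model by 8.9% on LiveCodeBench and 10.7% on OlympiadBench.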
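
The closed loop described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: every name below (`Task`, `challenger`, `planner`, `solver`, `critic`, `verifier`, `run_loop`, `max_difficulty`) is hypothetical, the four "agents" are plain functions rather than roles of a shared LLM, the tasks are trivial additions, and the reinforcement-learning update that would consume the rewards is omitted entirely. It only illustrates the order of the steps: generate a task, plan it, let the Critic filter, solve, then verify externally.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    question: str    # toy task text, e.g. "3 + 4"
    difficulty: int  # toy proxy for how hard the task is


def challenger(seed: List[Task], round_idx: int) -> Task:
    """Hypothetical Challenger: derives a harder task from a seed task each round."""
    base = seed[round_idx % len(seed)]
    level = base.difficulty + round_idx
    return Task(question=f"{level} + {level + 1}", difficulty=level)


def planner(task: Task) -> List[str]:
    """Hypothetical Planner: turns the task into an ordered multi-step plan."""
    left, right = task.question.split(" + ")
    return [f"parse {left}", f"parse {right}", "add"]


def solver(task: Task, plan: List[str]) -> int:
    """Hypothetical Solver: follows the plan step by step to produce an answer."""
    operands = [int(step.split()[1]) for step in plan if step.startswith("parse")]
    return sum(operands)


def critic(task: Task, plan: List[str], max_difficulty: int) -> bool:
    """Hypothetical Critic: keeps only tasks within a difficulty budget that have a plan."""
    return task.difficulty <= max_difficulty and len(plan) > 0


def verifier(task: Task, answer: int) -> bool:
    """Stand-in for an external verifier (for example an answer checker or unit tests)."""
    left, right = task.question.split(" + ")
    return answer == int(left) + int(right)


def run_loop(seed: List[Task], rounds: int, max_difficulty: int) -> List[Tuple[Task, float]]:
    """Runs the toy loop and returns (task, reward) pairs that passed the Critic."""
    kept = []
    for round_idx in range(rounds):
        task = challenger(seed, round_idx)
        plan = planner(task)
        if not critic(task, plan, max_difficulty):
            continue  # filtered out: assumed too far from the current curriculum
        answer = solver(task, plan)
        reward = 1.0 if verifier(task, answer) else 0.0
        kept.append((task, reward))
    return kept


if __name__ == "__main__":
    seed_set = [Task(question="1 + 2", difficulty=1)]
    for task, reward in run_loop(seed_set, rounds=5, max_difficulty=3):
        print(task.question, reward)
```

In this sketch the difficulty cap plays the role of the Critic's filter against curriculum drift. In the framework the abstract describes, the Critic scores questions and plans, and the verifier outcomes serve as the training signal.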

Related papers