Fugu-MT Paper Translation (Abstract): Multi-Agent Evolve: LLM Self-Improve through Co-evolution

Paper overview: Multi-Agent Evolve: LLM Self-Improve through Co-evolution

arxiv url: http://arxiv.org/abs/2510.23595v1
Date: Mon, 27 Oct 2025 17:58:02 GMT
Status: Translation complete
Last updated in system: 2025-10-28 15:28:15.660839
Title: Multi-Agent Evolve: LLM Self-Improve through Co-evolution
Title (reference translation): Multi-Agent Evolve: LLM Self-Improvement through Co-evolution
Authors: Yixing Chen, Yiding Wang, Siqi Zhu, Haofei Yu, Tao Feng, Muhan Zhan, Mostofa Patwary, Jiaxuan You
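Abstract summary: Reinforcement learning (RL) has demonstrated significant potential in enhancing the reasoning capabilities of large language models (LLMs). Recent Self-Play RL methods, inspired by the success of the paradigm in games and Go, aim to improve LLM reasoning capabilities without human-annotated data. We propose Multi-Agent Evolve (MAE), a framework that enables LLMs to self-evolve in solving diverse tasks, including mathematics, reasoning, and general knowledge Q&A.
Reference score (independently computed attention score): 29.495925290113636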
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reinforcement Learning (RL) has demonstrated significant potential in enhancing the reasoning capabilities of large language models (LLMs). However, the success of RL for LLMs heavily relies on human-curated datasets and verifiable rewards, which limit their scalability and generality. Recent Self-Play RL methods, inspired by the success of the paradigm in games and Go, aim to enhance LLM reasoning capabilities without human-annotated data. However, their methods primarily depend on a grounded environment for feedback (e.g., a Python interpreter or a game engine); extending them to general domains remains challenging. To address these challenges, we propose Multi-Agent Evolve (MAE), a framework that enables LLMs to self-evolve in solving diverse tasks, including mathematics, reasoning, and general knowledge Q&A. The core design of MAE is based on a triplet of interacting agents (Proposer, Solver, Judge) that are instantiated from a single LLM, and applies reinforcement learning to optimize their behaviors. The Proposer generates questions, the Solver attempts solutions, and the Judge evaluates both while co-evolving. Experiments on Qwen2.5-3B-Instruct demonstrate that MAE achieves an average improvement of 4.54% on multiple benchmarks. These results highlight MAE as a scalable, data-efficient method for enhancing the general reasoning abilities of LLMs with minimal reliance on human-curated supervision.
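Abstract (reference translation): Reinforcement learning (RL) has shown significant potential in enhancing the reasoning capabilities of large language models (LLMs). However, the success of RL for LLMs relies heavily on human-curated datasets and verifiable rewards, which limit its scalability and generality. Recent Self-Play RL methods, inspired by the success of the paradigm in games and Go, aim to improve LLM reasoning capabilities without human-annotated data. However, these methods depend primarily on a grounded environment for feedback (e.g., a Python interpreter or a game engine), and extending them to general domains remains difficult. To address these challenges, we propose Multi-Agent Evolve (MAE), a framework that enables LLMs to self-evolve in solving diverse tasks, including mathematics, reasoning, and general knowledge Q&A. The core design of MAE is based on a triplet of interacting agents (Proposer, Solver, Judge) that are instantiated from a single LLM, and it applies reinforcement learning to optimize their behaviors. The Proposer generates questions, the Solver attempts solutions, and the Judge evaluates both while co-evolving. Experiments on Qwen2.5-3B-Instruct show that MAE achieves an average improvement of 4.54% on multiple benchmarks. These results highlight MAE as a scalable, data-efficient method for enhancing the general reasoning abilities of LLMs.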
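
The Proposer, Solver, Judge interaction described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the role prompts, the `Rollout` record, the `parse_score` helper, and the `toy_llm` stand-in are hypothetical and are not taken from the paper. The reinforcement learning update itself is omitted; the sketch only shows how one shared model could play all three roles and how the Judge's scores could serve as reward signals for the other two.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for the single shared LLM: maps a prompt to a reply.
LLM = Callable[[str], str]


@dataclass
class Rollout:
    question: str
    answer: str
    question_score: float  # Judge's score for the Proposer's question
    answer_score: float    # Judge's score for the Solver's answer


def parse_score(text: str) -> float:
    """Parse a Judge reply into [0, 1]; unparseable replies score 0."""
    try:
        return min(1.0, max(0.0, float(text.strip())))
    except ValueError:
        return 0.0


def mae_step(llm: LLM) -> Rollout:
    """One interaction round: the same model is prompted in three roles."""
    question = llm("ROLE: Proposer\nWrite one question.")
    answer = llm(f"ROLE: Solver\nQuestion: {question}\nAnswer it.")
    # The Judge evaluates both the question and the answer.
    q_reply = llm(f"ROLE: Judge\nRate this question from 0 to 1: {question}")
    a_reply = llm(
        f"ROLE: Judge\nQuestion: {question}\nAnswer: {answer}\n"
        "Rate the answer from 0 to 1."
    )
    return Rollout(question, answer, parse_score(q_reply), parse_score(a_reply))


def toy_llm(prompt: str) -> str:
    """Deterministic placeholder model, used only to make the sketch runnable."""
    if prompt.startswith("ROLE: Proposer"):
        return "What is 2 + 3?"
    if prompt.startswith("ROLE: Solver"):
        return "5"
    return "0.8"  # any Judge prompt


rollout = mae_step(toy_llm)
```

In a real training loop, the scores in each `Rollout` would presumably feed a policy-gradient style update of the shared model; how the paper shapes those rewards is not stated in the abstract, so that step is left out here.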

Related papers