Fugu-MT Paper Translation (Abstract): Evolutionary System Prompt Learning can Facilitate Reinforcement Learning for LLMs

Paper abstract: Evolutionary System Prompt Learning can Facilitate Reinforcement Learning for LLMs

arxiv url: http://arxiv.org/abs/2602.14697v1
Date: Mon, 16 Feb 2026 12:34:27 GMT
Status: Translation complete
Last updated in system: 2026-02-17 16:22:50.403049
Title: Evolutionary System Prompt Learning can Facilitate Reinforcement Learning for LLMs
Title (reference translation): Evolutionary system prompt learning that supports reinforcement learning for LLMs
Authors: Lunjun Zhang, Ryan Chen, Bradly C. Stadie
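Abstract summary: This paper proposes Evolutionary System Prompt Learning (E-SPL), a method for jointly improving model contexts and model weights. In each RL iteration, E-SPL selects multiple system prompts and runs rollouts with each in parallel. RL updates are applied to the model weights conditioned on each system prompt.
Reference score (independently computed attention score): 3.917120254079574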
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Building agentic systems that can autonomously self-improve from experience is a longstanding goal of AI. Large language models (LLMs) today primarily self-improve via two mechanisms: self-reflection for context updates, and reinforcement learning (RL) for weight updates. In this work, we propose Evolutionary System Prompt Learning (E-SPL), a method for jointly improving model contexts and model weights. In each RL iteration, E-SPL selects multiple system prompts and runs rollouts with each in parallel. It applies RL updates to model weights conditioned on each system prompt, and evolutionary updates to the system prompt population via LLM-driven mutation and crossover. Each system prompt has a TrueSkill rating for evolutionary selection, updated from relative performance within each RL iteration batch. E-SPL encourages a natural division between declarative knowledge encoded in prompts and procedural knowledge encoded in weights, resulting in improved performance across reasoning and agentic tasks. For instance, in an easy-to-hard (AIME $\rightarrow$ BeyondAIME) generalization setting, E-SPL improves RL success rate from 38.8% $\rightarrow$ 45.1% while also outperforming reflective prompt evolution (40.0%). Overall, our results show that coupling reinforcement learning with system prompt evolution yields consistent gains in sample efficiency and generalization. Code: https://github.com/LunjunZhang/E-SPL
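Abstract (reference translation): Building agentic systems that can autonomously self-improve from experience is a longstanding goal of AI. Large language models (LLMs) currently self-improve mainly through two mechanisms: self-reflection for context updates and reinforcement learning (RL) for weight updates. This work proposes Evolutionary System Prompt Learning (E-SPL), a method for jointly improving model contexts and model weights. In each RL iteration, E-SPL selects multiple system prompts and runs rollouts with each in parallel. RL updates are applied to the model weights conditioned on each system prompt, and evolutionary updates are applied to the system prompt population via LLM-driven mutation and crossover. Each system prompt has a TrueSkill rating for evolutionary selection, updated from relative performance within each RL iteration batch. E-SPL encourages a natural division between declarative knowledge encoded in prompts and procedural knowledge encoded in weights, improving performance across reasoning and agentic tasks. For example, in an easy-to-hard (AIME $\rightarrow$ BeyondAIME) generalization setting, E-SPL improves the RL success rate from 38.8% to 45.1% and also outperforms reflective prompt evolution (40.0%). These results show that coupling reinforcement learning with system prompt evolution yields consistent gains in sample efficiency and generalization. Code: https://github.com/LunjunZhang/E-SPL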
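
The abstract describes the E-SPL loop only at a high level, so the following is a minimal Python sketch of one iteration under stated assumptions, not the authors' implementation. An Elo-style pairwise update is substituted for TrueSkill to keep the example dependency-free. The names `rollout_reward`, `mutate`, `crossover`, and `espl_step` are hypothetical, the LLM calls are replaced by toy string operations, and the RL weight update is left as a comment. Selection by noisy rating and children inheriting their parents' ratings are also assumptions.

```python
import random
from dataclasses import dataclass

HINTS = ["Check each step.", "Then verify the result.", "State assumptions first."]


@dataclass
class PromptEntry:
    text: str
    rating: float = 1000.0  # Elo-style stand-in; the paper uses TrueSkill ratings


def rollout_reward(prompt: str, rng: random.Random) -> float:
    """Hypothetical stub: success rate of 8 simulated rollouts under this prompt."""
    p_success = 0.6 if "verify" in prompt else 0.4  # toy signal, not real data
    return sum(rng.random() < p_success for _ in range(8)) / 8


def update_ratings(batch, rewards, k=16.0):
    """Update ratings from relative performance within one batch (Elo-style)."""
    deltas = [0.0] * len(batch)
    for i, a in enumerate(batch):
        for j, b in enumerate(batch):
            if i == j:
                continue
            expected = 1.0 / (1.0 + 10 ** ((b.rating - a.rating) / 400.0))
            if rewards[i] > rewards[j]:
                outcome = 1.0
            elif rewards[i] == rewards[j]:
                outcome = 0.5
            else:
                outcome = 0.0
            deltas[i] += k * (outcome - expected)
    for entry, delta in zip(batch, deltas):
        entry.rating += delta


def mutate(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for LLM-driven mutation: append one hint."""
    return prompt + " " + rng.choice(HINTS)


def crossover(a: str, b: str) -> str:
    """Hypothetical stand-in for LLM-driven crossover: splice two parents."""
    wa, wb = a.split(), b.split()
    return " ".join(wa[: len(wa) // 2] + wb[len(wb) // 2:])


def espl_step(population, rng, n_select=3, max_size=6):
    # 1. Select several system prompts (assumed rule: highest noisy rating).
    batch = sorted(population, key=lambda e: e.rating + rng.gauss(0, 25),
                   reverse=True)[:n_select]
    # 2. Run rollouts under each selected prompt (sequential here, parallel in the paper).
    rewards = [rollout_reward(e.text, rng) for e in batch]
    # 3. An RL update of the model weights, conditioned on each prompt, would go here.
    # 4. Update ratings from relative performance within this batch.
    update_ratings(batch, rewards)
    # 5. Evolve the population: mutate the best prompt, cross the top two.
    order = sorted(range(len(batch)), key=lambda i: rewards[i], reverse=True)
    best, second = batch[order[0]], batch[order[1]]
    population.append(PromptEntry(mutate(best.text, rng), best.rating))
    population.append(PromptEntry(crossover(best.text, second.text),
                                  (best.rating + second.rating) / 2))
    # 6. Keep the population bounded by dropping the lowest-rated prompts.
    population.sort(key=lambda e: e.rating, reverse=True)
    del population[max_size:]
    return population


if __name__ == "__main__":
    rng = random.Random(0)
    pop = [PromptEntry("Solve the problem step by step."),
           PromptEntry("Answer concisely and directly."),
           PromptEntry("Think carefully and verify the result.")]
    for _ in range(5):
        espl_step(pop, rng)
    for entry in pop:
        print(round(entry.rating, 1), entry.text)
```

To follow the rating scheme named in the abstract, `update_ratings` could be replaced by a TrueSkill-based update, and the stubbed mutation and crossover functions by calls to an LLM.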


Related papers