Fugu-MT Paper Translation (Abstract): SkillOS: Learning Skill Curation for Self-Evolving Agents

Paper overview: SkillOS: Learning Skill Curation for Self-Evolving Agents

arxiv url: http://arxiv.org/abs/2605.06614v1
Date: Thu, 07 May 2026 17:31:50 GMT
Status: Translation complete
Last updated in system: 2026-05-08 22:27:12.045186
Title: SkillOS: Learning Skill Curation for Self-Evolving Agents
Title (reference translation): SkillOS: Skill Curation for Self-Evolving Agents
Authors: Siru Ouyang, Jun Yan, Yanfei Chen, Rujun Han, Zifeng Wang, Bhavana Dalvi Mishra, Rui Meng, Chun-Liang Li, Yizhu Jiao, Kaiwen Zha, Maohao Shen, Vishy Tirumalashetty, George Lee, Jiawei Han, Tomas Pfister, Chen-Yu Lee
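Abstract summary: This paper proposes SkillOS, an experience-driven RL training recipe for learning skill curation in self-evolving agents. SkillOS pairs a frozen agent executor with a trainable skill curator that updates an external SkillRepo from accumulated experience. SkillOS consistently outperforms memory-free and strong memory-based baselines in both effectiveness and efficiency.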
Reference score (independently computed attention score): 67.94374107466957
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interactions. Reusable skills distilled from experience provide a natural substrate for self-evolution, where high-quality skill curation serves as the key bottleneck. Existing approaches either rely on manual skill curation, prescribe heuristic skill operations, or train for short-horizon skill operations. However, they still struggle to learn complex long-term curation policies from indirect and delayed feedback. To tackle this challenge, we propose SkillOS, an experience-driven RL training recipe for learning skill curation in self-evolving agents. SkillOS pairs a frozen agent executor that retrieves and applies skills with a trainable skill curator that updates an external SkillRepo from accumulated experience. To provide learning signals for curation, we design composite rewards and train on grouped task streams based on skill-relevant task dependencies, where earlier trajectories update the SkillRepo, and later related tasks evaluate these updates. Across multi-turn agentic tasks and single-turn reasoning tasks, SkillOS consistently outperforms memory-free and strong memory-based baselines in both effectiveness and efficiency, with the learned skill curator generalizing across different executor backbones and task domains. Further analyses show that the learned curator produces more targeted skill use, while the skills in SkillRepo evolve into more richly structured Markdown files that encode higher-level meta-skills over time.
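Abstract (reference translation): Agents built on LLMs are being deployed more and more to process streams of tasks, but they often stay one-off problem solvers that cannot learn from earlier interactions. Reusable skills distilled from experience offer a natural foundation for self-evolution, and high-quality skill curation is the key bottleneck there. Existing approaches depend on manual skill curation, on prescribed heuristic skill operations, or on training for short-horizon skill operations. They still have difficulty learning complex long-term curation policies from indirect and delayed feedback. To address this, the authors propose SkillOS, an experience-driven RL training recipe for learning skill curation in self-evolving agents. SkillOS combines a frozen agent executor, which retrieves and applies skills, with a trainable skill curator, which updates an external SkillRepo from accumulated experience. To supply learning signals for curation, the authors design composite rewards and train on task streams grouped by skill-relevant task dependencies: earlier trajectories update the SkillRepo, and later related tasks evaluate those updates. On multi-turn agentic tasks and single-turn reasoning tasks, SkillOS consistently beats memory-free and strong memory-based baselines in both effectiveness and efficiency, and the learned skill curator generalizes across different executor backbones and task domains. Further analysis shows that the learned curator leads to more targeted skill use, while the skills in SkillRepo evolve over time into more richly structured Markdown files that encode higher-level meta-skills.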
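
The data flow described in the abstract (a frozen executor that retrieves and applies skills, a curator that updates an external SkillRepo, and grouped task streams in which earlier trajectories update the repo and later related tasks evaluate it) can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `SkillRepo`, `frozen_executor`, `toy_curator`, and `run_task_stream` are hypothetical names, the retrieval and curation rules are placeholders, no RL training is performed, and the returned success rate is only a stand-in for the paper's composite rewards, whose form the abstract does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class SkillRepo:
    """External skill store: skill name -> Markdown text (hypothetical layout)."""
    skills: dict[str, str] = field(default_factory=dict)

    def retrieve(self, task: str) -> list[str]:
        # Toy retrieval: return skills whose name occurs in the task text.
        return [text for name, text in self.skills.items() if name in task]


def frozen_executor(task: str, skills: list[str]) -> dict:
    """Stand-in for the frozen agent: succeeds only if a skill was retrieved."""
    return {"task": task, "used_skills": skills, "success": bool(skills)}


def toy_curator(repo: SkillRepo, trajectory: dict) -> None:
    """Stand-in for the trainable curator: stores one skill per task topic.

    A learned curator would choose its own operations; this fixed rule is
    a placeholder for illustration only.
    """
    topic = trajectory["task"].split(":")[0]
    repo.skills.setdefault(
        topic, f"# {topic}\n\nNotes distilled from: {trajectory['task']}"
    )


def run_task_stream(stream: list[str], repo: SkillRepo) -> float:
    """Run one grouped stream; return the success rate on tasks after the first."""
    successes = []
    for task in stream:
        trajectory = frozen_executor(task, repo.retrieve(task))
        successes.append(trajectory["success"])
        toy_curator(repo, trajectory)  # earlier trajectories update the repo
    later = successes[1:]  # later related tasks evaluate those updates
    return sum(later) / len(later) if later else 0.0


repo = SkillRepo()
stream = ["sorting: sort a list", "sorting: sort by key", "sorting: stable sort"]
downstream_success = run_task_stream(stream, repo)
```

In this toy run the first task fails because the repo is empty, the curator then stores a "sorting" skill, and the two later tasks retrieve it. The point of the sketch is the delayed signal: the quality of a curation step only becomes visible through later related tasks.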

Related papers