Fugu-MT Paper Translation (Summary): SkillMaster: Toward Autonomous Skill Mastery in LLM Agents

Paper summary: SkillMaster: Toward Autonomous Skill Mastery in LLM Agents

arxiv url: http://arxiv.org/abs/2605.08693v2
Date: Tue, 12 May 2026 07:27:47 GMT
Status: Translation complete
Last updated in system: 2026-05-13 18:21:06.925609
Title: SkillMaster: Toward Autonomous Skill Mastery in LLM Agents
Title (reference translation): SkillMaster: Toward Autonomous Skill Acquisition in LLM Agents
Authors: Min Yang, Jinghua Piao, Xu Xia, Xiaochong Lan, Jiaju Chen, Yongshun Gong, Yong Li,
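Abstract (summary): SkillMaster is a training framework that teaches agents to create new skills, refine existing skills, and select accumulated skills during task solving. First, agents are trained through trajectory-informed skill review, which teaches them to propose, update, or retain skills based on evidence from completed episodes. Second, each candidate skill edit is evaluated by its counterfactual utility on related probe tasks, providing a direct learning signal for training skill-editing decisions. Third, DualAdv-GRPO is introduced, which estimates advantages separately for task-solving actions and skill-editing decisions and stabilizes joint training across task solving and skill management.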
Reference score (independently computed attention score): 27.651128308229378
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Skills provide an effective mechanism for improving LLM agents on complex tasks, yet in existing agent frameworks, their creation, refinement, and selection are typically governed by external teachers, hand-designed rules, or auxiliary modules. As a result, skills remain external resources to be invoked, rather than capabilities that agents can develop, adapt, and internalize through experience. To endow LLM agents with autonomous skill mastery, we propose SkillMaster, a training framework that teaches agents to create new skills, refine existing skills, and select accumulated skills during task solving. This capability is achieved through three key designs. First, we train agents through trajectory-informed skill review, teaching agents to propose, update, or retain skills based on evidence from completed episodes. Second, each candidate skill edit is designed to be evaluated by its counterfactual utility on related probe tasks, providing a direct learning signal for training skill-editing decisions. Third, we introduce DualAdv-GRPO, which separately estimates advantages for task-solving actions and skill-editing decisions, stabilizing joint training across task solving and skill management. Experiments on ALFWorld and WebShop show that SkillMaster improves the overall success rate over state-of-the-art baselines by 8.8% and 9.3%, respectively, achieving the best performance among all compared methods. Further analysis reveals a marked shift in agent capability: agents trained with SkillMaster can identify skill failures, refine procedural knowledge from trajectory evidence, and transfer improvements to future tasks with limited skill-bank edits. Overall, SkillMaster moves LLM agents beyond mere skill use toward self-improving agents capable of developing, adapting, and applying their own skill repertoires.
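The abstract says each candidate skill edit is scored by its counterfactual utility on related probe tasks, but it does not give the formula. This is a minimal sketch under one assumption: utility is the mean probe-task success with the edited skill bank minus the mean success with the unedited bank. `SkillBank`, `counterfactual_utility`, and the toy evaluator `toy_run_episode` are hypothetical names, not SkillMaster's code. A real evaluator would roll out the agent in ALFWorld or WebShop.

```python
from typing import Callable, Dict, Sequence

# Hypothetical representation: skill name -> procedural instructions.
SkillBank = Dict[str, str]


def counterfactual_utility(
    bank: SkillBank,
    edited_bank: SkillBank,
    probe_tasks: Sequence[str],
    run_episode: Callable[[str, SkillBank], float],
) -> float:
    """Assumed definition: mean probe success WITH the edit minus mean success WITHOUT it."""
    if not probe_tasks:
        raise ValueError("need at least one probe task")
    n = len(probe_tasks)
    with_edit = sum(run_episode(t, edited_bank) for t in probe_tasks) / n
    without_edit = sum(run_episode(t, bank) for t in probe_tasks) / n
    return with_edit - without_edit


def toy_run_episode(task: str, bank: SkillBank) -> float:
    # Hypothetical stand-in for an agent rollout: a task named "verb_object"
    # succeeds (1.0) iff the bank holds a skill for its verb, else fails (0.0).
    verb = task.split("_")[0]
    return 1.0 if verb in bank else 0.0


bank = {"pick": "go to the object, take it"}
edited = {**bank, "heat": "take the object to the microwave, heat it"}  # candidate edit
probes = ["heat_apple", "heat_mug", "pick_pen", "cool_egg"]
print(counterfactual_utility(bank, edited, probes, toy_run_episode))  # → 0.5
```

A positive value means the edit helped on the probes. Both banks are scored on the same tasks, so the difference isolates the edit's own contribution from how hard the probes are.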
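Abstract (reference translation): Skills provide an effective mechanism for improving LLM agents on complex tasks, but in existing agent frameworks their creation, refinement, and selection are usually governed by external teachers, hand-written rules, or auxiliary modules. As a result, skills remain external resources to be invoked rather than capabilities that agents can develop, adapt, and internalize through experience. To give LLM agents autonomous skill mastery, this work proposes SkillMaster, a training framework that teaches agents to create new skills, refine existing skills, and select accumulated skills during task solving. This capability is realized through three key designs. First, agents are trained through trajectory-informed skill review, which teaches them to propose, update, or retain skills based on evidence from completed episodes. Second, each candidate skill edit is evaluated by its counterfactual utility on related probe tasks, providing a direct learning signal for training skill-editing decisions. Third, DualAdv-GRPO is introduced to estimate advantages separately for task-solving actions and skill-editing decisions, stabilizing joint training across task solving and skill management. In experiments on ALFWorld and WebShop, SkillMaster improves the overall success rate over state-of-the-art baselines by 8.8% and 9.3%, respectively, achieving the best performance among all compared methods. Agents trained with SkillMaster can identify skill failures, refine procedural knowledge from trajectory evidence, and transfer improvements to future tasks with limited skill-bank edits. Overall, SkillMaster moves LLM agents beyond mere skill use toward self-improving agents that can develop, adapt, and apply their own skill repertoires.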
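DualAdv-GRPO is described only as estimating advantages separately for task-solving actions and skill-editing decisions. This sketch assumes the standard GRPO group-relative advantage, (r − mean) / (std + ε) within a group of rollouts, applied to two separate reward streams. Each generated token then receives the advantage of the decision type it belongs to. The function names, the reward definitions, and the token labels `"task"` and `"edit"` are illustrative assumptions, not the paper's formulation.

```python
import statistics
from typing import List, Sequence, Tuple


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style group-relative advantage: (r - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def dual_advantages(
    task_rewards: Sequence[float],
    edit_rewards: Sequence[float],
    eps: float = 1e-6,
) -> Tuple[List[float], List[float]]:
    """Hypothetical reading of DualAdv: normalize task-solving and skill-editing
    rewards in separate groups so the scale of one cannot swamp the other."""
    return group_advantages(task_rewards, eps), group_advantages(edit_rewards, eps)


def token_advantages(kinds: Sequence[str], task_adv: float, edit_adv: float) -> List[float]:
    """Give each generated token the advantage of the decision type it belongs to."""
    table = {"task": task_adv, "edit": edit_adv}
    return [table[k] for k in kinds]


# Usage: one group of 4 rollouts sharing the same prompt.
task_r = [1.0, 0.0, 1.0, 0.0]      # assumed task reward: episode success
edit_r = [0.5, 0.0, -0.25, 0.25]   # assumed edit reward, e.g. counterfactual utility
task_a, edit_a = dual_advantages(task_r, edit_r)
print([round(a, 3) for a in task_a])  # → [1.0, -1.0, 1.0, -1.0]
print(token_advantages(["task", "task", "edit"], task_a[0], edit_a[0]))
```

Each stream is normalized in its own group. Rescaling the edit rewards (for example, multiplying them by 100) therefore leaves the edit advantages essentially unchanged and does not touch the task advantages. This is one plausible way that separate estimation could stabilize joint training.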
