Fugu-MT Paper Translation (Summary): SkillLearnBench: Benchmarking Continual Learning Methods for Agent Skill Generation on Real-World Tasks

Paper Summary: SkillLearnBench: Benchmarking Continual Learning Methods for Agent Skill Generation on Real-World Tasks

arxiv url: http://arxiv.org/abs/2604.20087v1
Date: Wed, 22 Apr 2026 01:07:37 GMT
Status: Translation complete
Last updated in system: 2026-04-23 15:36:10.898324
Title: SkillLearnBench: Benchmarking Continual Learning Methods for Agent Skill Generation on Real-World Tasks
Authors: Shanshan Zhong, Yi Lu, Jingjie Ning, Yibing Wan, Lihan Feng, Yuyi Ao, Leonardo F. R. Ribeiro, Markus Dreyer, Sean Ammirati, Chenyan Xiong,
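Abstract summary: SkillLearnBench is the first benchmark for evaluating continual skill learning methods. Continual learning improves tasks with clear, reusable workflows but struggles on open-ended tasks. Our data and code are open-source at https://github.com/cxcscmu/SkillLearnBench.
Reference score (attention score computed by the site): 32.195367070060904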
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Skills have become the de facto way to enable LLM agents to perform complex real-world tasks with customized instructions, workflows, and tools, but how to learn them automatically and effectively remains unclear. We introduce SkillLearnBench, the first benchmark for evaluating continual skill learning methods, comprising 20 verified, skill-dependent tasks across 15 sub-domains derived from a real-world skill taxonomy, evaluated at three levels: skill quality, execution trajectory, and task outcome. Using this benchmark, we evaluate recent continual learning techniques, including one-shot methods and those that leverage self/teacher feedback or a skill creator to generate skills from agent experiences. We find that all continual learning methods improve over the no-skill baseline, yet consistent gains remain elusive: no method leads across all tasks and LLMs, and scaling to stronger LLMs does not reliably help. Continual learning improves tasks with clear, reusable workflows but struggles on open-ended tasks, and using stronger LLM backbones does not consistently produce better skills. Our analysis also revealed that multiple iterations of continual learning facilitate genuine improvement via external feedback, whereas self-feedback alone induces recursive drift. Our data and code are open-source at https://github.com/cxcscmu/SkillLearnBench to enable further studies of automatic skill generation and continual learning techniques.

Related Papers