Fugu-MT Paper Translation (Summary): Automated Skill Discovery for Language Agents through Exploration and Iterative Feedback

Paper summary: Automated Skill Discovery for Language Agents through Exploration and Iterative Feedback

arxiv url: http://arxiv.org/abs/2506.04287v1
Date: Wed, 04 Jun 2025 10:04:21 GMT
Status: Translation complete
Last updated in system: 2025-06-06 21:53:49.327632
Title: Automated Skill Discovery for Language Agents through Exploration and Iterative Feedback
Title (reference translation): Automated skill discovery for language agents through exploration and iterative feedback
Authors: Yongjin Yang, Sinjae Kang, Juyong Lee, Dongjun Lee, Se-Young Yun, Kimin Lee
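Abstract summary: We propose an automatic skill discovery framework for large language model (LLM) agents. We adopt an exploration-first strategy in which an exploration agent (Alice) is used to train the target agent (Bob) to learn the skills essential to the environment. Experiments on Webshop and Crafter show that EXIF effectively discovers meaningful skills and iteratively expands the capabilities of the trained agent.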
Reference score (independently computed attention score): 44.66973406051031
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Training large language model (LLM) agents to acquire necessary skills and perform diverse tasks within an environment is gaining interest as a means to enable open-endedness. However, creating the training dataset for their skill acquisition faces several challenges. Manual trajectory collection requires significant human effort. Another approach, where LLMs directly propose tasks to learn, is often invalid, as the LLMs lack knowledge of which tasks are actually feasible. Moreover, the generated data may not provide a meaningful learning signal, as agents often already perform well on the proposed tasks. To address this, we propose a novel automatic skill discovery framework EXIF for LLM-powered agents, designed to improve the feasibility of generated target behaviors while accounting for the agents' capabilities. Our method adopts an exploration-first strategy by employing an exploration agent (Alice) to train the target agent (Bob) to learn essential skills in the environment. Specifically, Alice first interacts with the environment to retrospectively generate a feasible, environment-grounded skill dataset, which is then used to train Bob. Crucially, we incorporate an iterative feedback loop, where Alice evaluates Bob's performance to identify areas for improvement. This feedback then guides Alice's next round of exploration, forming a closed-loop data generation process. Experiments on Webshop and Crafter demonstrate EXIF's ability to effectively discover meaningful skills and iteratively expand the capabilities of the trained agent without any human intervention, achieving substantial performance improvements. Interestingly, we observe that setting Alice to the same model as Bob also notably improves performance, demonstrating EXIF's potential for building a self-evolving system.
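Abstract (reference translation): Training large language model (LLM) agents to acquire necessary skills and carry out diverse tasks within an environment is attracting interest as a route to open-endedness. However, building the training dataset for skill acquisition faces several challenges. Manual trajectory collection requires considerable human effort. The alternative of having LLMs directly propose the tasks to learn often yields invalid tasks, because the LLMs lack knowledge of which tasks are actually feasible. In addition, the generated data may not provide a meaningful learning signal, because agents often already perform well on the proposed tasks. We therefore propose EXIF, a new automatic skill discovery framework for LLM-powered agents that improves the feasibility of the generated target behaviors while taking the agents' capabilities into account. The method adopts an exploration-first strategy: an exploration agent (Alice) is used to train the target agent (Bob) to learn the skills essential to the environment. Specifically, Alice first interacts with the environment and retrospectively generates a feasible, environment-grounded skill dataset, which is then used to train Bob. An iterative feedback loop is also incorporated, in which Alice evaluates Bob's performance to identify areas for improvement. This feedback guides Alice's next round of exploration, forming a closed-loop data generation process. Experiments on Webshop and Crafter show that EXIF effectively discovers meaningful skills and iteratively expands the capabilities of the trained agent without human intervention, achieving substantial performance improvements. Interestingly, setting Alice to the same model as Bob also notably improves performance, indicating EXIF's potential for building a self-evolving system.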
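The closed loop described in the abstract (explore, retrospectively label, train, evaluate, feed back) can be illustrated with a small toy sketch. This is not the authors' implementation: the skill table `SKILL_PREREQS`, the `Bob` class, and the functions `alice_explore`, `alice_evaluate`, and `closed_loop` are all hypothetical names invented for illustration. "Training" is reduced to memorising demonstrated skills, and "exploration" to checking hand-written prerequisites, whereas the paper uses LLM agents and real environments.

```python
from dataclasses import dataclass, field

# Hypothetical toy environment (an assumption, not taken from the paper):
# each skill lists the skills that must already be mastered before it.
SKILL_PREREQS = {
    "collect_wood": [],
    "make_table": ["collect_wood"],
    "make_pickaxe": ["make_table"],
    "collect_stone": ["make_pickaxe"],
}


@dataclass
class Bob:
    """Target agent. Its 'policy' is just the set of skills it has seen."""
    known: set = field(default_factory=set)

    def train(self, dataset):
        # Stand-in for fine-tuning: memorise every demonstrated skill.
        self.known.update(skill for skill, _trajectory in dataset)

    def can_do(self, skill):
        return skill in self.known


def alice_evaluate(bob):
    """Alice checks Bob and returns the skills he still fails on."""
    return [skill for skill in SKILL_PREREQS if not bob.can_do(skill)]


def alice_explore(bob, targets):
    """Alice explores toward the target skills and keeps only feasible ones.

    A target counts as feasible here when Bob already masters all of its
    prerequisites. Each kept item is a (skill, trajectory) pair, labelled
    after the fact from what the trajectory achieved.
    """
    dataset = []
    for skill in targets:
        if all(bob.can_do(p) for p in SKILL_PREREQS[skill]):
            trajectory = SKILL_PREREQS[skill] + [skill]
            dataset.append((skill, trajectory))
    return dataset


def closed_loop(rounds):
    """Run explore -> train -> evaluate for a fixed number of rounds."""
    bob = Bob()
    history = []
    targets = list(SKILL_PREREQS)  # round 1 has no feedback yet
    for _ in range(rounds):
        dataset = alice_explore(bob, targets)  # environment-grounded data
        bob.train(dataset)                     # update the target agent
        targets = alice_evaluate(bob)          # feedback for the next round
        history.append(sorted(bob.known))
    return bob, history
```

Calling `closed_loop(4)` on this toy table lets Bob pick up one new skill per round, because each round's feedback moves Alice's exploration to the next skill whose prerequisites Bob has just mastered. The point of the sketch is only the data flow between the two agents, not the learning method.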

Related papers