Fugu-MT Paper Translation (Summary): Exploring Autonomous Agents: A Closer Look at Why They Fail When Completing Tasks

Paper summary: Exploring Autonomous Agents: A Closer Look at Why They Fail When Completing Tasks

arXiv URL: http://arxiv.org/abs/2508.13143v1
Date: Mon, 18 Aug 2025 17:55:22 GMT
Status: Translation complete
Last updated in system: 2025-08-19 14:49:11.522523
Title: Exploring Autonomous Agents: A Closer Look at Why They Fail When Completing Tasks
Title (reference translation): Exploring Autonomous Agents: Why They Fail When Completing Tasks
Authors: Ruofan Lu, Yichen Li, Yintong Huo,
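Abstract summary: We present a benchmark of 34 programmable tasks designed to rigorously evaluate autonomous agents. We evaluate three popular open-source agent frameworks combined with two LLM backbones and observe a task completion rate of approximately 50%. We classify the causes of failure into a three-tier taxonomy, highlighting planning errors, task execution issues, and incorrect response generation.
Reference score (in-house attention score): 8.218266805768687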
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Autonomous agent systems powered by Large Language Models (LLMs) have demonstrated promising capabilities in automating complex tasks. However, current evaluations largely rely on success rates without systematically analyzing the interactions, communication mechanisms, and failure causes within these systems. To bridge this gap, we present a benchmark of 34 representative programmable tasks designed to rigorously assess autonomous agents. Using this benchmark, we evaluate three popular open-source agent frameworks combined with two LLM backbones, observing a task completion rate of approximately 50%. Through in-depth failure analysis, we develop a three-tier taxonomy of failure causes aligned with task phases, highlighting planning errors, task execution issues, and incorrect response generation. Based on these insights, we propose actionable improvements to enhance agent planning and self-diagnosis capabilities. Our failure taxonomy, together with mitigation advice, provides an empirical foundation for developing more robust and effective autonomous agent systems in the future.
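Abstract (reference translation): Autonomous agent systems powered by LLMs (Large Language Models) have demonstrated promising capabilities in automating complex tasks. However, current evaluations rely heavily on success rates without systematically analyzing the interactions, communication mechanisms, and failure causes within these systems. To bridge this gap, we present a benchmark of 34 programmable tasks designed to rigorously evaluate autonomous agents. Using this benchmark, we evaluate three popular open-source agent frameworks combined with two LLM backbones and observe a task completion rate of approximately 50%. Through detailed failure analysis, we develop a three-tier taxonomy of failure causes aligned with task phases, highlighting planning errors, task execution issues, and incorrect response generation. Based on these findings, we propose practical improvements to enhance agent planning and self-diagnosis capabilities. Our failure taxonomy, together with mitigation advice, provides an empirical foundation for developing more robust and effective autonomous agent systems in the future.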
