Fugu-MT Paper Translation (Abstract): Gecko: A Simulation Environment with Stateful Feedback for Refining Agent Tool Calls

Paper overview: Gecko: A Simulation Environment with Stateful Feedback for Refining Agent Tool Calls

arxiv url: http://arxiv.org/abs/2602.19218v1
Date: Sun, 22 Feb 2026 15:02:00 GMT
Status: Translation complete
Last updated in system: 2026-02-24 17:42:02.537313
Title: Gecko: A Simulation Environment with Stateful Feedback for Refining Agent Tool Calls
Title (reference translation): Gecko: A Simulation Environment with Stateful Feedback for Refining Agent Tool Calls
Authors: Zeyu Zhang, Guohao Li, Zhenchang Xing, Alexandros Apostolopoulos, Yu Lin Lee, Liang Zheng
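Abstract summary: This paper introduces Gecko, a comprehensive environment that simulates tool responses using a combination of rules and LLMs. GATS, the test-time scaling method built on Gecko's feedback, consistently improves the tool-calling performance of various LLMs, including GPT-4o, GPT-5, and Gemini-3.0-pro.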
Reference score (attention score computed independently by this site): 56.407063247662336
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The ability to use tools is fundamental for large language model (LLM) agents. Given a task, existing systems use LLMs to plan and generate tool calls, which are executed by real-world tools to complete the task. However, tool calls are prone to errors because they are derived merely from LLM intrinsic capabilities. What is more, while it is useful to let LLMs iteratively refine the tool-call sequence using execution results from real tools, this process can be expensive and lead to unsafe results. To improve LLM tool calls and address issues caused by using real tools for refinement, we introduce Gecko, a comprehensive environment that simulates tool responses using a combination of rules and LLMs. Specifically, Gecko checks the validity of tool calls including input arguments and tool names, synthesizes reasonable responses that adhere to the output schema, and assesses whether all task objectives have been achieved. These three types of feedback provided by Gecko allow LLMs to refine their tool calls, forming a simple yet effective test-time scaling method named GATS. On BFCLv3 and $τ^2$-bench, GATS consistently improves the tool calling performance of various LLMs including GPT-4o, GPT-5, and Gemini-3.0-pro. We further discuss working mechanisms of our method and share future possibilities.
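Abstract (reference translation): The ability to use tools is fundamental to large language model (LLM) agents. Given a task, existing systems use LLMs to plan and generate tool calls, which real-world tools then execute to complete the task. However, tool calls are error-prone because they derive solely from the LLM's intrinsic capabilities. Moreover, although it is useful to let LLMs iteratively refine the tool-call sequence using execution results from real tools, this process is expensive and can lead to unsafe results. To improve LLM tool calls and to address the issues caused by refining with real tools, the paper introduces Gecko, a comprehensive environment that simulates tool responses using a combination of rules and LLMs. Specifically, Gecko checks the validity of tool calls, including input arguments and tool names; synthesizes reasonable responses that conform to the output schema; and assesses whether all task objectives have been achieved. These three types of feedback let LLMs refine their tool calls, yielding a simple yet effective test-time scaling method called GATS. On BFCLv3 and $τ^2$-bench, GATS consistently improves the tool-calling performance of various LLMs, including GPT-4o, GPT-5, and Gemini-3.0-pro. The paper also discusses how the method works and outlines future possibilities.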
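
The abstract describes a loop in which tool calls are checked and the resulting feedback is used to refine them. The following is a minimal Python sketch of that idea, covering only the first of the three feedback types (validity of tool names and input arguments). It is not Gecko's or GATS's actual implementation: `TOOL_SCHEMAS`, `check_tool_call`, `refine`, and `stub_proposer` are hypothetical names introduced for illustration, and the LLM is replaced by a hard-coded stub.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical tool registry: tool name -> {argument name: expected Python type}.
TOOL_SCHEMAS: dict[str, dict[str, type]] = {
    "get_weather": {"city": str, "days": int},
}


@dataclass
class Feedback:
    valid: bool
    errors: list[str] = field(default_factory=list)


def check_tool_call(name: str, args: dict[str, Any]) -> Feedback:
    """Rule-based validity check on the tool name and its input arguments."""
    if name not in TOOL_SCHEMAS:
        return Feedback(False, [f"unknown tool: {name}"])
    schema = TOOL_SCHEMAS[name]
    errors = []
    for arg, expected in schema.items():
        if arg not in args:
            errors.append(f"missing argument: {arg}")
        elif not isinstance(args[arg], expected):
            errors.append(f"argument {arg} should be {expected.__name__}")
    for arg in args:
        if arg not in schema:
            errors.append(f"unexpected argument: {arg}")
    return Feedback(not errors, errors)


def refine(propose: Callable[[list[str]], tuple[str, dict[str, Any]]],
           max_rounds: int = 3):
    """Request a call from the proposer, feeding back errors until it validates."""
    errors: list[str] = []
    for _ in range(max_rounds):
        name, args = propose(errors)
        feedback = check_tool_call(name, args)
        if feedback.valid:
            return name, args
        errors = feedback.errors
    return None  # no valid call within the round budget


def stub_proposer(errors: list[str]) -> tuple[str, dict[str, Any]]:
    # Stand-in for an LLM (assumption): omits "days" first, then repairs it.
    if "missing argument: days" in errors:
        return "get_weather", {"city": "Tokyo", "days": 2}
    return "get_weather", {"city": "Tokyo"}


print(refine(stub_proposer))
```

In this sketch the first proposal fails validation, the error message is passed back, and the second proposal is accepted. The simulated tool responses and the task-objective assessment mentioned in the abstract are deliberately left out.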

Related papers