Fugu-MT Paper Translation (Abstract): How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on $τ$-bench

Paper overview: How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on $τ$-bench

arxiv url: http://arxiv.org/abs/2508.20931v1
Date: Thu, 28 Aug 2025 15:57:33 GMT
Status: Translation complete
Last updated in system: 2025-08-29 18:12:02.495811
Title: How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on $τ$-bench
Authors: Venkatesh Mishra, Amir Saeidi, Satyam Raj, Mutsumi Nakamura, Jayanth Srinivasa, Gaowen Liu, Ali Payani, Chitta Baral
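Abstract summary: In multi-turn conversational environments, large language models (LLMs) often struggle with consistent reasoning and adherence to domain-specific policies. This paper proposes the Input-Reformulation Multi-Agent (IRMA) framework, which automatically reformulates user queries augmented with relevant domain rules. IRMA significantly outperforms ReAct, Function Calling, and Self-Reflection by 16.1%, 12.7%, and 19.1%, respectively.
Reference score (attention score computed independently by this site): 58.114899897566964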
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in reasoning and planning capabilities of large language models (LLMs) have enabled their potential as autonomous agents capable of tool use in dynamic environments. However, in multi-turn conversational environments like $\tau$-bench, these agents often struggle with consistent reasoning, adherence to domain-specific policies, and extracting correct information over a long horizon of tool-calls and conversation. To capture and mitigate these failures, we conduct a comprehensive manual analysis of the common errors occurring in the conversation trajectories. We then experiment with reformulations of inputs to the tool-calling agent for improvement in agent decision making. Finally, we propose the Input-Reformulation Multi-Agent (IRMA) framework, which automatically reformulates user queries augmented with relevant domain rules and tool suggestions for the tool-calling agent to focus on. The results show that IRMA significantly outperforms ReAct, Function Calling, and Self-Reflection by 16.1%, 12.7%, and 19.1%, respectively, in overall pass^5 scores. These findings highlight the superior reliability and consistency of IRMA compared to other methods in dynamic environments.
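The abstract reports pass^5 scores without defining the metric. Assuming it follows the pass^k metric introduced with the original $\tau$-bench benchmark (the probability that all k independent trials of a task succeed, averaged over tasks), a minimal sketch of the estimate is given below. The function names `pass_hat_k` and `overall_pass_hat_k` are hypothetical and are not taken from the paper or from the $\tau$-bench codebase.

```python
from math import comb


def pass_hat_k(num_trials: int, num_successes: int, k: int) -> float:
    """Estimate the chance that k independent trials of one task ALL succeed.

    Assumption: this is the combinatorial estimator C(c, k) / C(n, k),
    where n is the number of trials run and c the number that succeeded.
    """
    if not 0 <= num_successes <= num_trials:
        raise ValueError("num_successes must lie in [0, num_trials]")
    if not 1 <= k <= num_trials:
        raise ValueError("k must lie in [1, num_trials]")
    # comb(c, k) is 0 when c < k, so a single failure in n == k trials gives 0.
    return comb(num_successes, k) / comb(num_trials, k)


def overall_pass_hat_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-task estimate over (num_trials, num_successes) pairs."""
    if not results:
        raise ValueError("results must contain at least one task")
    return sum(pass_hat_k(n, c, k) for n, c in results) / len(results)
```

Under this assumption, a task solved in only 4 of 5 trials contributes 0 to pass^5, which is why the metric rewards consistency rather than occasional success.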

Related papers