Fugu-MT Paper Translation (Summary): Asymmetric Actor-Critic for Multi-turn LLM Agents

Paper summary: Asymmetric Actor-Critic for Multi-turn LLM Agents

arxiv url: http://arxiv.org/abs/2604.00304v1
Date: Tue, 31 Mar 2026 22:56:21 GMT
Status: Translation complete
Last updated in system: 2026-04-02 16:44:31.758191
Title: Asymmetric Actor-Critic for Multi-turn LLM Agents
Title (reference translation): Asymmetric Actor-Critic for Multi-turn LLM Agents
Authors: Shuli Jiang, Zhaoyang Zhang, Yi Zhang, Shuo Yang, Wei Xia, Stefano Soatto,
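Abstract summary: We propose an asymmetric actor-critic framework for reliable conversational agents. A powerful proprietary LLM acts as the actor, while a small open-source critic provides runtime supervision. We show that the proposed method significantly improves reliability and task success over strong single-agent baselines.
Reference score (attention score, computed in-house): 50.245019205783855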
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Large language models (LLMs) exhibit strong reasoning and conversational abilities, but ensuring reliable behavior in multi-turn interactions remains challenging. In many real-world applications, agents must succeed in one-shot settings where retries are impossible. Existing approaches either rely on reflection or post-hoc evaluation, which require additional attempts, or assume fully trainable models that cannot leverage proprietary LLMs. We propose an asymmetric actor-critic framework for reliable conversational agents. A powerful proprietary LLM acts as the actor, while a smaller open-source critic provides runtime supervision, monitoring the actor's actions and intervening within the same interaction trajectory. Unlike training-based actor-critic methods, our framework supervises a fixed actor operating in open-ended conversational environments. The design leverages a generation-verification asymmetry: while high-quality generation requires large models, effective oversight can often be achieved by smaller ones. We further introduce a data generation pipeline that produces supervision signals for critic fine-tuning without modifying the actor. Experiments on $τ$-bench and UserBench show that our approach significantly improves reliability and task success over strong single-agent baselines. Moreover, lightweight open-source critics rival or surpass larger proprietary models in the critic role, and critic fine-tuning yields additional gains over several state-of-the-art methods.
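Abstract (reference translation): Large language models (LLMs) exhibit strong reasoning and conversational abilities, but ensuring reliability in multi-turn interactions remains challenging. In many real-world applications, agents must succeed in one-shot settings where retries are impossible. Existing approaches rely on reflection or post-hoc evaluation, which require additional attempts. We propose an asymmetric actor-critic framework for reliable conversational agents. A powerful proprietary LLM serves as the actor, while a small open-source critic provides runtime supervision, monitoring the actor's actions and intervening within the same interaction trajectory. Unlike training-based actor-critic methods, our framework supervises a fixed actor operating in open-ended conversational environments. Although high-quality generation requires large models, effective oversight can often be achieved by smaller ones. We further introduce a data generation pipeline that produces supervision signals for critic fine-tuning without modifying the actor. Experiments on τ-bench and UserBench show that our approach substantially improves reliability and task success over strong single-agent baselines. Moreover, lightweight open-source critics match or surpass larger proprietary models in the critic role.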
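
The abstract describes the critic monitoring a fixed actor and intervening within the same trajectory, but it does not specify the intervention protocol. The following is a minimal sketch under an assumed protocol: before each agent turn is committed, the critic either approves the actor's proposed action or rejects it with textual feedback, and the frozen actor re-proposes within the same turn, up to a small revision budget. No episode-level retry is used. All names here (Verdict, supervised_step, run_episode, max_revisions, toy_actor, toy_critic) are hypothetical illustrations, not the authors' code, and the toy actor and critic stand in for real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    """Hypothetical critic output: approve, or reject with feedback."""
    approved: bool
    feedback: str = ""


# Hypothetical interfaces (assumptions, not from the paper):
# the actor maps (history, optional critic feedback) to a proposed action;
# the critic maps (history, proposed action) to a Verdict.
Actor = Callable[[List[str], Optional[str]], str]
Critic = Callable[[List[str], str], Verdict]


def supervised_step(history: List[str], actor: Actor, critic: Critic,
                    max_revisions: int = 2) -> str:
    """Return the action to commit for this turn.

    Every proposal is reviewed by the critic before it is committed. On
    rejection, the feedback goes back to the fixed actor, which re-proposes
    within the same turn. If the budget runs out, the last reviewed proposal
    is committed anyway (an assumed fallback policy).
    """
    if max_revisions < 0:
        raise ValueError("max_revisions must be non-negative")
    action = actor(history, None)
    for attempt in range(max_revisions + 1):
        verdict = critic(history, action)
        if verdict.approved or attempt == max_revisions:
            return action
        action = actor(history, verdict.feedback)
    return action  # not reached for non-negative budgets


def run_episode(user_turns: List[str], actor: Actor, critic: Critic,
                max_revisions: int = 2) -> List[str]:
    """Single-pass conversation: one committed agent action per user turn."""
    history: List[str] = []
    for user_msg in user_turns:
        history.append(f"user: {user_msg}")
        action = supervised_step(history, actor, critic, max_revisions)
        history.append(f"agent: {action}")
    return history


# Toy stand-ins for the large actor and the small critic.
def toy_actor(history: List[str], feedback: Optional[str]) -> str:
    if feedback:
        return "ask the user to confirm the order ID before cancelling"
    return "cancel order"


def toy_critic(history: List[str], action: str) -> Verdict:
    if action == "cancel order":
        return Verdict(False, "confirm the order ID first")
    return Verdict(True)


if __name__ == "__main__":
    for line in run_episode(["please cancel my order"], toy_actor, toy_critic):
        print(line)
```

In this toy run the critic blocks the premature "cancel order" action, and the committed turn becomes the confirmation request. The asymmetry from the abstract shows up only in the interfaces: the critic returns a verdict rather than generating the response itself, which is the kind of verification task the paper argues a smaller model can handle.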


List of related papers