Fugu-MT Paper Translation (Abstract): Don't Overthink It: Inter-Rollout Action Agreement as a Free Adaptive-Compute Signal for LLM Agents

Paper summary: Don't Overthink It: Inter-Rollout Action Agreement as a Free Adaptive-Compute Signal for LLM Agents

arxiv url: http://arxiv.org/abs/2604.08369v1
Date: Thu, 09 Apr 2026 15:34:22 GMT
Status: Translation complete
System update date: 2026-04-10 18:34:05.999175
Title: Don't Overthink It: Inter-Rollout Action Agreement as a Free Adaptive-Compute Signal for LLM Agents
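Title (reference translation): Inter-Rollout Action Agreement as a Free Adaptive-Compute Signal for LLM Agents
Authors: Khushal Sethi
Abstract summary: Inference-time compute scaling has emerged as a powerful technique for improving the reliability of large language model (LLM) agents. We introduce TrACE, a training-free controller that allocates LLM calls adaptively across agent timesteps by measuring inter-rollout action agreement.
Reference score (attention score computed by this site): 0.0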
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Inference-time compute scaling has emerged as a powerful technique for improving the reliability of large language model (LLM) agents, but existing methods apply compute uniformly: every decision step receives the same budget regardless of its difficulty. We introduce TrACE (Trajectorical Adaptive Compute via agrEement), a training-free controller that allocates LLM calls adaptively across agent timesteps by measuring inter-rollout action agreement. At each step, TrACE samples a small set of candidate next actions and measures how consistently the model commits to the same action. High agreement signals an easy decision; the controller commits immediately. Low agreement signals uncertainty; the controller samples additional rollouts up to a configurable cap before committing to the plurality action. No learned components, no external verifier, and no human labels are required. We evaluate TrACE against greedy decoding and fixed-budget self-consistency (SC-4, SC-8) on two benchmarks spanning single-step reasoning (GSM8K, n=50) and multi-step household navigation (MiniHouse, n=30), using a Qwen 2.5 3B Instruct model running on CPU. TrACE-4 matches SC-4 accuracy while using 33% fewer LLM calls on GSM8K and 39% fewer on MiniHouse. TrACE-8 matches SC-8 accuracy with 55% fewer calls on GSM8K and 65% fewer on MiniHouse. We further show that inter-rollout agreement is a reliable signal of step-level success, validating the core hypothesis that the model's own output consistency encodes difficulty information that can be exploited without training. TrACE is the first training-free, per-timestep adaptive-compute controller for LLM agents to be evaluated on multi-step sequential decision tasks.
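Abstract (reference translation): Inference-time compute scaling has emerged as a powerful technique for improving the reliability of large language model (LLM) agents, but existing methods apply compute uniformly, so every decision step gets the same budget regardless of its difficulty. TrACE (Trajectorical Adaptive Compute via agrEement) is a training-free controller that allocates LLM calls adaptively across agent timesteps by measuring inter-rollout action agreement. At each step, TrACE samples a small set of candidate next actions and measures how consistently the model commits to the same action. High agreement indicates an easy decision, and the controller commits immediately. Low agreement indicates uncertainty, and the controller samples additional rollouts up to a configurable cap before committing to the plurality action. No learned components, external verifier, or human labels are required. TrACE is evaluated against greedy decoding and fixed-budget self-consistency (SC-4, SC-8) on two benchmarks spanning single-step reasoning (GSM8K, n=50) and multi-step household navigation (MiniHouse, n=30), using a Qwen 2.5 3B Instruct model running on CPU. TrACE-4 matches SC-4 accuracy with 33% fewer LLM calls on GSM8K and 39% fewer on MiniHouse. TrACE-8 matches SC-8 accuracy with 55% fewer calls on GSM8K and 65% fewer on MiniHouse. The paper further shows that inter-rollout agreement is a reliable signal of step-level success, validating the core hypothesis that the model's own output consistency encodes difficulty information that can be exploited without training. TrACE is the first training-free, per-timestep adaptive-compute controller for LLM agents to be evaluated on multi-step sequential decision tasks.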
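
The per-step control loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the author's implementation: the abstract does not state the initial sample count, the exact agreement metric, or the commit threshold, so the function name `agreement_gated_step` and the parameters `sample_action`, `k_init`, `k_max`, and `threshold` are all hypothetical. Agreement is assumed here to be the share of samples that chose the plurality action.

```python
from collections import Counter
from typing import Callable, Hashable, Tuple


def agreement_gated_step(
    sample_action: Callable[[], Hashable],  # hypothetical: one LLM call -> one action
    k_init: int = 2,         # assumed size of the initial sample
    k_max: int = 4,          # configurable cap on calls per step
    threshold: float = 1.0,  # assumed agreement level needed to commit early
) -> Tuple[Hashable, int]:
    """Choose one action for a single timestep; return (action, calls_used)."""
    # Draw a small initial batch of candidate next actions.
    votes = Counter(sample_action() for _ in range(k_init))
    calls = k_init
    while True:
        action, count = votes.most_common(1)[0]
        # Agreement = fraction of samples voting for the plurality action.
        if count / calls >= threshold or calls >= k_max:
            return action, calls
        # Low agreement: spend one more call, up to the cap.
        votes[sample_action()] += 1
        calls += 1


if __name__ == "__main__":
    # Easy step: every sample agrees, so the controller commits after k_init calls.
    print(agreement_gated_step(lambda: "open_door"))

    # Hard step: samples disagree, so the controller keeps sampling up to k_max.
    scripted = iter(["left", "right", "left", "left"])
    print(agreement_gated_step(lambda: next(scripted)))
```

With the assumed defaults, the easy step uses 2 calls and the hard step uses all 4, which is where savings over a fixed budget such as SC-4 would come from. Lowering `threshold` trades some of the majority vote's robustness for fewer calls.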

Related papers