Fugu-MT paper translation (abstract): How Much LLM Does a Self-Revising Agent Actually Need?

Paper overview: How Much LLM Does a Self-Revising Agent Actually Need?

arxiv url: http://arxiv.org/abs/2604.07236v1
Date: Wed, 08 Apr 2026 16:02:04 GMT
Status: translation complete
System update date: 2026-04-09 17:30:51.621575
Title: How Much LLM Does a Self-Revising Agent Actually Need?
Title (reference translation): How Much LLM Does a Self-Revising Agent Actually Need?
Authors: Seongwoo Jeong, Seonil Son
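Abstract summary: We introduce a declared reflective runtime protocol that externalizes agent state, confidence signals, guarded actions, and hypothetical transitions into inspectable runtime structure. We evaluate it on noisy Collaborative Battleship using four progressively structured agents over 54 games.
Reference score (independently computed attention score): 0.14323566945483496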
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent LLM-based agents often place world modeling, planning, and reflection inside a single language model loop. This can produce capable behavior, but it makes a basic scientific question difficult to answer: which part of the agent's competence actually comes from the LLM, and which part comes from explicit structure around it? We study this question not by claiming a general answer, but by making it empirically tractable. We introduce a declared reflective runtime protocol that externalizes agent state, confidence signals, guarded actions, and hypothetical transitions into inspectable runtime structure. We instantiate this protocol in a declarative runtime and evaluate it on noisy Collaborative Battleship [4] using four progressively structured agents over 54 games (18 boards $\times$ 3 seeds). The resulting decomposition isolates four components: posterior belief tracking, explicit world-model planning, symbolic in-episode reflection, and sparse LLM-based revision. Across this decomposition, explicit world-model planning improves substantially over a greedy posterior-following baseline (+24.1pp win rate, +0.017 F1). Symbolic reflection operates as a real runtime mechanism -- with prediction tracking, confidence gating, and guarded revision actions -- even though its current revision presets are not yet net-positive in aggregate. Adding conditional LLM revision at about 4.3\% of turns yields only a small and non-monotonic change: average F1 rises slightly (+0.005) while win rate drops (31$\rightarrow$29 out of 54). These results suggest a methodological contribution rather than a leaderboard claim: externalizing reflection turns otherwise latent agent behavior into inspectable runtime structure, allowing the marginal role of LLM intervention to be studied directly.
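Abstract (reference translation): Recent LLM-based agents often place world modeling, planning, and reflection inside a single language model loop. This makes a basic question hard to answer: which part of an agent's competence comes from the LLM, and which part comes from the explicit structure around it? We study this question not by claiming a general answer but by making it empirically tractable. We introduce a declared reflective runtime protocol that externalizes agent state, confidence signals, guarded actions, and hypothetical transitions into inspectable runtime structure. We instantiate this protocol in a declarative runtime and evaluate it on noisy Collaborative Battleship [4] using four progressively structured agents over 54 games (18 boards $\times$ 3 seeds). The resulting decomposition isolates four components: posterior belief tracking, explicit world-model planning, symbolic in-episode reflection, and sparse LLM-based revision. Across this decomposition, explicit world-model planning improves substantially over a greedy posterior-following baseline (+24.1pp win rate, +0.017 F1). Symbolic reflection operates as a real runtime mechanism, with prediction tracking, confidence gating, and guarded revision actions, although its current revision presets are not yet net-positive in aggregate. Adding conditional LLM revision at about 4.3\% of turns yields only a small, non-monotonic change: average F1 rises slightly (+0.005) while the win rate drops (31$\rightarrow$29 out of 54). These results suggest a methodological contribution rather than a leaderboard claim: externalizing reflection turns otherwise latent agent behavior into inspectable runtime structure, allowing the marginal role of LLM intervention to be studied directly.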
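
Note on the reported figures: the abstract gives the LLM-revision result as raw win counts (31 and 29 out of 54) but the planning result in percentage points (+24.1pp). The sketch below only redoes that arithmetic so the two can be compared on one scale. It is not the authors' evaluation code; the function and variable names are hypothetical, and it assumes every agent was scored on the same 54 games.

```python
# Arithmetic check on the figures quoted in the abstract.
# Names are illustrative; assumes all agents share the same 54 games.

BOARDS = 18
SEEDS = 3
GAMES = BOARDS * SEEDS  # 18 boards x 3 seeds


def win_rate_pct(wins: int, games: int = GAMES) -> float:
    """Win rate as a percentage of games played."""
    return 100.0 * wins / games


# Win counts reported before and after adding conditional LLM revision.
before = win_rate_pct(31)
after = win_rate_pct(29)
delta_pp = after - before  # change in percentage points

# Number of games that a +24.1pp gain corresponds to on 54 games.
games_equiv = round(0.241 * GAMES)

print(f"games: {GAMES}")
print(f"win rate: {before:.1f}% -> {after:.1f}% ({delta_pp:+.1f}pp)")
print(f"+24.1pp on {GAMES} games is about {games_equiv} games")
```

On these assumptions, the drop from 31 to 29 wins is about 3.7 percentage points, while the +24.1pp planning gain corresponds to roughly 13 additional wins out of 54.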
