Fugu-MT paper translation (abstract): OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents

Paper overview: OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents

arxiv url: http://arxiv.org/abs/2605.11169v1
Date: Mon, 11 May 2026 19:28:20 GMT
Status: translation complete
Last updated in system: 2026-05-13 21:48:56.385747
Title: OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents
Authors: Sheldon Yu, Junda Wu, Xintong Li, Nikki Lijing Kuang, Sizhe Zhou, Tong Yu, Jiawei Han, Jingbo Shang, Julian McAuley
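Abstract summary: Large language model agents interleave reasoning, action selection, and observation to solve sequential decision-making tasks. Existing inference-time adaptation methods for LLM agents mainly rely on prompting or retrieval. The proposed OLIVIA is an inference-time action adaptation framework for ReAct-style agents.
Reference score (attention score computed independently by this site): 74.20327254615854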
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language model agents interleave reasoning, action selection, and observation to solve sequential decision-making tasks. In deployed settings where agents repeatedly handle related multi-step tasks, small action-selection errors can accumulate into wasted tool calls, latency, and reduced reliability. Despite this need for deployment-time improvement, existing inference-time adaptation methods for LLM agents mainly rely on prompting or retrieval, which influence behavior indirectly through context manipulation. For ReAct-style agents, such approaches do not expose an explicit decision layer that can score candidate actions, represent uncertainty, or be updated online from action-level feedback. As a result, they provide limited support for trackable, fine-grained, and uncertainty-aware adaptation during deployment. We propose OLIVIA, an inference-time action adaptation framework for ReAct-style agents. OLIVIA models the LLM's final action-selection layer as a contextual linear bandit over candidate actions, with frozen hidden states as decision contexts. This choice is particularly suitable for deployment because it adapts behavior directly at the action-selection interface, preserves the underlying reasoning process, and provides explicit uncertainty estimates and lightweight online updates from action-level feedback. With upper-confidence-bound exploration, OLIVIA improves the policy sample-efficiently with minimal computational overhead. We instantiate OLIVIA on four benchmarks and show that it consistently improves task performance over static ReAct and prompt-based inference-time baselines. Our results suggest that explicit online decision layers provide an effective alternative to purely prompt- or retrieval-based adaptation for LLM agents during deployment.
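The abstract describes scoring candidate actions with a contextual linear bandit and upper-confidence-bound exploration, using frozen hidden states as contexts. The block below is a minimal sketch of that general idea using the standard disjoint LinUCB recipe (one ridge-regression model per action), not the authors' implementation. The names LinUCBArm, LinUCBPolicy, ridge, and alpha are hypothetical, and hidden_state is assumed to be a plain list of floats standing in for a frozen LLM hidden state.

```python
import math


class LinUCBArm:
    """Ridge-regression statistics for one candidate action (hypothetical helper)."""

    def __init__(self, dim, ridge=1.0):
        # a_inv tracks (ridge * I + sum of x x^T)^-1; b tracks sum of reward * x.
        self.a_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def ucb(self, x, alpha):
        """Estimated reward plus an exploration bonus for context x."""
        v = self._a_inv_times(x)
        # theta = a_inv @ b, so theta . x == b . (a_inv @ x) because a_inv is symmetric.
        mean = sum(bi * vi for bi, vi in zip(self.b, v))
        variance = sum(xi * vi for xi, vi in zip(x, v))
        return mean + alpha * math.sqrt(max(variance, 0.0))

    def update(self, x, reward):
        """Rank-one Sherman-Morrison update, avoiding a full matrix inversion."""
        v = self._a_inv_times(x)
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, v))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= v[i] * v[j] / denom
            self.b[i] += reward * x[i]


class LinUCBPolicy:
    """Picks the candidate action with the highest upper confidence bound."""

    def __init__(self, actions, dim, alpha=1.0):
        self.alpha = alpha  # exploration strength (assumed hyperparameter)
        self.arms = {action: LinUCBArm(dim) for action in actions}

    def select(self, hidden_state):
        return max(self.arms,
                   key=lambda a: self.arms[a].ucb(hidden_state, self.alpha))

    def update(self, hidden_state, action, reward):
        # Only the chosen action's statistics change; the context stays frozen.
        self.arms[action].update(hidden_state, reward)
```

In a ReAct loop, one would call select with the hidden state at each action step, execute the returned action, and pass the observed action-level feedback to update. Keeping per-action statistics separate from the LLM is what makes the updates lightweight: each step costs O(dim^2) arithmetic and never touches the model weights.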
