Fugu-MT Paper Translation (Summary): Agents Explore but Agents Ignore: LLMs Lack Environmental Curiosity

Paper summary: Agents Explore but Agents Ignore: LLMs Lack Environmental Curiosity

arxiv url: http://arxiv.org/abs/2604.17609v1
Date: Sun, 19 Apr 2026 20:49:41 GMT
Status: Translation complete
Last updated (in system): 2026-04-21 21:52:52.599363
Title: Agents Explore but Agents Ignore: LLMs Lack Environmental Curiosity
Title (reference translation): LLMs Lack Environmental Curiosity
Authors: Leon Engländer, Sophia Althammer, Ahmet Üstün, Matthias Gallé, Tom Sherborne,
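Abstract summary: Current LLM-based agents struggle to reflect on or react to unexpected information. The authors inject complete task solutions into agent environments to deliberately expose each task's solution to the model. While agents discover these solutions on Terminal-Bench in 79-81% of runs, they interact with, or exploit, them in only 37-50% of cases.
Reference score (in-house attention score): 12.381781997363609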
License: http://creativecommons.org/licenses/by/4.0/
Abstract: LLM-based agents are assumed to integrate environmental observations into their reasoning: discovering highly relevant but unexpected information should naturally lead to a model exploiting its own discoveries. We show that this assumption is false for current LLM-based agents, which struggle to reflect or react to unexpected information. Across three benchmarks (Terminal-Bench, SWE-Bench, AppWorld), we inject complete task solutions into the agent environments to deliberately expose a task's solution to a model. While agents discover these solutions on Terminal-Bench in 79-81% of runs, they interact with, or exploit, them in only 37-50% of cases. This gap is starkest in AppWorld: agents see documentation stating that a command "returns the complete solution to this task" in over 90% of attempts but exploit this in fewer than 7% of trials. We show that agents lack what we call environmental curiosity: the capability to recognize and investigate unexpected but relevant observations in response to environmental stimuli. We identify three main factors influencing environmental curiosity: available tools in the agent scaffold, test-time compute, and training data distribution. Our findings show that the configurations that maximize curiosity also achieve the best performance on the unmodified benchmarks. Yet even jointly optimized agents still ignore discovered solutions in the majority of trials: current agents use the environment to fetch expected information, but not to revise their strategy or maximally exploit useful stimuli.


Related papers