Fugu-MT Paper Translation (Summary): Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?

Paper summary: Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?

arxiv url: http://arxiv.org/abs/2602.07055v1
Date: Wed, 04 Feb 2026 19:06:40 GMT
Status: Translation complete
Last updated in system: 2026-02-10 20:26:24.406407
Title: Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?
Authors: Pingyue Zhang, Zihan Huang, Yue Wang, Jieyu Zhang, Letian Xue, Zihan Wang, Qineng Wang, Keshigeyan Chandrasegaran, Ruohan Zhang, Yejin Choi, Ranjay Krishna, Jiajun Wu, Li Fei-Fei, Manling Li,
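Abstract summary: Theory of Space is defined as an agent's ability to acquire information through self-directed, active exploration. A key innovation is spatial belief probing, which prompts the model to reveal its spatial representation at each step. The results suggest that current foundation models struggle to maintain coherent, revisable spatial beliefs during active exploration.
Reference score (site-computed attention score): 83.13508919229939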
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Spatial embodied intelligence requires agents to act to acquire information under partial observability. While multimodal foundation models excel at passive perception, their capacity for active, self-directed exploration remains understudied. We propose Theory of Space, defined as an agent's ability to actively acquire information through self-directed, active exploration and to construct, revise, and exploit a spatial belief from sequential, partial observations. We evaluate this through a benchmark where the goal is curiosity-driven exploration to build an accurate cognitive map. A key innovation is spatial belief probing, which prompts models to reveal their internal spatial representations at each step. Our evaluation of state-of-the-art models reveals several critical bottlenecks. First, we identify an Active-Passive Gap, where performance drops significantly when agents must autonomously gather information. Second, we find high inefficiency, as models explore unsystematically compared to program-based proxies. Through belief probing, we diagnose that while perception is an initial bottleneck, global beliefs suffer from instability that causes spatial knowledge to degrade over time. Finally, using a false belief paradigm, we uncover Belief Inertia, where agents fail to update obsolete priors with new evidence. This issue is present in text-based agents but is particularly severe in vision-based models. Our findings suggest that current foundation models struggle to maintain coherent, revisable spatial beliefs during active exploration.
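To make the evaluation idea concrete, here is a minimal toy sketch, not the benchmark's actual environment, metrics, or code. It assumes a hypothetical grid world where objects sit at integer coordinates, the agent sees only objects within a Chebyshev `view_radius` (partial observability), and its spatial belief is a plain name-to-position map. "Belief probing" is modeled as reading out that map after every step and scoring it with a hypothetical exact-match `map_accuracy`. A boustrophedon `sweep_path` stands in for a systematic program-based explorer, and the hypothetical `overwrite=False` flag mimics Belief Inertia: in a false-belief setup where an object is moved after exploration, such an agent keeps its stale prior even when it sees the new evidence.

```python
from dataclasses import dataclass, field

Pos = tuple[int, int]


@dataclass
class ToyAgent:
    """Hypothetical agent whose spatial belief maps object names to grid positions."""
    view_radius: int = 1
    overwrite: bool = True  # False mimics belief inertia: existing entries are never revised
    belief: dict[str, Pos] = field(default_factory=dict)

    def observe(self, at: Pos, world: dict[str, Pos]) -> None:
        # Partial observability: only objects within view_radius (Chebyshev distance) are seen.
        for name, (x, y) in world.items():
            if max(abs(x - at[0]), abs(y - at[1])) <= self.view_radius:
                if self.overwrite or name not in self.belief:
                    self.belief[name] = (x, y)

    def probe(self) -> dict[str, Pos]:
        # Toy stand-in for belief probing: expose a copy of the current internal map.
        return dict(self.belief)


def map_accuracy(belief: dict[str, Pos], world: dict[str, Pos]) -> float:
    """Fraction of objects whose believed position matches ground truth (unknown = wrong)."""
    correct = sum(1 for name, pos in world.items() if belief.get(name) == pos)
    return correct / len(world)


def sweep_path(width: int, height: int, step: int) -> list[Pos]:
    """Systematic boustrophedon sweep, standing in for a program-based explorer."""
    path: list[Pos] = []
    for row, y in enumerate(range(0, height, step)):
        xs = list(range(0, width, step))
        path.extend((x, y) for x in (xs if row % 2 == 0 else reversed(xs)))
    return path


def explore(agent: ToyAgent, world: dict[str, Pos], path: list[Pos]) -> list[float]:
    """Move along the path, observe, and probe/score the belief after every step."""
    trace = []
    for at in path:
        agent.observe(at, world)
        trace.append(map_accuracy(agent.probe(), world))
    return trace


# Usage: build beliefs in the original room, then run a false-belief probe.
world = {"chair": (0, 0), "lamp": (4, 1), "sofa": (2, 4)}
path = sweep_path(5, 5, step=3)

updater, inert = ToyAgent(overwrite=True), ToyAgent(overwrite=False)
print(explore(updater, world, path))  # accuracy rises as more of the room is seen
explore(inert, world, path)

moved = {**world, "lamp": (1, 3)}  # environment changes after exploration
print(explore(updater, moved, path)[-1])  # 1.0: the stale entry is revised
print(explore(inert, moved, path)[-1])    # 2/3: the obsolete prior persists
```

The per-step trace is the point of probing in this toy: it shows when the map improves or degrades, rather than only the final score. Real models would expose their belief through prompted output rather than a readable dictionary, so this only illustrates the shape of the measurement.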


Related papers