Fugu-MT Paper Translation (Abstract): 3D-Belief: Embodied Belief Inference via Generative 3D World Modeling

Paper summary: 3D-Belief: Embodied Belief Inference via Generative 3D World Modeling

arxiv url: http://arxiv.org/abs/2605.11367v1
Date: Tue, 12 May 2026 00:42:27 GMT
Status: Translation complete
Last updated in system: 2026-05-13 21:48:56.496853
Title: 3D-Belief: Embodied Belief Inference via Generative 3D World Modeling
Authors: Yifan Yin, Zehao Wen, Jieneng Chen, Zehan Zheng, Nanru Dai, Haojun Shi, Suyu Ye, Aydan Huang, Zheyuan Zhang, Alan Yuille, Jianwen Xie, Ayush Tewari, Tianmin Shu
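Abstract summary: We present 3D-Belief, a 3D world model that infers explicit, actionable 3D beliefs from partial observations and updates them online over time. Unlike prior visual prediction models, 3D-Belief represents uncertainty directly in 3D, enabling embodied agents to imagine plausible scene completions and reason over partially observed environments.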
Reference score (independently computed attention score): 37.75852887428672
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in visual generative models have highlighted the promise of learning generative world models. However, most existing approaches frame world modeling as novel-view synthesis or future-frame prediction, emphasizing visual realism rather than the structured uncertainty required by embodied agents acting under partial observability. In this work, we propose a different perspective: world modeling as embodied belief inference in 3D space. From this view, a world model should not merely render what may be seen, but maintain and update an agent's belief about the unobserved 3D world as new observations are acquired. We identify several key capabilities for such models, including spatially consistent scene memory, multi-hypothesis belief sampling, sequential belief updating, and semantically informed prediction of unseen regions. We instantiate these ideas in 3D-Belief, a generative 3D world model that infers explicit, actionable 3D beliefs from partial observations and updates them online over time. Unlike prior visual prediction models, 3D-Belief represents uncertainty directly in 3D, enabling embodied agents to imagine plausible scene completions and reason over partially observed environments. We evaluate 3D-Belief on 2D visual quality for scene memory and unobserved-scene imagination, object- and scene-level 3D imagination using our proposed 3D-CORE benchmark, and challenging object navigation tasks in both simulation and the real world. Experiments show that 3D-Belief improves 2D and 3D imagination quality and downstream embodied task performance compared to state-of-the-art methods.
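
The abstract names multi-hypothesis belief sampling and sequential belief updating as key capabilities, but it does not say how 3D-Belief implements them. As a rough illustration only, the toy sketch below shows what those two terms mean on a tiny voxel grid with independent per-cell occupancy probabilities. The grid, the sensor model (`hit_rate`, `false_alarm`), and the function names `update_belief` and `sample_hypotheses` are all assumptions made for this example; it is a classical Bayesian occupancy update, not the paper's generative model.

```python
import random


def update_belief(belief, observations, hit_rate=0.9, false_alarm=0.1):
    """Bayes-update per-voxel occupancy probabilities (toy sensor model).

    belief: dict mapping voxel (x, y, z) -> P(occupied)
    observations: dict mapping voxel -> bool (did the sensor report "occupied"?)
    hit_rate / false_alarm are assumed values, not taken from the paper.
    """
    updated = dict(belief)  # unobserved voxels keep their prior
    for voxel, seen_occupied in observations.items():
        prior = belief[voxel]
        if seen_occupied:
            like_occ, like_free = hit_rate, false_alarm
        else:
            like_occ, like_free = 1.0 - hit_rate, 1.0 - false_alarm
        evidence = like_occ * prior + like_free * (1.0 - prior)
        updated[voxel] = like_occ * prior / evidence
    return updated


def sample_hypotheses(belief, n, rng):
    """Draw n complete occupancy maps, one Bernoulli draw per voxel."""
    return [
        {voxel: rng.random() < p for voxel, p in belief.items()}
        for _ in range(n)
    ]


if __name__ == "__main__":
    # Uniform prior over a 2x2x1 grid: nothing has been observed yet.
    belief = {(x, y, 0): 0.5 for x in range(2) for y in range(2)}

    # Sequential updating: two partial observations arrive over time.
    belief = update_belief(belief, {(0, 0, 0): True})
    belief = update_belief(belief, {(0, 0, 0): True, (1, 0, 0): False})

    # Multi-hypothesis sampling: several plausible completions of the scene.
    for hypothesis in sample_hypotheses(belief, n=3, rng=random.Random(0)):
        print(hypothesis)
```

Each sampled map agrees with the others mostly on voxels that have been observed and varies on voxels that have not, which is the sense in which the belief keeps uncertainty in 3D rather than in rendered images.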
