Fugu-MT Paper Translation (Abstract): A 3D Isovist World Model -- Revealing a City's Unseen Geometry and Its Emergent Cross-City Signature

Paper summary: A 3D Isovist World Model -- Revealing a City's Unseen Geometry and Its Emergent Cross-City Signature

arxiv url: http://arxiv.org/abs/2606.03609v2
Date: Wed, 03 Jun 2026 07:29:22 GMT
Status: Translation complete
Last updated in system: 2026-06-04 17:40:41.635071
Title: A 3D Isovist World Model -- Revealing a City's Unseen Geometry and Its Emergent Cross-City Signature
Title (reference translation): A 3D Isovist World Model: A City's Unseen Geometry and Its Emergent Cross-City Signature
Authors: Xuhui Lin, Stephen Law, Nanjiang Chen, Kunyao Li, Tao Yang
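Abstract summary: Agents that navigate cities rely on world models that predict how their surroundings will change. We introduce an embodied world model that predicts the next isovist from a short history of past isovists and a movement action. A single city-blind model trained on Manhattan and Paris develops a cross-city spatial signature.
Reference score (independently computed attention score): 3.2214289760227377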
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Embodied agents that navigate cities rely on world models that predict how their surroundings will change as they move. But for navigation, what matters is not what the buildings look like; it is where the agent can go. Most world models nonetheless predict appearance, learning how a scene looks rather than the space an agent can move through. Those that do target geometry, such as bird's-eye-view occupancy grids, flatten the three-dimensional environment onto a ground plane, discarding the above-ground and multi-level structure that shapes real navigation. What is missing is a predictive target that captures the navigable geometry an agent actually traverses, without photometric entanglement and without collapsing the third dimension. Our key idea is to model the open volume between buildings, the negative space, encoded as a 3D isovist: a spherical visibility-depth map recording the distance to the nearest surface in every direction. We introduce an embodied world model that predicts the next isovist from a short history of past isovists and a movement action. The prediction is formulated as a depth residual so the decoder inherits sharp building edges, trained with self-rollout scheduled sampling to keep corrupted context on the geometry manifold, and equipped with a persistent latent bird's-eye-view spatial map for cross-path consistency. Our central finding is emergent and unexpected: a single city-blind model trained on Manhattan and Paris develops a cross-city spatial signature, with city identity linearly decodable from its temporal latents far above single-frame baselines, so the signature lives in the learned dynamics rather than in appearance. The representation is lightweight, interpretable, and reproducible, offering a geometric substrate for spatial reasoning in embodied AI, robotics, and urban analysis, released with an open dataset and pipeline.
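Abstract (reference translation): Agents that navigate cities rely on world models that predict how their surroundings will change. For navigation, however, what matters is not what the buildings look like but where the agent can go. Most world models predict appearance, learning how a scene looks rather than the space an agent can move through. Those that do target geometry, such as bird's-eye-view occupancy grids, flatten the three-dimensional environment onto a plane, discarding the above-ground and multi-level structure that shapes real navigation. What is missing is a predictive target that captures the navigable geometry an agent actually traverses. Our key idea is to model the open volume between buildings, encoded as a 3D isovist. We introduce an embodied world model that predicts the next isovist from a short history of past isovists and a movement action. The prediction is formulated as a depth residual so that the decoder inherits sharp building edges, and the model is trained with self-rollout scheduled sampling to keep corrupted context on the geometry manifold. A single city-blind model trained on Manhattan and Paris develops a cross-city spatial signature: city identity is linearly decodable from its temporal latents far above single-frame baselines, so the signature lives in the learned dynamics rather than in appearance. The representation is lightweight, interpretable, and reproducible, offering a geometric substrate for spatial reasoning in embodied AI, robotics, and urban analysis, and it is released with an open dataset and pipeline.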
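
The abstract defines a 3D isovist as a spherical visibility-depth map that records the distance to the nearest surface in every direction. As a rough illustration of that definition only, the sketch below casts rays from an observer over an elevation-by-azimuth grid and keeps the nearest hit. Everything in it is an assumption made for illustration: buildings are axis-aligned boxes, and the function names, the grid resolution, and the `max_range` cap are hypothetical. It is not the paper's dataset pipeline, which the abstract does not describe.

```python
import math

def ray_box_distance(origin, direction, box_min, box_max):
    """Distance along a ray to an axis-aligned box, or None on a miss (slab method)."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this slab: it misses unless the origin lies inside it.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near  # 0.0 if the observer is inside the box

def isovist_depth_map(origin, boxes, n_elev=9, n_azim=16, max_range=100.0):
    """Return an n_elev x n_azim grid of distances to the nearest surface.

    boxes is a list of (box_min, box_max) corner tuples (hypothetical scene format).
    Directions with no hit are capped at max_range (an assumed convention).
    """
    depth = []
    for i in range(n_elev):
        # Elevation runs from straight down to straight up, sampled at cell centres.
        elev = -math.pi / 2 + (i + 0.5) * math.pi / n_elev
        row = []
        for j in range(n_azim):
            azim = j * 2 * math.pi / n_azim
            direction = (math.cos(elev) * math.cos(azim),
                         math.cos(elev) * math.sin(azim),
                         math.sin(elev))
            hits = [ray_box_distance(origin, direction, lo, hi) for lo, hi in boxes]
            row.append(min([t for t in hits if t is not None] + [max_range]))
        depth.append(row)
    return depth

# One 30 m tall block, 10 m in front of an observer at eye height.
scene = [((10.0, -5.0, 0.0), (20.0, 5.0, 30.0))]
depth = isovist_depth_map((0.0, 0.0, 1.5), scene)
```

Here `depth[4][0]` is the horizontal ray that looks straight at the block and `depth[4][8]` is the horizontal ray that looks directly away from it. Under the same assumptions, a depth-residual target of the kind the abstract mentions would be the element-wise difference between two such grids at consecutive positions.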
