Fugu-MT Paper Translation (Summary): SpatialAnt: Autonomous Zero-Shot Robot Navigation via Active Scene Reconstruction and Visual Anticipation

Paper summary: SpatialAnt: Autonomous Zero-Shot Robot Navigation via Active Scene Reconstruction and Visual Anticipation

arxiv url: http://arxiv.org/abs/2603.26837v1
Date: Fri, 27 Mar 2026 08:01:40 GMT
Status: Translation complete
Last updated in system: 2026-03-31 23:18:44.657116
Title: SpatialAnt: Autonomous Zero-Shot Robot Navigation via Active Scene Reconstruction and Visual Anticipation
Authors: Jiwen Zhang, Xiangyu Shi, Siyuan Wang, Zerui Li, Zhongyu Wei, Qi Wu
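Abstract summary: SpatialAnt is a zero-shot navigation framework designed to bridge the gap between imperfect self-reconstructions and robust execution. The paper shows that SpatialAnt significantly outperforms existing zero-shot methods in both simulated and real-world environments.
Reference score (independently computed attention score): 45.461768743080604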
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Vision-and-Language Navigation (VLN) has recently benefited from Multimodal Large Language Models (MLLMs), enabling zero-shot navigation. While recent exploration-based zero-shot methods have shown promising results by leveraging global scene priors, they rely on high-quality human-crafted scene reconstructions, which are impractical for real-world robot deployment. When encountering an unseen environment, a robot should build its own priors through pre-exploration. However, these self-built reconstructions are inevitably incomplete and noisy, which severely degrade methods that depend on high-quality scene reconstructions. To address these issues, we propose SpatialAnt, a zero-shot navigation framework designed to bridge the gap between imperfect self-reconstructions and robust execution. SpatialAnt introduces a physical grounding strategy to recover the absolute metric scale for monocular-based reconstructions. Furthermore, rather than treating the noisy self-reconstructed scenes as absolute spatial references, we propose a novel visual anticipation mechanism. This mechanism leverages the noisy point clouds to render future observations, enabling the agent to perform counterfactual reasoning and prune paths that contradict human instructions. Extensive experiments in both simulated and real-world environments demonstrate that SpatialAnt significantly outperforms existing zero-shot methods. We achieve a 66% Success Rate (SR) on R2R-CE and 50.8% SR on RxR-CE benchmarks. Physical deployment on a Hello Robot further confirms the efficiency and efficacy of our framework, achieving a 52% SR in challenging real-world settings.
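Illustrative sketch (not from the paper): the abstract says the visual anticipation mechanism uses noisy point clouds to render future observations. The abstract does not describe the renderer, so the Python below is only a minimal sketch of one generic way to do the geometric part: project a point cloud into a pinhole camera placed at a candidate future pose and keep the nearest point per pixel. The function name render_depth, the frame convention (x and y on the ground plane, z up, camera rotated by a yaw angle only), and all parameters are assumptions made for illustration.

```python
import math


def render_depth(points, cam_pos, cam_yaw, fx, fy, cx, cy, width, height):
    """Render a sparse depth image of a point cloud from a candidate pose.

    Assumed convention (hypothetical, not taken from the paper):
    world frame has x, y on the ground plane and z up; the camera sits at
    cam_pos and looks along the heading (cos(yaw), sin(yaw), 0).
    Pixels that no point projects to keep the value math.inf.
    """
    depth = [[math.inf] * width for _ in range(height)]
    cos_y, sin_y = math.cos(cam_yaw), math.sin(cam_yaw)

    for px, py, pz in points:
        dx, dy, dz = px - cam_pos[0], py - cam_pos[1], pz - cam_pos[2]

        # Express the point in the camera frame.
        forward = dx * cos_y + dy * sin_y   # distance along the view axis
        right = dx * sin_y - dy * cos_y     # positive to the camera's right
        down = -dz                          # image rows grow downwards

        # Points behind (or on) the image plane are not visible.
        if forward <= 1e-6:
            continue

        # Pinhole projection to integer pixel coordinates.
        u = int(round(fx * right / forward + cx))
        v = int(round(fy * down / forward + cy))
        if not (0 <= u < width and 0 <= v < height):
            continue

        # Z-buffer: the nearest point wins when several hit one pixel.
        if forward < depth[v][u]:
            depth[v][u] = forward

    return depth


# Toy example: two points on the view axis and one outside the field of view.
cloud = [(2.0, 0.0, 0.0), (5.0, 0.0, 0.0), (2.0, -2.0, 0.0)]
image = render_depth(cloud, cam_pos=(0.0, 0.0, 0.0), cam_yaw=0.0,
                     fx=100.0, fy=100.0, cx=50.0, cy=50.0,
                     width=100, height=100)
print(image[50][50])
```

In a pipeline like the one the abstract describes, images rendered this way for several candidate poses would presumably be passed to the MLLM so it can reject paths whose anticipated views contradict the instruction. That step, and how the paper handles noise and holes in the reconstruction, is not covered by this sketch.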


Related papers