NavSpace: How Navigation Agents Follow Spatial Intelligence Instructions
- URL: http://arxiv.org/abs/2510.08173v1
- Date: Thu, 09 Oct 2025 12:59:19 GMT
- Title: NavSpace: How Navigation Agents Follow Spatial Intelligence Instructions
- Authors: Haolin Yang, Yuxing Long, Zhuoyuan Yu, Zihan Yang, Minghan Wang, Jiapeng Xu, Yihan Wang, Ziyan Yu, Wenzhe Cai, Lei Kang, Hao Dong
- Abstract summary: We introduce the NavSpace benchmark, which contains six task categories and 1,228 trajectory-instruction pairs. We comprehensively evaluate 22 navigation agents, including state-of-the-art navigation models and multimodal large language models. We also propose SNav, a new spatially intelligent navigation model.
- Score: 31.144783513493433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instruction-following navigation is a key step toward embodied intelligence. Prior benchmarks mainly focus on semantic understanding but overlook systematically evaluating navigation agents' spatial perception and reasoning capabilities. In this work, we introduce the NavSpace benchmark, which contains six task categories and 1,228 trajectory-instruction pairs designed to probe the spatial intelligence of navigation agents. On this benchmark, we comprehensively evaluate 22 navigation agents, including state-of-the-art navigation models and multimodal large language models. The evaluation results lift the veil on spatial intelligence in embodied navigation. Furthermore, we propose SNav, a new spatially intelligent navigation model. SNav outperforms existing navigation agents on NavSpace and real robot tests, establishing a strong baseline for future work.
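The evaluation harness itself is not reproduced on this page, but the protocol the abstract describes (rolling agents out on trajectory-instruction pairs and scoring success per task category) can be sketched minimally. Everything below (the `Episode` record, the `agent_run` callable, the success radius, and the category name) is an illustrative assumption, not the authors' API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Episode:
    category: str              # one of the six spatial task categories
    instruction: str           # natural-language spatial instruction
    goal: Tuple[float, float]  # target position in the scene frame

def evaluate(agent_run: Callable[[str], Tuple[float, float]],
             episodes: List[Episode],
             success_radius: float = 3.0) -> Dict[str, float]:
    """Per-category success rate: an episode succeeds when the agent's
    final position lies within success_radius of the goal."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for ep in episodes:
        x, y = agent_run(ep.instruction)  # roll the agent out on the instruction
        gx, gy = ep.goal
        if ((x - gx) ** 2 + (y - gy) ** 2) ** 0.5 <= success_radius:
            hits[ep.category] += 1
        totals[ep.category] += 1
    return {cat: hits[cat] / totals[cat] for cat in totals}

# Usage with a trivial stand-in agent that always stops at the origin:
eps = [Episode("spatial_relation", "stop just left of the sofa", (0.0, 1.0))]
print(evaluate(lambda _: (0.0, 0.0), eps))  # {'spatial_relation': 1.0}
```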
Related papers
- Human-like Navigation in a World Built for Humans [23.303995665820846]
We present ReasonNav, a modular navigation system that integrates human-like navigation skills. We design compact input and output abstractions based on navigation landmarks. We show that ReasonNav successfully employs higher-order reasoning to navigate efficiently in large, complex buildings.
arXiv Detail & Related papers (2025-09-25T14:04:17Z)
- AI Guide Dog: Egocentric Path Prediction on Smartphone [2.050167020109177]
AIGD employs a vision-only multi-label classification approach to predict directional commands. We introduce a novel technique for goal-based outdoor navigation by integrating GPS signals. We present methods, datasets, evaluations, and deployment insights to encourage further innovations in assistive navigation systems.
arXiv Detail & Related papers (2025-01-14T09:21:17Z)
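As a loose illustration of the vision-only multi-label formulation mentioned above, here is a toy PyTorch sketch; the command vocabulary, feature dimension, and linear head are assumptions rather than AIGD's actual architecture.

```python
import torch
import torch.nn as nn

# Assumed label set; the paper's actual command vocabulary may differ.
COMMANDS = ["forward", "left", "right", "stop"]

class DirectionHead(nn.Module):
    """Toy multi-label head: an independent sigmoid probability per command,
    so several commands can be active at once (unlike a softmax classifier)."""
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.fc = nn.Linear(feat_dim, len(COMMANDS))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.fc(feats)  # raw logits, paired with BCEWithLogitsLoss

head = DirectionHead()
feats = torch.randn(8, 512)                                # batch of image features
targets = torch.randint(0, 2, (8, len(COMMANDS))).float()  # multi-hot labels
loss = nn.BCEWithLogitsLoss()(head(feats), targets)
active = torch.sigmoid(head(feats)) > 0.5                  # thresholded predictions
```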
- NAVCON: A Cognitively Inspired and Linguistically Grounded Corpus for Vision and Language Navigation [66.89717229608358]
NAVCON is a large-scale annotated Vision-Language Navigation (VLN) corpus built on top of two popular datasets (R2R and RxR).
arXiv Detail & Related papers (2024-12-17T15:48:25Z)
- InstructNav: Zero-shot System for Generic Instruction Navigation in Unexplored Environment [5.43847693345519]
In this work, we propose InstructNav, a generic instruction navigation system.
InstructNav is the first attempt to handle various instruction navigation tasks without any navigation training or pre-built maps.
With InstructNav, we complete the R2R-CE task in a zero-shot way for the first time and outperform many task-training methods.
arXiv Detail & Related papers (2024-06-07T12:26:34Z)
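InstructNav's actual pipeline is more elaborate than this page shows; the following is only a minimal sketch of the zero-shot idea, decomposing an instruction into ordered subgoals with an off-the-shelf LLM and no navigation training. The `plan_subgoals` helper and prompt wording are hypothetical.

```python
from typing import Callable, List

def plan_subgoals(instruction: str, llm: Callable[[str], str]) -> List[str]:
    """Ask any text-completion backend to split a free-form navigation
    instruction into ordered landmark subgoals -- zero-shot, no training."""
    prompt = ("Decompose this navigation instruction into short subgoals, "
              "one per line, in execution order:\n" + instruction)
    reply = llm(prompt)
    return [line.lstrip("- ").strip() for line in reply.splitlines() if line.strip()]

# Usage with a canned stand-in for a real LLM backend:
fake_llm = lambda _: "- exit the bedroom\n- turn left at the hallway\n- stop at the sofa"
print(plan_subgoals("Leave the bedroom and wait by the sofa", fake_llm))
# ['exit the bedroom', 'turn left at the hallway', 'stop at the sofa']
```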
- Angle Robustness Unmanned Aerial Vehicle Navigation in GNSS-Denied Scenarios [66.05091704671503]
We present a novel angle navigation paradigm to deal with flight deviation in point-to-point navigation tasks.
We also propose a model that includes the Adaptive Feature Enhance Module, the Cross-knowledge Attention-guided Module, and the Robust Task-oriented Head Module.
arXiv Detail & Related papers (2024-02-04T08:41:20Z)
- SayNav: Grounding Large Language Models for Dynamic Planning to Navigation in New Environments [14.179677726976056]
SayNav is a new approach that leverages human knowledge from Large Language Models (LLMs) for efficient generalization to complex navigation tasks.
SayNav achieves state-of-the-art results and even outperforms an oracle-based baseline with strong ground-truth assumptions by more than 8% in terms of success rate.
arXiv Detail & Related papers (2023-09-08T02:24:37Z)
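A rough sketch of grounding an LLM planner in incrementally observed scene structure, in the spirit of the summary above; the text-only scene-graph serialization and prompt below are simplifications, not SayNav's actual prompting scheme.

```python
from typing import Dict, List

def build_planning_prompt(scene_graph: Dict[str, List[str]], goal: str) -> str:
    """Serialize an incrementally built scene graph (room -> visible objects)
    into an LLM prompt that asks for the next room to visit."""
    lines = [f"{room}: {', '.join(objs)}" for room, objs in scene_graph.items()]
    return ("You are guiding a robot indoors. Observed so far:\n"
            + "\n".join(lines)
            + f"\nGoal: find {goal}. Answer with the single next room to enter.")

graph = {"hallway": ["door", "plant"], "kitchen": ["fridge", "sink"]}
print(build_planning_prompt(graph, "a coffee mug"))
```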
- ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments [56.194988818341976]
Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments.
We propose ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability to perform obstacle-avoiding control in continuous environments.
ETPNav yields more than 10% and 20% improvements over the prior state of the art on the R2R-CE and RxR-CE datasets, respectively.
arXiv Detail & Related papers (2023-04-06T13:07:17Z)
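The long-range planning skill described above can be illustrated with a minimal stand-in: breadth-first search over a topological map of waypoints, leaving continuous obstacle avoidance to a separate low-level controller. The graph encoding below is an assumption, not ETPNav's implementation.

```python
from collections import deque
from typing import Dict, List

def shortest_topo_path(edges: Dict[str, List[str]], start: str, goal: str) -> List[str]:
    """BFS over a topological map (node = waypoint, edge = traversable link),
    producing the kind of long-range plan a topological planner hands to a
    low-level controller."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:       # walk parents back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in edges.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return []                             # goal unreachable in current map

topo = {"start": ["hall"], "hall": ["kitchen", "stairs"], "stairs": ["goal"]}
print(shortest_topo_path(topo, "start", "goal"))  # ['start', 'hall', 'stairs', 'goal']
```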
- Deep Learning-based Spacecraft Relative Navigation Methods: A Survey [3.964047152162558]
This survey investigates current deep learning-based autonomous spacecraft relative navigation methods.
It focuses on concrete orbital applications such as spacecraft rendezvous and landing on small bodies or the Moon.
arXiv Detail & Related papers (2021-08-19T18:54:19Z)
- Diagnosing Vision-and-Language Navigation: What Really Matters [61.72935815656582]
Vision-and-language navigation (VLN) is a multimodal task where an agent follows natural language instructions and navigates in visual environments.
Recent studies have observed a slow-down in performance improvements on both indoor and outdoor VLN tasks.
In this work, we conduct a series of diagnostic experiments to unveil agents' focus during navigation.
arXiv Detail & Related papers (2021-03-30T17:59:07Z)
- Occupancy Anticipation for Efficient Exploration and Navigation [97.17517060585875]
We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
By exploiting context in both the egocentric views and top-down maps, our model successfully anticipates a broader map of the environment.
Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
arXiv Detail & Related papers (2020-08-21T03:16:51Z)
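A toy sketch of the anticipation idea, assuming a small convolutional encoder-decoder that maps a partial local occupancy map to beliefs over unseen cells; the channel layout and architecture below are illustrative, not the authors' model.

```python
import torch
import torch.nn as nn

class OccupancyAnticipator(nn.Module):
    """Toy model: takes a partial egocentric occupancy map (2 channels:
    occupied / explored) and predicts occupancy for the full local region,
    including cells that have not yet been observed."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 1),  # per-cell logits: occupied / free
        )

    def forward(self, partial_map: torch.Tensor) -> torch.Tensor:
        return self.net(partial_map)

model = OccupancyAnticipator()
partial = torch.zeros(1, 2, 64, 64)   # 64x64 local map, mostly unobserved
pred = torch.sigmoid(model(partial))  # anticipated occupancy beliefs
```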
- Active Visual Information Gathering for Vision-Language Navigation [115.40768457718325]
Vision-language navigation (VLN) is the task in which an agent carries out navigational instructions inside photo-realistic environments.
One of the key challenges in VLN is how to conduct robust navigation by mitigating the uncertainty caused by ambiguous instructions and insufficient observation of the environment.
This work draws inspiration from human navigation behavior and endows an agent with an active information gathering ability for a more intelligent VLN policy.
arXiv Detail & Related papers (2020-07-15T23:54:20Z)
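One simple way to make the active-gathering intuition concrete is to pick the viewpoint whose outcome the agent is most uncertain about; the binary-entropy heuristic below is a stand-in sketch, not the paper's learned policy.

```python
import math
from typing import Dict

def entropy(p: float) -> float:
    """Binary entropy of a belief p in (0, 1); peaks at p = 0.5."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def pick_gathering_action(beliefs: Dict[str, float]) -> str:
    """Choose the candidate direction whose belief is most uncertain
    (highest entropy) -- a crude proxy for actively gathering the
    observation that would reduce uncertainty the most."""
    return max(beliefs, key=lambda a: entropy(beliefs[a]))

# Beliefs that each direction leads toward the instructed goal:
print(pick_gathering_action({"left": 0.9, "ahead": 0.55, "right": 0.2}))  # 'ahead'
```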
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.