SkeNa: Learning to Navigate Unseen Environments Based on Abstract Hand-Drawn Maps
- URL: http://arxiv.org/abs/2508.03053v1
- Date: Tue, 05 Aug 2025 03:56:32 GMT
- Title: SkeNa: Learning to Navigate Unseen Environments Based on Abstract Hand-Drawn Maps
- Authors: Haojun Xu, Jiaqi Xiang, Wu Wei, Jinyu Chen, Linqing Zhong, Linjiang Huang, Hongyu Yang, Si Liu
- Abstract summary: We introduce Sketch map-based visual Navigation (SkeNa). SkeNa is an embodied navigation task in which an agent must reach a goal in an unseen environment using only a hand-drawn sketch map as guidance. We present a large-scale dataset named SoR, comprising 54k trajectory and sketch map pairs across 71 indoor scenes.
- Score: 20.963573846962987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A typical human strategy for giving navigation guidance is to sketch route maps based on the environmental layout. Inspired by this, we introduce Sketch map-based visual Navigation (SkeNa), an embodied navigation task in which an agent must reach a goal in an unseen environment using only a hand-drawn sketch map as guidance. To support research on SkeNa, we present a large-scale dataset named SoR, comprising 54k trajectory and sketch map pairs across 71 indoor scenes. In SoR, we introduce two navigation validation sets with varying levels of abstraction in the hand-drawn sketches, categorized by how well they preserve the spatial scales of the environment, to facilitate future research. To construct SoR, we develop an automated sketch-generation pipeline that efficiently converts floor plans into hand-drawn representations. To solve SkeNa, we propose SkeNavigator, a navigation framework that aligns visual observations with hand-drawn maps to estimate navigation targets. It employs a Ray-based Map Descriptor (RMD) to enhance the sketch map's feature representation using equidistant sampling points and their boundary distances. To improve alignment with visual observations, a Dual-Map Aligned Goal Predictor (DAGP) leverages the correspondence between sketch map features and the features of an exploration map constructed on-site to predict the goal position and guide navigation. SkeNavigator outperforms prior floor-plan navigation methods by a large margin, improving SPL on the high-abstraction validation set by a relative 105%. Our code and dataset will be released.
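The abstract does not include implementation details, but the RMD idea, describing each equidistantly sampled point on the sketch by its distances to the nearest boundary along a fan of rays, can be sketched roughly as below. The function name, grid conventions, and ray count are illustrative assumptions, not the authors' code.

```python
import numpy as np

def ray_map_descriptor(occ, points, n_rays=16, max_steps=200):
    """Illustrative sketch of a Ray-based Map Descriptor (RMD).

    occ      : 2D bool array, True where the sketch has a boundary stroke.
    points   : (N, 2) array of equidistantly sampled (row, col) points.
    Returns  : (N, n_rays) distances to the nearest boundary (or map edge)
               along each ray, capped at max_steps.
    """
    H, W = occ.shape
    angles = np.linspace(0.0, 2.0 * np.pi, n_rays, endpoint=False)
    desc = np.full((len(points), n_rays), float(max_steps))
    for i, (r, c) in enumerate(points):
        for j, a in enumerate(angles):
            dr, dc = np.sin(a), np.cos(a)
            for step in range(1, max_steps):
                rr = int(round(r + step * dr))
                cc = int(round(c + step * dc))
                # stop at the first boundary cell or when leaving the map
                if not (0 <= rr < H and 0 <= cc < W) or occ[rr, cc]:
                    desc[i, j] = float(step)
                    break
    return desc
```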
Related papers
- TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation [3.2688425993442696]
We propose a modular approach to the Vision-Language Navigation (VLN) task by decomposing the problem into four sub-modules.
Given a navigation instruction in natural language, we first prompt an LLM to extract the landmarks and the order in which they are visited.
We then generate path hypotheses from the starting location to the last landmark using a shortest-path algorithm on the topological map of the environment (sketched below).
arXiv Detail & Related papers (2025-02-11T07:09:37Z)
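As a loose illustration of TRAVEL's path-hypothesis step, the snippet below chains shortest paths through the LLM-extracted landmarks in their visiting order on a topological graph; the `networkx` usage and all names are assumptions for illustration only.

```python
import networkx as nx

def path_hypothesis(graph, start, ordered_landmarks):
    """Chain shortest paths through landmarks in their visiting order.

    graph             : networkx.Graph over environment viewpoints.
    start             : starting node.
    ordered_landmarks : landmark nodes in the order the instruction visits them.
    """
    path, current = [start], start
    for lm in ordered_landmarks:
        leg = nx.shortest_path(graph, source=current, target=lm)
        path.extend(leg[1:])  # skip the repeated junction node
        current = lm
    return path
```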
- Mobile Robot Navigation Using Hand-Drawn Maps: A Vision Language Model Approach [5.009635912655658]
Hand-drawn maps can often contain inaccuracies such as scale distortions and missing landmarks.
This paper introduces a novel Hand-drawn Map Navigation (HAM-Nav) architecture that leverages pre-trained vision language models.
HAM-Nav integrates a unique Selective Visual Association Prompting approach for topological map-based position estimation and navigation planning.
arXiv Detail & Related papers (2025-01-31T19:03:33Z)
- NavTopo: Leveraging Topological Maps For Autonomous Navigation Of a Mobile Robot [1.0550841723235613]
We propose a full navigation pipeline based on a topological map and two-level path planning.
The pipeline localizes in the graph by matching neural network descriptors and 2D projections of the input point clouds (a generic matching sketch follows below).
We test our approach in a large indoor photo-realistic simulated environment and compare it to a metric map-based approach built on the popular metric mapping method RTAB-Map.
arXiv Detail & Related papers (2024-10-15T10:54:49Z)
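NavTopo's matching procedure is not detailed in this summary; the snippet below shows only a generic nearest-node lookup by cosine similarity between a query descriptor and stored per-node descriptors, a common pattern for topological localization. All names are hypothetical.

```python
import numpy as np

def localize_in_graph(query_desc, node_descs):
    """Return the node whose stored descriptor best matches the query.

    query_desc : (D,) descriptor of the current observation.
    node_descs : dict mapping node id -> (D,) stored descriptor.
    """
    ids = list(node_descs)
    mat = np.stack([node_descs[i] for i in ids])  # (N, D)
    sims = mat @ query_desc / (
        np.linalg.norm(mat, axis=1) * np.linalg.norm(query_desc) + 1e-8)
    return ids[int(np.argmax(sims))]
```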
- PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation [30.710806048991923]
Vision and language navigation is a task that requires an agent to navigate according to a natural language instruction.
Recent methods predict sub-goals on a constructed topological map at each step to enable long-term action planning.
We propose an alternative method that facilitates navigation planning by considering the alignment between instructions and directed fidelity trajectories.
arXiv Detail & Related papers (2024-07-16T08:22:18Z)
- GaussNav: Gaussian Splatting for Visual Navigation [92.13664084464514]
Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment.
We propose a new framework for IIN, Gaussian Splatting for Visual Navigation (GaussNav), which constructs a novel map representation based on 3D Gaussian Splatting (3DGS).
Our GaussNav framework demonstrates a significant performance improvement, with Success weighted by Path Length (SPL) increasing from 0.347 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset (the SPL metric is defined below).
arXiv Detail & Related papers (2024-03-18T09:56:48Z)
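SPL (Success weighted by Path Length) is the metric quoted both here and in the SkeNa abstract above; the standard definition (Anderson et al., 2018) weights each successful episode by the ratio of the shortest-path length to the path length actually traveled:

```python
def spl(successes, shortest_lengths, taken_lengths):
    """Success weighted by Path Length, averaged over episodes.

    successes        : iterable of 0/1 episode outcomes.
    shortest_lengths : geodesic shortest-path length per episode.
    taken_lengths    : length of the path the agent actually traveled.
    """
    total = 0.0
    n = 0
    for s, l, p in zip(successes, shortest_lengths, taken_lengths):
        total += s * l / max(p, l)  # detours shrink the credit; capped at 1
        n += 1
    return total / n
```

By this definition, GaussNav's reported increase from 0.347 to 0.578 corresponds to a relative SPL gain of roughly 67%.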
- Object Goal Navigation with Recursive Implicit Maps [92.6347010295396]
We propose an implicit spatial map for object goal navigation.
Our method significantly outperforms the state of the art on the challenging MP3D dataset.
We deploy our model on a real robot and achieve encouraging object goal navigation results in real scenes.
arXiv Detail & Related papers (2023-08-10T14:21:33Z)
- Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigation-specific visual representation learning method that contrasts the agent's egocentric views and semantic maps (a generic contrastive objective is sketched below).
Ego$^2$-Map learning transfers the compact and rich information from a map, such as objects, structure, and transitions, to the agent's egocentric representations for navigation.
arXiv Detail & Related papers (2023-07-23T14:01:05Z)
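The exact objective of Ego$^2$-Map is not given in this summary; a generic InfoNCE-style contrastive loss between paired view and map embeddings, of the kind such methods typically use, might look as follows (a numpy sketch; all names assumed).

```python
import numpy as np

def info_nce(view_emb, map_emb, temperature=0.07):
    """One-directional InfoNCE loss; row i of each batch is a positive pair.

    view_emb : (B, D) egocentric-view embeddings.
    map_emb  : (B, D) semantic-map embeddings; row i pairs with view i.
    """
    v = view_emb / np.linalg.norm(view_emb, axis=1, keepdims=True)
    m = map_emb / np.linalg.norm(map_emb, axis=1, keepdims=True)
    logits = v @ m.T / temperature                    # (B, B) similarities
    z = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(logp))                    # positives on diagonal
```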
- Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation [87.52136927091712]
We address the practical yet challenging problem of training robot agents to navigate an environment by following a path described by language instructions.
To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both spatial location and the semantic information of the environment objects.
We propose a multi-granularity map, which contains both object fine-grained details (e.g., color, texture) and semantic classes, to represent objects more comprehensively.
arXiv Detail & Related papers (2022-10-14T04:23:27Z)
- ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints [94.60414567852536]
Long-range navigation requires both planning and reasoning about local traversability.
We propose a learning-based approach that integrates learning and planning.
ViKiNG can leverage its image-based learned controller and goal-directed heuristic to navigate to goals up to 3 kilometers away.
arXiv Detail & Related papers (2022-02-23T02:14:23Z)
- Navigating to Objects in Unseen Environments by Distance Prediction [16.023495311387478]
We propose an object goal navigation framework that can directly perform path planning based on an estimated distance map.
Specifically, our model takes a bird's-eye-view semantic map as input and estimates the distance from each map cell to the target object (a BFS sketch of such a distance map follows below).
With the estimated distance map, the agent can explore the environment and navigate to the target object using either a human-designed or a learned navigation policy.
arXiv Detail & Related papers (2022-02-08T09:22:50Z)
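The model above learns to predict distances, but the ground-truth distance map such a model would regress can be computed with a plain BFS over traversable cells; this sketch is an assumption about the setup, not the authors' code.

```python
from collections import deque
import numpy as np

def distance_map(traversable, target_cells):
    """BFS geodesic distance from every traversable cell to the target.

    traversable  : 2D bool array, True where the agent can walk.
    target_cells : iterable of (row, col) cells occupied by the target.
    Returns a float array; unreachable cells stay at inf.
    """
    H, W = traversable.shape
    dist = np.full((H, W), np.inf)
    q = deque()
    for r, c in target_cells:
        dist[r, c] = 0.0
        q.append((r, c))
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < H and 0 <= cc < W and traversable[rr, cc]
                    and dist[rr, cc] > dist[r, c] + 1):
                dist[rr, cc] = dist[r, c] + 1
                q.append((rr, cc))
    return dist
```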
- Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation [143.6144560164782]
We introduce a learning-based approach for room navigation using semantic maps.
We train a model to generate amodal semantic top-down maps indicating beliefs of location, size, and shape of rooms.
Next, we use these maps to predict a point that lies in the target room and train a policy to navigate to the point.
arXiv Detail & Related papers (2020-07-20T02:19:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.