Mobile Robot Navigation Using Hand-Drawn Maps: A Vision Language Model Approach
- URL: http://arxiv.org/abs/2502.00114v1
- Date: Fri, 31 Jan 2025 19:03:33 GMT
- Title: Mobile Robot Navigation Using Hand-Drawn Maps: A Vision Language Model Approach
- Authors: Aaron Hao Tan, Angus Fung, Haitong Wang, Goldie Nejat
- Abstract summary: This paper introduces a novel Hand-drawn Map Navigation (HAM-Nav) architecture.
HAM-Nav integrates a unique Selective Visual Association Prompting approach for topological map-based position estimation.
Experiments were conducted in simulated environments, using both wheeled and legged robots.
- Score: 5.009635912655658
- Abstract: Hand-drawn maps can be used to convey navigation instructions between humans and robots in a natural and efficient manner. However, these maps can often contain inaccuracies such as scale distortions and missing landmarks which present challenges for mobile robot navigation. This paper introduces a novel Hand-drawn Map Navigation (HAM-Nav) architecture that leverages pre-trained vision language models (VLMs) for robot navigation across diverse environments, hand-drawing styles, and robot embodiments, even in the presence of map inaccuracies. HAM-Nav integrates a unique Selective Visual Association Prompting approach for topological map-based position estimation and navigation planning as well as a Predictive Navigation Plan Parser to infer missing landmarks. Extensive experiments were conducted in photorealistic simulated environments, using both wheeled and legged robots, demonstrating the effectiveness of HAM-Nav in terms of navigation success rates and Success weighted by Path Length. Furthermore, a user study in real-world environments highlighted the practical utility of hand-drawn maps for robot navigation as well as successful navigation outcomes.
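As a rough illustration of the prompting idea described in the abstract, the sketch below shows how a VLM might be asked to localize a robot on a topological map derived from a hand-drawn sketch. Everything here (the `TopoNode` structure, the `query_vlm` stub, and the prompt wording) is a hypothetical stand-in, not the HAM-Nav implementation:

```python
from dataclasses import dataclass

@dataclass
class TopoNode:
    node_id: int
    landmark: str   # landmark label drawn on the hand-drawn map
    neighbors: list  # ids of connected nodes

def query_vlm(prompt: str, images: list) -> str:
    """Stub for any pre-trained vision-language model API."""
    raise NotImplementedError  # hypothetical: swap in a real VLM call

def estimate_position(map_image, camera_image, candidates):
    # Selective prompting: list only candidate nodes near the previous
    # estimate, so the VLM picks from a small, relevant option set.
    options = "\n".join(f"{n.node_id}: near the {n.landmark}" for n in candidates)
    prompt = (
        "The first image is a hand-drawn map; the second is the robot's "
        "camera view. Which numbered map node is the robot closest to?\n"
        f"{options}\nAnswer with the node id only."
    )
    return int(query_vlm(prompt, [map_image, camera_image]).strip())
```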
Related papers
- VL-Nav: Real-time Vision-Language Navigation with Spatial Reasoning [11.140494493881075]
We present a novel vision-language navigation (VL-Nav) system that integrates efficient spatial reasoning on low-power robots.
Unlike prior methods that rely on a single image-level feature similarity to guide a robot, our method integrates pixel-wise vision-language features with curiosity-driven exploration.
VL-Nav achieves an overall success rate of 86.3%, outperforming previous methods by 44.15%.
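A minimal sketch of how pixel-wise vision-language similarity might be combined with a curiosity bonus to pick an exploration goal; the scoring rule and the `beta` weight are assumptions, not the VL-Nav formulation:

```python
import numpy as np

def select_frontier_goal(vl_sim, visit_counts, frontier_mask, beta=0.5):
    """vl_sim: H x W pixel-wise vision-language similarity to the target;
    visit_counts: H x W observation counts; frontier_mask: H x W bool."""
    curiosity = 1.0 / np.sqrt(1.0 + visit_counts)    # less-visited cells score higher
    score = vl_sim + beta * curiosity
    score = np.where(frontier_mask, score, -np.inf)  # restrict to frontier cells
    return np.unravel_index(np.argmax(score), score.shape)
```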
arXiv Detail & Related papers (2025-02-02T21:44:15Z) - CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction [19.997935470257794]
We present CANVAS, a framework that combines visual and linguistic instructions for commonsense-aware navigation.
Its success is driven by imitation learning, enabling the robot to learn from human navigation behavior.
Our experiments show that CANVAS outperforms the strong rule-based system ROS NavStack across all environments.
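A generic behavior-cloning update of the kind this description suggests; the `policy` interface and cross-entropy objective are assumptions, not the CANVAS training code:

```python
import torch.nn.functional as F

def bc_step(policy, obs, instr_emb, expert_action, optimizer):
    """One behavior-cloning step: fit the policy to a human navigation
    action conditioned on observation and instruction embeddings."""
    logits = policy(obs, instr_emb)  # hypothetical policy interface
    loss = F.cross_entropy(logits, expert_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```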
arXiv Detail & Related papers (2024-10-02T06:34:45Z) - Object Goal Navigation with Recursive Implicit Maps [92.6347010295396]
We propose an implicit spatial map for object goal navigation.
Our method significantly outperforms the state of the art on the challenging MP3D dataset.
We deploy our model on a real robot and achieve encouraging object goal navigation results in real scenes.
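One way to read "recursive implicit map" is a recurrent state that folds each observation into a single scene embedding; the sketch below is a toy version under that assumption, not the paper's architecture:

```python
import torch.nn as nn

class RecursiveImplicitMap(nn.Module):
    """Toy recurrent 'map': each observation embedding is folded into one
    hidden state that implicitly encodes the scene so far."""
    def __init__(self, obs_dim=512, map_dim=256, n_actions=4):
        super().__init__()
        self.cell = nn.GRUCell(obs_dim, map_dim)
        self.policy = nn.Linear(map_dim, n_actions)

    def forward(self, obs_emb, map_state):
        map_state = self.cell(obs_emb, map_state)  # recursive map update
        return self.policy(map_state), map_state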
arXiv Detail & Related papers (2023-08-10T14:21:33Z) - Learning Navigational Visual Representations with Semantic Map
Supervision [85.91625020847358]
We propose a navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego$^2$-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation.
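A contrastive objective between matched view and map embeddings could look like the standard InfoNCE sketch below; the exact loss used by Ego$^2$-Map may differ:

```python
import torch
import torch.nn.functional as F

def view_map_contrastive_loss(view_emb, map_emb, temperature=0.07):
    """InfoNCE between batch-aligned egocentric-view and semantic-map
    embeddings; a generic contrastive sketch, assuming the paper's
    objective is of this family."""
    v = F.normalize(view_emb, dim=-1)
    m = F.normalize(map_emb, dim=-1)
    logits = v @ m.t() / temperature                    # (B, B) similarities
    targets = torch.arange(v.size(0), device=v.device)  # diagonal = positives
    return F.cross_entropy(logits, targets)
```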
arXiv Detail & Related papers (2023-07-23T14:01:05Z) - ETPNav: Evolving Topological Planning for Vision-Language Navigation in
Continuous Environments [56.194988818341976]
Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments.
We propose ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability of obstacle-avoiding control in continuous environments.
ETPNav yields more than 10% and 20% improvements over prior state-of-the-art on R2R-CE and RxR-CE datasets.
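Long-range planning over a topological graph typically reduces to shortest-path search; below is a minimal Dijkstra sketch over a node/edge dictionary (generic, not the ETPNav planner, and it assumes the goal is reachable):

```python
import heapq

def plan_on_topo_graph(edges, start, goal):
    """Shortest path over a topological graph. edges: dict mapping a node
    to a list of (neighbor, cost) pairs. Low-level obstacle-avoiding
    control would run underneath this long-range plan."""
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in edges.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```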
arXiv Detail & Related papers (2023-04-06T13:07:17Z) - LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language,
Vision, and Action [76.71101507291473]
We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories.
We show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data.
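The CLIP grounding step can be pictured as scoring every graph-node image against each landmark phrase extracted by the language model; in the sketch below, `embed_text` and `embed_image` are assumed CLIP-style embedding functions rather than a specific API:

```python
import numpy as np

def ground_landmarks(landmarks, node_images, embed_text, embed_image):
    """For each landmark phrase, keep the best-matching graph-node image,
    mirroring the image-language association step in spirit."""
    t = np.stack([embed_text(p) for p in landmarks])        # (L, D)
    v = np.stack([embed_image(im) for im in node_images])   # (N, D)
    t /= np.linalg.norm(t, axis=1, keepdims=True)
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return (t @ v.T).argmax(axis=1)  # best node index per landmark
```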
arXiv Detail & Related papers (2022-07-10T10:41:50Z) - ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints [94.60414567852536]
Long-range navigation requires both planning and reasoning about local traversability.
We propose an approach that integrates learning and planning.
ViKiNG can leverage its image-based learned controller and goal-directed heuristic to navigate to goals up to 3 kilometers away.
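One plausible reading of heuristic-guided subgoal selection: rank frontier candidates by learned traversability cost plus a goal-directed term derived from the geographic hints. The weighting below is hypothetical:

```python
def pick_subgoal(frontiers, traversability_cost, heuristic_to_goal, alpha=1.0):
    """Choose the frontier subgoal minimizing learned traversability cost
    plus a goal-directed heuristic from coarse geographic hints (the hints
    guide the search but need not be accurate)."""
    return min(
        frontiers,
        key=lambda f: traversability_cost[f] + alpha * heuristic_to_goal[f],
    )
```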
arXiv Detail & Related papers (2022-02-23T02:14:23Z) - APPLD: Adaptive Planner Parameter Learning from Demonstration [48.63930323392909]
We introduce APPLD, Adaptive Planner Parameter Learning from Demonstration, which allows existing navigation systems to be successfully applied to new complex environments.
APPLD is verified on two robots running different navigation systems in different environments.
Experimental results show that APPLD can outperform navigation systems with the default and expert-tuned parameters, and even the human demonstrator.
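Parameter learning from demonstration can be approximated by black-box search over planner parameters that minimizes mismatch with the demo; the random-search sketch below is a generic stand-in for APPLD's per-context tuning, with hypothetical parameter names:

```python
import random

def tune_params(bounds, demo_loss, iters=200, seed=0):
    """Black-box search for planner parameters (e.g., max velocity,
    inflation radius) that best reproduce the demonstrated behavior.
    bounds: dict name -> (lo, hi); demo_loss: params -> mismatch score."""
    rng = random.Random(seed)
    best, best_loss = None, float("inf")
    for _ in range(iters):
        cand = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
        loss = demo_loss(cand)  # how far simulated behavior is from the demo
        if loss < best_loss:
            best, best_loss = cand, loss
    return best
```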
arXiv Detail & Related papers (2020-03-31T21:15:16Z) - Robot Navigation in Unseen Spaces using an Abstract Map [11.473894284561878]
We present a robot navigation system that uses the same symbolic spatial information employed by humans to purposefully navigate in unseen built environments.
We show how a dynamic system can be used to create malleable spatial models for the abstract map, and provide an open source implementation to encourage future work in the area of symbolic navigation.
The paper concludes with a qualitative analysis of human navigation strategies, providing further insights into how the symbolic navigation capabilities of robots in unseen built environments can be improved in the future.
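The "dynamic system" behind a malleable spatial model can be pictured as spring relaxation over landmark positions; the toy sketch below assumes Hooke's-law springs whose rest lengths encode the symbolic spatial cues:

```python
import numpy as np

def relax_abstract_map(pos, springs, k=0.1, steps=500):
    """Spring relaxation for a malleable spatial layout.
    pos: dict label -> np.array of shape (2,) (float positions);
    springs: list of (label_a, label_b, rest_length) constraints."""
    for _ in range(steps):
        for a, b, rest in springs:
            d = pos[b] - pos[a]
            dist = np.linalg.norm(d) + 1e-9
            f = k * (dist - rest) * d / dist  # Hooke's law along the edge
            pos[a] += f
            pos[b] -= f
    return pos
```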
arXiv Detail & Related papers (2020-01-31T07:40:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.