Mobile Robot Navigation Using Hand-Drawn Maps: A Vision Language Model Approach
- URL: http://arxiv.org/abs/2502.00114v1
- Date: Fri, 31 Jan 2025 19:03:33 GMT
- Title: Mobile Robot Navigation Using Hand-Drawn Maps: A Vision Language Model Approach
- Authors: Aaron Hao Tan, Angus Fung, Haitong Wang, Goldie Nejat
- Abstract summary: This paper introduces a novel Hand-drawn Map Navigation (HAM-Nav) architecture.
HAM-Nav integrates a unique Selective Visual Association Prompting approach for topological map-based position estimation.
Experiments were conducted in simulated environments, using both wheeled and legged robots.
- Score: 5.009635912655658
- Abstract: Hand-drawn maps can be used to convey navigation instructions between humans and robots in a natural and efficient manner. However, these maps can often contain inaccuracies such as scale distortions and missing landmarks which present challenges for mobile robot navigation. This paper introduces a novel Hand-drawn Map Navigation (HAM-Nav) architecture that leverages pre-trained vision language models (VLMs) for robot navigation across diverse environments, hand-drawing styles, and robot embodiments, even in the presence of map inaccuracies. HAM-Nav integrates a unique Selective Visual Association Prompting approach for topological map-based position estimation and navigation planning as well as a Predictive Navigation Plan Parser to infer missing landmarks. Extensive experiments were conducted in photorealistic simulated environments, using both wheeled and legged robots, demonstrating the effectiveness of HAM-Nav in terms of navigation success rates and Success weighted by Path Length. Furthermore, a user study in real-world environments highlighted the practical utility of hand-drawn maps for robot navigation as well as successful navigation outcomes.
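As a rough illustration of the prompting idea described in the abstract, the sketch below shows how a VLM might be asked to localize a robot on a topological map derived from a hand-drawn sketch. Everything here (the `TopoNode` structure, the `query_vlm` stub, and the prompt wording) is a hypothetical stand-in, not the HAM-Nav implementation:

```python
from dataclasses import dataclass

@dataclass
class TopoNode:
    node_id: int
    landmark: str   # landmark label drawn on the hand-drawn map
    neighbors: list  # ids of connected nodes

def query_vlm(prompt: str, images: list) -> str:
    """Stub for any pre-trained vision-language model API."""
    raise NotImplementedError  # hypothetical: swap in a real VLM call

def estimate_position(map_image, camera_image, candidates):
    # Selective prompting: list only candidate nodes near the previous
    # estimate, so the VLM picks from a small, relevant option set.
    options = "\n".join(f"{n.node_id}: near the {n.landmark}" for n in candidates)
    prompt = (
        "The first image is a hand-drawn map; the second is the robot's "
        "camera view. Which numbered map node is the robot closest to?\n"
        f"{options}\nAnswer with the node id only."
    )
    return int(query_vlm(prompt, [map_image, camera_image]).strip())
```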
Related papers
- VL-Nav: Real-time Vision-Language Navigation with Spatial Reasoning [11.140494493881075]
We present a novel vision-language navigation (VL-Nav) system that integrates efficient spatial reasoning on low-power robots.
Unlike prior methods that rely on a single image-level feature similarity to guide a robot, our method integrates pixel-wise vision-language features with curiosity-driven exploration.
VL-Nav achieves an overall success rate of 86.3%, outperforming previous methods by 44.15%.
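A minimal sketch of how pixel-wise vision-language similarity might be combined with a curiosity bonus to pick an exploration goal; the scoring rule and the `beta` weight are assumptions, not the VL-Nav formulation:

```python
import numpy as np

def select_frontier_goal(vl_sim, visit_counts, frontier_mask, beta=0.5):
    """vl_sim: H x W pixel-wise vision-language similarity to the target;
    visit_counts: H x W observation counts; frontier_mask: H x W bool."""
    curiosity = 1.0 / np.sqrt(1.0 + visit_counts)    # less-visited cells score higher
    score = vl_sim + beta * curiosity
    score = np.where(frontier_mask, score, -np.inf)  # restrict to frontier cells
    return np.unravel_index(np.argmax(score), score.shape)
```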
arXiv Detail & Related papers (2025-02-02T21:44:15Z) - CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction [19.997935470257794]
We present CANVAS, a framework that combines visual and linguistic instructions for commonsense-aware navigation.
Its success is driven by imitation learning, enabling the robot to learn from human navigation behavior.
Our experiments show that CANVAS outperforms the strong rule-based system ROS NavStack across all environments.
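A generic behavior-cloning update of the kind this description suggests; the `policy` interface and cross-entropy objective are assumptions, not the CANVAS training code:

```python
import torch.nn.functional as F

def bc_step(policy, obs, instr_emb, expert_action, optimizer):
    """One behavior-cloning step: fit the policy to a human navigation
    action conditioned on observation and instruction embeddings."""
    logits = policy(obs, instr_emb)  # hypothetical policy interface
    loss = F.cross_entropy(logits, expert_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```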
arXiv Detail & Related papers (2024-10-02T06:34:45Z) - Object Goal Navigation with Recursive Implicit Maps [92.6347010295396]
We propose an implicit spatial map for object goal navigation.
Our method significantly outperforms the state of the art on the challenging MP3D dataset.
We deploy our model on a real robot and achieve encouraging object goal navigation results in real scenes.
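One way to read "recursive implicit map" is a recurrent state that folds each observation into a single scene embedding; the sketch below is a toy version under that assumption, not the paper's architecture:

```python
import torch.nn as nn

class RecursiveImplicitMap(nn.Module):
    """Toy recurrent 'map': each observation embedding is folded into one
    hidden state that implicitly encodes the scene so far."""
    def __init__(self, obs_dim=512, map_dim=256, n_actions=4):
        super().__init__()
        self.cell = nn.GRUCell(obs_dim, map_dim)
        self.policy = nn.Linear(map_dim, n_actions)

    def forward(self, obs_emb, map_state):
        map_state = self.cell(obs_emb, map_state)  # recursive map update
        return self.policy(map_state), map_state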
arXiv Detail & Related papers (2023-08-10T14:21:33Z) - Learning Navigational Visual Representations with Semantic Map
Supervision [85.91625020847358]
We propose a navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego$^2$-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation.
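A contrastive objective between matched view and map embeddings could look like the standard InfoNCE sketch below; the exact loss used by Ego$^2$-Map may differ:

```python
import torch
import torch.nn.functional as F

def view_map_contrastive_loss(view_emb, map_emb, temperature=0.07):
    """InfoNCE between batch-aligned egocentric-view and semantic-map
    embeddings; a generic contrastive sketch, assuming the paper's
    objective is of this family."""
    v = F.normalize(view_emb, dim=-1)
    m = F.normalize(map_emb, dim=-1)
    logits = v @ m.t() / temperature                    # (B, B) similarities
    targets = torch.arange(v.size(0), device=v.device)  # diagonal = positives
    return F.cross_entropy(logits, targets)
```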
arXiv Detail & Related papers (2023-07-23T14:01:05Z) - ETPNav: Evolving Topological Planning for Vision-Language Navigation in
Continuous Environments [56.194988818341976]
Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments.
We propose ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability of obstacle-avoiding control in continuous environments.
ETPNav yields more than 10% and 20% improvements over prior state-of-the-art on R2R-CE and RxR-CE datasets.
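Long-range planning over a topological graph typically reduces to shortest-path search; below is a minimal Dijkstra sketch over a node/edge dictionary (generic, not the ETPNav planner, and it assumes the goal is reachable):

```python
import heapq

def plan_on_topo_graph(edges, start, goal):
    """Shortest path over a topological graph. edges: dict mapping a node
    to a list of (neighbor, cost) pairs. Low-level obstacle-avoiding
    control would run underneath this long-range plan."""
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in edges.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```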
arXiv Detail & Related papers (2023-04-06T13:07:17Z) - LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language,
Vision, and Action [76.71101507291473]
We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories.
We show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data.
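The CLIP grounding step can be pictured as scoring every graph-node image against each landmark phrase extracted by the language model; in the sketch below, `embed_text` and `embed_image` are assumed CLIP-style embedding functions rather than a specific API:

```python
import numpy as np

def ground_landmarks(landmarks, node_images, embed_text, embed_image):
    """For each landmark phrase, keep the best-matching graph-node image,
    mirroring the image-language association step in spirit."""
    t = np.stack([embed_text(p) for p in landmarks])        # (L, D)
    v = np.stack([embed_image(im) for im in node_images])   # (N, D)
    t /= np.linalg.norm(t, axis=1, keepdims=True)
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return (t @ v.T).argmax(axis=1)  # best node index per landmark
```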
arXiv Detail & Related papers (2022-07-10T10:41:50Z) - ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints [94.60414567852536]
Long-range navigation requires both planning and reasoning about local traversability.
We propose an approach that integrates learning and planning.
ViKiNG can leverage its image-based learned controller and goal-directed heuristic to navigate to goals up to 3 kilometers away.
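One plausible reading of heuristic-guided subgoal selection: rank frontier candidates by learned traversability cost plus a goal-directed term derived from the geographic hints. The weighting below is hypothetical:

```python
def pick_subgoal(frontiers, traversability_cost, heuristic_to_goal, alpha=1.0):
    """Choose the frontier subgoal minimizing learned traversability cost
    plus a goal-directed heuristic from coarse geographic hints (the hints
    guide the search but need not be accurate)."""
    return min(
        frontiers,
        key=lambda f: traversability_cost[f] + alpha * heuristic_to_goal[f],
    )
```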
arXiv Detail & Related papers (2022-02-23T02:14:23Z) - APPLD: Adaptive Planner Parameter Learning from Demonstration [48.63930323392909]
We introduce APPLD, Adaptive Planner Parameter Learning from Demonstration, which allows existing navigation systems to be successfully applied to new complex environments.
APPLD is verified on two robots running different navigation systems in different environments.
Experimental results show that APPLD can outperform navigation systems with the default and expert-tuned parameters, and even the human demonstrator.
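Parameter learning from demonstration can be approximated by black-box search over planner parameters that minimizes mismatch with the demo; the random-search sketch below is a generic stand-in for APPLD's per-context tuning, with hypothetical parameter names:

```python
import random

def tune_params(bounds, demo_loss, iters=200, seed=0):
    """Black-box search for planner parameters (e.g., max velocity,
    inflation radius) that best reproduce the demonstrated behavior.
    bounds: dict name -> (lo, hi); demo_loss: params -> mismatch score."""
    rng = random.Random(seed)
    best, best_loss = None, float("inf")
    for _ in range(iters):
        cand = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
        loss = demo_loss(cand)  # how far simulated behavior is from the demo
        if loss < best_loss:
            best, best_loss = cand, loss
    return best
```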
arXiv Detail & Related papers (2020-03-31T21:15:16Z) - Robot Navigation in Unseen Spaces using an Abstract Map [11.473894284561878]
We present a robot navigation system that uses the same symbolic spatial information employed by humans to purposefully navigate in unseen built environments.
We show how a dynamic system can be used to create malleable spatial models for the abstract map, and provide an open source implementation to encourage future work in the area of symbolic navigation.
The paper concludes with a qualitative analysis of human navigation strategies, providing further insights into how the symbolic navigation capabilities of robots in unseen built environments can be improved in the future.
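The "dynamic system" behind a malleable spatial model can be pictured as spring relaxation over landmark positions; the toy sketch below assumes Hooke's-law springs whose rest lengths encode the symbolic spatial cues:

```python
import numpy as np

def relax_abstract_map(pos, springs, k=0.1, steps=500):
    """Spring relaxation for a malleable spatial layout.
    pos: dict label -> np.array of shape (2,) (float positions);
    springs: list of (label_a, label_b, rest_length) constraints."""
    for _ in range(steps):
        for a, b, rest in springs:
            d = pos[b] - pos[a]
            dist = np.linalg.norm(d) + 1e-9
            f = k * (dist - rest) * d / dist  # Hooke's law along the edge
            pos[a] += f
            pos[b] -= f
    return pos
```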
arXiv Detail & Related papers (2020-01-31T07:40:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.