LLM-Guided Indoor Navigation with Multimodal Map Understanding
- URL: http://arxiv.org/abs/2503.11702v4
- Date: Thu, 19 Jun 2025 13:05:07 GMT
- Title: LLM-Guided Indoor Navigation with Multimodal Map Understanding
- Authors: Alberto Coffrini, Paolo Barsocchi, Francesco Furfari, Antonino Crivello, Alessio Ferrari,
- Abstract summary: We explore the potential of a Large Language Model (LLM), i.e., ChatGPT, to generate context-aware navigation instructions from indoor map images. Our findings demonstrate the potential of LLMs for supporting personalized indoor navigation, with an average of 86.59% correct indications and a maximum of 97.14%. These results have key implications for AI-driven navigation and assistive technologies.
- Score: 1.5325823985727567
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Indoor navigation presents unique challenges due to complex layouts and the unavailability of GNSS signals. Existing solutions often struggle with contextual adaptation, and typically require dedicated hardware. In this work, we explore the potential of a Large Language Model (LLM), i.e., ChatGPT, to generate natural, context-aware navigation instructions from indoor map images. We design and evaluate test cases across different real-world environments, analyzing the effectiveness of LLMs in interpreting spatial layouts, handling user constraints, and planning efficient routes. Our findings demonstrate the potential of LLMs for supporting personalized indoor navigation, with an average of 86.59% correct indications and a maximum of 97.14%. The proposed system achieves high accuracy and reasoning performance. These results have key implications for AI-driven navigation and assistive technologies.
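To make the setup concrete, below is a minimal sketch of how a multimodal LLM could be prompted with a floor-plan image to produce step-by-step directions. The model name, prompt wording, user constraint, and file path are illustrative assumptions (shown here via the OpenAI Python SDK) and do not reproduce the authors' exact pipeline.
```python
# Minimal sketch: ask a multimodal LLM for navigation instructions from a map image.
# Assumptions: OpenAI Python SDK >= 1.x, a vision-capable "gpt-4o" model, and a local
# floor-plan image at "floorplan.png". None of these details come from the paper itself.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("floorplan.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

prompt = (
    "You are an indoor navigation assistant. Using the attached floor plan, "
    "give numbered, step-by-step walking directions from the main entrance to "
    "Room 214. The user cannot use stairs, so prefer elevators or ramps."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)

print(response.choices[0].message.content)
```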
Related papers
- Dynamic Path Navigation for Motion Agents with LLM Reasoning [69.5875073447454]
Large Language Models (LLMs) have demonstrated strong generalizable reasoning and planning capabilities. We explore the zero-shot navigation and path generation capabilities of LLMs by constructing a dataset and proposing an evaluation protocol. We demonstrate that, when tasks are well-structured in this manner, modern LLMs exhibit substantial planning proficiency in avoiding obstacles while autonomously refining navigation with the generated motion to reach the target.
arXiv Detail & Related papers (2025-03-10T13:39:09Z) - From Text to Space: Mapping Abstract Spatial Models in LLMs during a Grid-World Navigation Task [0.0]
We investigate the influence of different text-based spatial representations on the performance and internal activations of large language models (LLMs) in a grid-world navigation task. Our experiments reveal that Cartesian representations of space consistently yield higher success rates and path efficiency, with performance scaling markedly with model size. This work advances our understanding of how LLMs process spatial information and provides valuable insights for developing more interpretable and robust agentic AI systems.
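As a toy illustration of one such encoding, the snippet below serializes a small grid world as Cartesian coordinates inside a prompt; the layout, coordinate convention, and wording are invented for illustration and do not reproduce the paper's dataset.
```python
# Toy sketch: serialize a grid world as Cartesian coordinates for an LLM prompt.
# The layout, coordinate convention, and prompt wording are illustrative only.
grid = {
    "agent": (0, 0),
    "goal":  (3, 2),
    "walls": [(1, 0), (1, 1), (2, 2)],
}

def to_cartesian_prompt(world: dict) -> str:
    walls = ", ".join(f"({x}, {y})" for x, y in world["walls"])
    return (
        f"You are at {world['agent']} on a 4x3 grid (x grows right, y grows up).\n"
        f"The goal is at {world['goal']}. Walls occupy: {walls}.\n"
        "Reply with a sequence of moves from {up, down, left, right} that reaches "
        "the goal without entering a wall cell."
    )

print(to_cartesian_prompt(grid))
```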
arXiv Detail & Related papers (2025-02-23T19:09:01Z) - TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation [52.422619828854984]
We introduce TopV-Nav, an MLLM-based method that directly reasons on the top-view map with sufficient spatial information.
To fully unlock the MLLM's spatial reasoning potential in top-view perspective, we propose the Adaptive Visual Prompt Generation (AVPG) method.
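The flavor of visual prompting on a top-view map can be pictured as overlaying numbered candidate waypoints on the map image before querying the MLLM. This is a generic sketch inspired by that idea, not an implementation of AVPG; the image path, candidate coordinates, and question wording are assumptions.
```python
# Generic visual-prompting sketch (not AVPG itself): draw numbered candidate waypoints
# onto a top-view map image, then pass the annotated image plus a question to an MLLM.
# The image file, pixel coordinates, and target object are hypothetical.
from PIL import Image, ImageDraw

candidates = {1: (120, 80), 2: (300, 150), 3: (220, 310)}  # pixel positions on the map

top_view = Image.open("top_view_map.png").convert("RGB")
draw = ImageDraw.Draw(top_view)
for idx, (x, y) in candidates.items():
    draw.ellipse((x - 12, y - 12, x + 12, y + 12), outline="red", width=3)
    draw.text((x - 4, y - 8), str(idx), fill="red")
top_view.save("top_view_prompted.png")

question = (
    "The numbered circles mark candidate waypoints on this top-view map. "
    "Which number should the robot move to next to find a sofa? Answer with one number."
)
# "top_view_prompted.png" and `question` would then be sent to a multimodal LLM.
```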
arXiv Detail & Related papers (2024-11-25T14:27:55Z) - Exploring Knowledge Boundaries in Large Language Models for Retrieval Judgment [56.87031484108484]
Large Language Models (LLMs) are increasingly recognized for their practical applications, but the boundaries of their internal knowledge limit how reliably they can answer on their own.
Retrieval-Augmented Generation (RAG) tackles this challenge and has had a significant impact on LLMs.
By minimizing retrieval requests that yield neutral or harmful results, both time and computational costs can be effectively reduced.
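A hedged sketch of the general idea of selective retrieval follows: ask the model whether it can answer from its own knowledge, and only issue a retrieval request when it cannot. The gating prompt and the two stub functions are hypothetical, not the paper's method.
```python
# Sketch of retrieval gating: retrieve only when the model judges it cannot answer alone.
# Both helper functions are stand-ins for real components (LLM client, vector store).
def llm_answer(prompt: str) -> str:
    """Stand-in for a chat-completion call; replace with a real LLM client."""
    return "YES" if "Reply YES or NO" in prompt else "(model answer)"

def retrieve(query: str) -> str:
    """Stand-in for a vector-store or web-search lookup."""
    return "(retrieved passages)"

def answer(question: str) -> str:
    gate = llm_answer(
        f"Question: {question}\n"
        "Can you answer this reliably from your own knowledge? Reply YES or NO."
    )
    if gate.strip().upper().startswith("YES"):
        return llm_answer(question)      # retrieval skipped: saves latency and tokens
    context = retrieve(question)         # retrieve only when the model is uncertain
    return llm_answer(f"Context:\n{context}\n\nQuestion: {question}")

print(answer("Which floor of the library holds the map collection?"))
```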
arXiv Detail & Related papers (2024-11-09T15:12:28Z) - Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments [1.18749525824656]
Guide-LLM is a text-based agent designed to assist persons with visual impairments (PVI) in navigating large indoor environments. Our approach features a novel text-based topological map that enables the LLM to plan global paths. Simulated experiments demonstrate the system's efficacy in guiding PVI, underscoring its potential as a significant advancement in assistive technology.
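The text-based topological map can be pictured as a small landmark graph serialized into the LLM's context, as in the sketch below; the node names and serialization format are hypothetical, not Guide-LLM's actual representation.
```python
# Hypothetical sketch of a text-based topological map: rooms and corridors as graph
# nodes, traversable connections as edges, serialized into plain text for an LLM planner.
import networkx as nx

topo = nx.Graph()
topo.add_edges_from([
    ("entrance", "lobby"),
    ("lobby", "corridor_A"),
    ("corridor_A", "elevator"),
    ("corridor_A", "room_101"),
    ("elevator", "corridor_B_floor2"),
    ("corridor_B_floor2", "room_214"),
])

def serialize(graph: nx.Graph) -> str:
    lines = [f"{u} is connected to {v}" for u, v in graph.edges]
    return "Known places and connections:\n" + "\n".join(lines)

# The serialized map plus start/goal can be embedded in a prompt for global path planning.
print(serialize(topo))
print(nx.shortest_path(topo, "entrance", "room_214"))  # classical check on the same graph
```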
arXiv Detail & Related papers (2024-10-28T01:58:21Z) - Long-distance Geomagnetic Navigation in GNSS-denied Environments with Deep Reinforcement Learning [62.186340267690824]
Existing studies on geomagnetic navigation rely on pre-stored map or extensive searches, leading to limited applicability or reduced navigation efficiency in unexplored areas.
This paper develops a deep reinforcement learning (DRL)-based mechanism, especially for long-distance geomagnetic navigation.
The designed mechanism trains an agent to learn and gain the magnetoreception capacity for geomagnetic navigation, rather than using any pre-stored map or extensive and expensive searching approaches.
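A heavily simplified sketch of the map-free idea: the agent observes the local geomagnetic field vector and is rewarded as its reading converges toward the field measured at the destination. The toy field model, scales, and dynamics below are invented for illustration; the paper's DRL design is substantially more involved.
```python
# Toy environment step for map-free geomagnetic navigation: observation = local field
# vector, reward = negative mismatch to the destination's field signature. Illustrative only.
import numpy as np

def field_at(pos: np.ndarray) -> np.ndarray:
    """Toy stand-in for a geomagnetic model: a smooth position-dependent 3-vector."""
    x, y = pos
    return np.array([30.0 + 0.1 * x, 5.0 + 0.05 * y, 40.0 - 0.02 * x * y])

def step(pos: np.ndarray, action: np.ndarray, target_field: np.ndarray):
    new_pos = pos + action                            # agent moves in the plane
    mismatch = np.linalg.norm(field_at(new_pos) - target_field)
    reward = -mismatch                                # smaller mismatch -> higher reward
    return new_pos, field_at(new_pos), reward         # next state, observation, reward

target_field = field_at(np.array([100.0, 50.0]))      # field "signature" of the goal
pos = np.array([0.0, 0.0])
pos, obs, r = step(pos, np.array([1.0, 0.5]), target_field)
print(obs, r)
```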
arXiv Detail & Related papers (2024-10-21T09:57:42Z) - IN-Sight: Interactive Navigation through Sight [20.184155117341497]
IN-Sight is a novel approach to self-supervised path planning.
It calculates traversability scores and incorporates them into a semantic map.
To precisely navigate around obstacles, IN-Sight employs a local planner.
arXiv Detail & Related papers (2024-08-01T07:27:54Z) - LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning [91.95362946266577]
Path planning is a fundamental scientific problem in robotics and autonomous navigation.
Traditional algorithms like A* and its variants are capable of ensuring path validity but suffer from significant computational and memory inefficiencies as the state space grows.
We propose a new LLM-based route planning method that synergistically combines the precise pathfinding capabilities of A* with the global reasoning capability of LLMs.
This hybrid approach aims to enhance pathfinding efficiency in terms of time and space complexity while maintaining the integrity of path validity, especially in large-scale scenarios.
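One way to picture the hybrid: the LLM proposes a few coarse waypoints from its global reasoning, and classical A* stitches the legs together on an occupancy grid. The grid, waypoints, and chaining strategy below are illustrative assumptions, not the LLM-A* algorithm itself.
```python
# Illustrative sketch (not the paper's algorithm): LLM-suggested waypoints act as
# intermediate goals, and classical A* plans each leg on an occupancy grid.
import heapq

def astar(grid, start, goal):
    """4-connected A* with a Manhattan heuristic; grid[y][x] == 1 marks an obstacle."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set, came_from, g = [(h(start), start)], {}, {start: 0}
    while open_set:
        _, cur = heapq.heappop(open_set)
        if cur == goal:
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        for nxt in ((cur[0] + 1, cur[1]), (cur[0] - 1, cur[1]),
                    (cur[0], cur[1] + 1), (cur[0], cur[1] - 1)):
            x, y = nxt
            if 0 <= y < len(grid) and 0 <= x < len(grid[0]) and not grid[y][x]:
                if g[cur] + 1 < g.get(nxt, float("inf")):
                    g[nxt], came_from[nxt] = g[cur] + 1, cur
                    heapq.heappush(open_set, (g[nxt] + h(nxt), nxt))
    return None

def plan_with_waypoints(grid, start, goal, llm_waypoints):
    """Chain A* through waypoints a language model might propose from a map description."""
    path, legs = [start], [start] + llm_waypoints + [goal]
    for a, b in zip(legs, legs[1:]):
        leg = astar(grid, a, b)
        if leg is None:
            return None
        path += leg[1:]
    return path

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
print(plan_with_waypoints(grid, (0, 0), (0, 2), llm_waypoints=[(2, 1)]))
```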
arXiv Detail & Related papers (2024-06-20T01:24:30Z) - Can large language models explore in-context? [87.49311128190143]
We deploy Large Language Models as agents in simple multi-armed bandit environments.
We find that the models do not robustly engage in exploration without substantial interventions.
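The experimental setting can be pictured as an LLM dropped into a bandit loop and asked, from the reward history alone, which arm to pull next; the prompt format and the stubbed model call below are assumptions, and the paper's exact protocol and interventions differ.
```python
# Conceptual sketch of an LLM acting in a 2-armed bandit. query_llm is a stand-in for a
# real chat-completion call; the prompt format is illustrative, not the paper's protocol.
import random

def query_llm(history_prompt: str) -> int:
    """Placeholder for an LLM call that returns the arm index (0 or 1) to pull next."""
    return random.choice([0, 1])  # replace with a real model call

true_means = [0.3, 0.7]           # hidden reward probabilities of the two arms
history_lines, total_reward = [], 0.0

for t in range(20):
    prompt = (
        "You are playing a 2-armed bandit. Past pulls and rewards:\n"
        + "\n".join(history_lines)
        + "\nWhich arm (0 or 1) do you pull next? Answer with a single digit."
    )
    arm = query_llm(prompt)
    reward = float(random.random() < true_means[arm])
    total_reward += reward
    history_lines.append(f"t={t}: arm={arm}, reward={reward}")

print(f"total reward over 20 rounds: {total_reward}")
```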
arXiv Detail & Related papers (2024-03-22T17:50:43Z) - TINA: Think, Interaction, and Action Framework for Zero-Shot Vision Language Navigation [11.591176410027224]
This paper presents a Vision-Language Navigation (VLN) agent based on Large Language Models (LLMs).
We propose the Thinking, Interacting, and Action framework to compensate for the shortcomings of LLMs in environmental perception.
Our approach also outperformed some supervised learning-based methods, highlighting its efficacy in zero-shot navigation.
arXiv Detail & Related papers (2024-03-13T05:22:39Z) - NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning [101.56342075720588]
Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions.
Recent research has highlighted the promising capacity of large language models (LLMs) in VLN by improving navigational reasoning accuracy and interpretability.
This paper introduces a novel strategy called Navigational Chain-of-Thought (NavCoT), which performs parameter-efficient in-domain training to enable self-guided navigational decision-making.
arXiv Detail & Related papers (2024-03-12T07:27:02Z) - Angle Robustness Unmanned Aerial Vehicle Navigation in GNSS-Denied Scenarios [66.05091704671503]
We present a novel angle navigation paradigm to deal with flight deviation in point-to-point navigation tasks.
We also propose a model that includes the Adaptive Feature Enhance Module, Cross-knowledge Attention-guided Module and Robust Task-oriented Head Module.
arXiv Detail & Related papers (2024-02-04T08:41:20Z) - VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View [81.58612867186633]
Vision and Language Navigation (VLN) requires visual and natural language understanding as well as spatial and temporal reasoning capabilities.
We show that VELMA is able to successfully follow navigation instructions in Street View with only two in-context examples.
We further finetune the LLM agent on a few thousand examples and achieve 25%-30% relative improvement in task completion over the previous state-of-the-art for two datasets.
arXiv Detail & Related papers (2023-07-12T11:08:24Z) - NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models [17.495162643127003]
We introduce NavGPT to reveal the reasoning capability of GPT models in complex embodied scenes.
NavGPT takes textual descriptions of visual observations, navigation history, and future explorable directions as inputs to reason about the agent's current status.
We show that NavGPT is capable of generating high-quality navigational instructions from observations and actions along a path.
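The input construction can be pictured roughly as below: observation descriptions, navigation history, and candidate directions are assembled into a single prompt for step-by-step reasoning. The field names and wording are assumptions, not NavGPT's exact templates.
```python
# Illustrative prompt assembly in the spirit of the abstract; all wording is assumed.
def build_nav_prompt(instruction, observations, history, candidates):
    obs = "\n".join(f"- viewpoint {i}: {d}" for i, d in enumerate(observations))
    hist = "\n".join(history) if history else "(none yet)"
    cand = "\n".join(f"({i}) {c}" for i, c in enumerate(candidates))
    return (
        f"Instruction: {instruction}\n\n"
        f"Current observations:\n{obs}\n\n"
        f"Navigation history:\n{hist}\n\n"
        f"Explorable directions:\n{cand}\n\n"
        "Think step by step about where you are, then answer with the index of the "
        "direction to take next."
    )

print(build_nav_prompt(
    "Walk past the kitchen and stop at the bedroom door.",
    ["a kitchen counter with a sink", "a hallway leading left"],
    ["step 1: moved forward past the living room"],
    ["forward along the hallway", "turn left into the kitchen"],
))
```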
arXiv Detail & Related papers (2023-05-26T14:41:06Z) - ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments [56.194988818341976]
Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments.
We propose ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability of obstacle-avoiding control in continuous environments.
ETPNav yields more than 10% and 20% improvements over prior state-of-the-art on R2R-CE and RxR-CE datasets.
arXiv Detail & Related papers (2023-04-06T13:07:17Z) - Using Deep Reinforcement Learning with Automatic Curriculum Learning for Mapless Navigation in Intralogistics [0.7633618497843278]
We propose a deep reinforcement learning approach for solving a mapless navigation problem in warehouse scenarios.
The automatic guided vehicle is equipped with LiDAR and frontal RGB sensors and learns to reach underneath the target dolly.
NavACL-Q greatly facilitates the whole learning process, and a pre-trained feature extractor markedly boosts training speed.
arXiv Detail & Related papers (2022-02-23T13:50:01Z) - Diagnosing Vision-and-Language Navigation: What Really Matters [61.72935815656582]
Vision-and-language navigation (VLN) is a multimodal task where an agent follows natural language instructions and navigates in visual environments.
Recent studies report a slow-down in performance improvements on both indoor and outdoor VLN tasks.
In this work, we conduct a series of diagnostic experiments to unveil agents' focus during navigation.
arXiv Detail & Related papers (2021-03-30T17:59:07Z)