Floorplan2Guide: LLM-Guided Floorplan Parsing for BLV Indoor Navigation
- URL: http://arxiv.org/abs/2512.12177v1
- Date: Sat, 13 Dec 2025 04:49:26 GMT
- Title: Floorplan2Guide: LLM-Guided Floorplan Parsing for BLV Indoor Navigation
- Authors: Aydin Ayanzadeh, Tim Oates
- Abstract summary: We propose a novel navigation approach that transforms floor plans into navigable knowledge graphs and generates human-readable navigation instructions. Floorplan2Guide integrates a large language model (LLM) to extract spatial information from architectural layouts. Results indicate that few-shot learning improves navigation accuracy over zero-shot learning in both simulated and real-world evaluations.
- Score: 4.3114959617830015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Indoor navigation remains a critical challenge for people with visual impairments. Current solutions rely mainly on infrastructure-based systems, which limits users' ability to navigate safely in dynamic environments. We propose a novel navigation approach that uses a foundation model to transform floor plans into navigable knowledge graphs and generate human-readable navigation instructions. Floorplan2Guide integrates a large language model (LLM) to extract spatial information from architectural layouts, reducing the manual preprocessing required by earlier floorplan parsing methods. Experimental results indicate that few-shot learning improves navigation accuracy over zero-shot learning in both simulated and real-world evaluations. Claude 3.7 Sonnet achieves the highest accuracy among the evaluated models, with 92.31%, 76.92%, and 61.54% on short, medium, and long routes, respectively, under 5-shot prompting on the MP-1 floor plan. Across all models, the success rate with the graph-based spatial structure is 15.4% higher than with direct visual reasoning, confirming that graph representations and in-context learning enhance navigation performance and make our solution more precise for indoor navigation by Blind and Low Vision (BLV) users.
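To make the graph-based pipeline concrete, the sketch below shows one minimal way a parsed floor plan could be stored as a navigable knowledge graph and verbalized into step-by-step directions. It is an illustrative assumption, not the authors' implementation: the room names, edge attributes (distance, turn), and instruction template are all invented for this example.

```python
# Illustrative sketch only: a hypothetical knowledge graph for a parsed floor plan.
# Node names and edge attributes are invented; Floorplan2Guide's actual schema may differ.
import networkx as nx

# Nodes are rooms/landmarks extracted from the layout; edges are traversable
# connections annotated with distance (meters) and the turn taken at the junction.
G = nx.Graph()
G.add_edge("Entrance", "Hallway", distance=5.0, turn="go straight")
G.add_edge("Hallway", "Elevator", distance=8.0, turn="turn left")
G.add_edge("Hallway", "Room 101", distance=3.0, turn="turn right")
G.add_edge("Elevator", "Room 205", distance=6.0, turn="turn right")

def route_to_instructions(graph: nx.Graph, start: str, goal: str) -> list[str]:
    """Compute the distance-weighted shortest route and verbalize each hop."""
    path = nx.shortest_path(graph, start, goal, weight="distance")
    steps = []
    for a, b in zip(path, path[1:]):
        edge = graph[a][b]
        steps.append(f"From {a}, {edge['turn']} and walk about {edge['distance']:.0f} m to {b}.")
    return steps

if __name__ == "__main__":
    for step in route_to_instructions(G, "Entrance", "Room 205"):
        print(step)
```

In Floorplan2Guide the graph itself is produced by prompting the LLM on the floor-plan image; the point of the sketch is only to show why routing over an explicit graph is easier to check and verbalize than free-form visual reasoning over the raw image.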
Related papers
- Boosting Zero-Shot VLN via Abstract Obstacle Map-Based Waypoint Prediction with TopoGraph-and-VisitInfo-Aware Prompting [18.325003967982827]
Vision-language navigation (VLN) has emerged as a key task for embodied agents with broad practical applications. We propose a zero-shot framework that integrates a simplified yet effective waypoint predictor with a multimodal large language model (MLLM). Experiments on R2R-CE and RxR-CE show that our method achieves state-of-the-art zero-shot performance, with success rates of 41% and 36%, respectively.
arXiv Detail & Related papers (2025-09-24T19:21:39Z) - Fine-Tuning Vision-Language Models for Visual Navigation Assistance [28.43430422119113]
We address vision-language-driven indoor navigation to assist visually impaired individuals in reaching a target location using images and natural language guidance. Our approach integrates vision and language models to generate step-by-step navigational instructions, enhancing accessibility and independence.
arXiv Detail & Related papers (2025-09-09T08:08:35Z) - Vision-Based Localization and LLM-based Navigation for Indoor Environments [4.58063394223487]
This study presents an indoor localization and navigation approach that integrates vision-based localization with large language model (LLM)-based navigation. The model achieved high confidence and an accuracy of 96% across all tested waypoints, even under constrained viewing conditions. This research demonstrates the potential for scalable, infrastructure-free indoor navigation using off-the-shelf cameras and publicly available floor plans.
arXiv Detail & Related papers (2025-08-11T15:59:09Z) - PIG-Nav: Key Insights for Pretrained Image Goal Navigation Models [16.820485795257195]
PIG-Nav (Pretrained Image-Goal Navigation) is a new approach that further investigates pretraining strategies for vision-based navigation models. We identify two critical design choices that consistently improve the performance of pretrained navigation models. Our model achieves an average improvement of 22.6% in zero-shot settings and 37.5% in fine-tuning settings over existing visual navigation foundation models.
arXiv Detail & Related papers (2025-07-23T05:34:20Z) - DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation [55.888688171010365]
DORAEMON is a cognitive-inspired framework consisting of Ventral and Dorsal Streams that mimics human navigation capabilities. We evaluate DORAEMON on the HM3D, MP3D, and GOAT datasets, where it achieves state-of-the-art performance on both the success rate (SR) and success weighted by path length (SPL) metrics (see the SPL sketch after this list).
arXiv Detail & Related papers (2025-05-28T04:46:13Z) - Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation [64.84996994779443]
We propose a novel Affordances-Oriented Planner for the continuous vision-language navigation (VLN) task.
Our AO-Planner integrates various foundation models to achieve affordances-oriented low-level motion planning and high-level decision-making.
Experiments on the challenging R2R-CE and RxR-CE datasets show that AO-Planner achieves state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2024-07-08T12:52:46Z) - NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning [97.88246428240872]
Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions. Recent research has highlighted the promising capacity of large language models (LLMs) in VLN by improving navigational reasoning accuracy and interpretability. This paper introduces a novel strategy called Navigational Chain-of-Thought (NavCoT), which performs parameter-efficient in-domain training to enable self-guided navigational decisions.
arXiv Detail & Related papers (2024-03-12T07:27:02Z) - VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model [28.79971953667143]
VoroNav is a semantic exploration framework to extract exploratory paths and planning nodes from a semantic map constructed in real time.
By harnessing topological and semantic information, VoroNav designs text-based descriptions of paths and images that are readily interpretable by a large language model.
arXiv Detail & Related papers (2024-01-05T08:05:07Z) - Scaling Data Generation in Vision-and-Language Navigation [116.95534559103788]
We propose an effective paradigm for generating large-scale data for learning.
We apply 1200+ photo-realistic environments from the HM3D and Gibson datasets and synthesize 4.9 million instruction-trajectory pairs.
Thanks to our large-scale dataset, the performance of an existing agent can be pushed up (+11% absolute over the previous SoTA) to a new best single-run success rate of 80% on the R2R test split with simple imitation learning.
arXiv Detail & Related papers (2023-07-28T16:03:28Z) - Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigation-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego$^2$-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation.
arXiv Detail & Related papers (2023-07-23T14:01:05Z) - Waypoint Models for Instruction-guided Navigation in Continuous Environments [68.2912740006109]
We develop a class of language-conditioned waypoint prediction networks to examine this question.
We measure task performance and estimated execution time on a profiled LoCoBot robot.
Our models outperform prior work in VLN-CE and set a new state-of-the-art on the public leaderboard.
arXiv Detail & Related papers (2021-10-05T17:55:49Z)
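Several entries above report success rate (SR) and success weighted by path length (SPL). As a quick reference, the sketch below computes SPL according to the standard definition used in embodied-navigation benchmarks, SPL = (1/N) Σ_i S_i · l_i / max(p_i, l_i), where S_i is binary success, l_i the geodesic shortest-path length, and p_i the path length actually traveled; the episode values are invented for illustration.

```python
# Illustrative only: SPL (success weighted by path length) as commonly defined
# in embodied-navigation benchmarks; the episode data below are invented.
from dataclasses import dataclass

@dataclass
class Episode:
    success: bool        # did the agent stop close enough to the goal?
    shortest_len: float  # geodesic shortest-path length to the goal (meters)
    taken_len: float     # length of the path the agent actually traveled (meters)

def spl(episodes: list[Episode]) -> float:
    """Average of S_i * l_i / max(p_i, l_i) over all episodes."""
    total = sum(
        (ep.shortest_len / max(ep.taken_len, ep.shortest_len)) if ep.success else 0.0
        for ep in episodes
    )
    return total / len(episodes)

episodes = [
    Episode(success=True,  shortest_len=10.0, taken_len=12.0),  # efficient success
    Episode(success=True,  shortest_len=8.0,  taken_len=20.0),  # wasteful success
    Episode(success=False, shortest_len=15.0, taken_len=9.0),   # failure counts as 0
]
print(f"SR  = {sum(ep.success for ep in episodes) / len(episodes):.2f}")
print(f"SPL = {spl(episodes):.2f}")
```

SPL penalizes successful but wasteful trajectories, which is why it is typically reported alongside the raw success rate.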