Floorplan2Guide: LLM-Guided Floorplan Parsing for BLV Indoor Navigation
- URL: http://arxiv.org/abs/2512.12177v1
- Date: Sat, 13 Dec 2025 04:49:26 GMT
- Title: Floorplan2Guide: LLM-Guided Floorplan Parsing for BLV Indoor Navigation
- Authors: Aydin Ayanzadeh, Tim Oates
- Abstract summary: We propose a novel navigation approach that transforms floor plans into navigable knowledge graphs and generates human-readable navigation instructions. Floorplan2Guide integrates a large language model (LLM) to extract spatial information from architectural layouts. Results indicate that few-shot learning improves navigation accuracy over zero-shot learning in both simulated and real-world evaluations.
- Score: 4.3114959617830015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Indoor navigation remains a critical challenge for people with visual impairments. Current solutions rely mainly on infrastructure-based systems, which limits users' ability to navigate safely in dynamic environments. We propose a novel navigation approach that uses a foundation model to transform floor plans into navigable knowledge graphs and generate human-readable navigation instructions. Floorplan2Guide integrates a large language model (LLM) to extract spatial information from architectural layouts, reducing the manual preprocessing required by earlier floorplan parsing methods. Experimental results indicate that few-shot learning improves navigation accuracy over zero-shot learning in both simulated and real-world evaluations. Claude 3.7 Sonnet achieves the highest accuracy among the evaluated models, with 92.31%, 76.92%, and 61.54% on short, medium, and long routes, respectively, under 5-shot prompting on the MP-1 floor plan. Across all models, the success rate with the graph-based spatial structure is 15.4% higher than with direct visual reasoning, confirming that graph representations and in-context learning enhance navigation performance and make our solution more precise for indoor navigation by Blind and Low Vision (BLV) users.
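To make the graph-based pipeline concrete, the sketch below shows one minimal way a parsed floor plan could be stored as a navigable knowledge graph and verbalized into step-by-step directions. It is an illustrative assumption, not the authors' implementation: the room names, edge attributes (distance, turn), and instruction template are all invented for this example.

```python
# Illustrative sketch only: a hypothetical knowledge graph for a parsed floor plan.
# Node names and edge attributes are invented; Floorplan2Guide's actual schema may differ.
import networkx as nx

# Nodes are rooms/landmarks extracted from the layout; edges are traversable
# connections annotated with distance (meters) and the turn taken at the junction.
G = nx.Graph()
G.add_edge("Entrance", "Hallway", distance=5.0, turn="go straight")
G.add_edge("Hallway", "Elevator", distance=8.0, turn="turn left")
G.add_edge("Hallway", "Room 101", distance=3.0, turn="turn right")
G.add_edge("Elevator", "Room 205", distance=6.0, turn="turn right")

def route_to_instructions(graph: nx.Graph, start: str, goal: str) -> list[str]:
    """Compute the distance-weighted shortest route and verbalize each hop."""
    path = nx.shortest_path(graph, start, goal, weight="distance")
    steps = []
    for a, b in zip(path, path[1:]):
        edge = graph[a][b]
        steps.append(f"From {a}, {edge['turn']} and walk about {edge['distance']:.0f} m to {b}.")
    return steps

if __name__ == "__main__":
    for step in route_to_instructions(G, "Entrance", "Room 205"):
        print(step)
```

In Floorplan2Guide the graph itself is produced by prompting the LLM on the floor-plan image; the point of the sketch is only to show why routing over an explicit graph is easier to check and verbalize than free-form visual reasoning over the raw image.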
Related papers
- Boosting Zero-Shot VLN via Abstract Obstacle Map-Based Waypoint Prediction with TopoGraph-and-VisitInfo-Aware Prompting [18.325003967982827]
Vision-language navigation (VLN) has emerged as a key task for embodied agents with broad practical applications. We propose a zero-shot framework that integrates a simplified yet effective waypoint predictor with a multimodal large language model (MLLM). Experiments on R2R-CE and RxR-CE show that our method achieves state-of-the-art zero-shot performance, with success rates of 41% and 36%, respectively.
arXiv Detail & Related papers (2025-09-24T19:21:39Z) - Fine-Tuning Vision-Language Models for Visual Navigation Assistance [28.43430422119113]
We address vision-language-driven indoor navigation to assist visually impaired individuals in reaching a target location using images and natural language guidance. Our approach integrates vision and language models to generate step-by-step navigational instructions, enhancing accessibility and independence.
arXiv Detail & Related papers (2025-09-09T08:08:35Z) - Vision-Based Localization and LLM-based Navigation for Indoor Environments [4.58063394223487]
This study presents an indoor localization and navigation approach that integrates vision-based localization with large language model (LLM)-based navigation. The model achieved high confidence and an accuracy of 96% across all tested waypoints, even under constrained viewing conditions. This research demonstrates the potential for scalable, infrastructure-free indoor navigation using off-the-shelf cameras and publicly available floor plans.
arXiv Detail & Related papers (2025-08-11T15:59:09Z) - PIG-Nav: Key Insights for Pretrained Image Goal Navigation Models [16.820485795257195]
PIG-Nav (Pretrained Image-Goal Navigation) is a new approach that further investigates pretraining strategies for vision-based navigation models. We identify two critical design choices that consistently improve the performance of pretrained navigation models. Our model achieves an average improvement of 22.6% in zero-shot settings and 37.5% in fine-tuning settings over existing visual navigation foundation models.
arXiv Detail & Related papers (2025-07-23T05:34:20Z) - DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation [55.888688171010365]
DORAEMON is a cognitive-inspired framework consisting of Ventral and Dorsal Streams that mimics human navigation capabilities. We evaluate DORAEMON on the HM3D, MP3D, and GOAT datasets, where it achieves state-of-the-art performance on both the success rate (SR) and success weighted by path length (SPL) metrics (see the SPL sketch after this list).
arXiv Detail & Related papers (2025-05-28T04:46:13Z) - Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation [64.84996994779443]
We propose a novel Affordances-Oriented Planner for the continuous vision-language navigation (VLN) task.
Our AO-Planner integrates various foundation models to achieve affordances-oriented low-level motion planning and high-level decision-making.
Experiments on the challenging R2R-CE and RxR-CE datasets show that AO-Planner achieves state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2024-07-08T12:52:46Z) - NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning [97.88246428240872]
Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions. Recent research has highlighted the promising capacity of large language models (LLMs) in VLN by improving navigational reasoning accuracy and interpretability. This paper introduces a novel strategy called Navigational Chain-of-Thought (NavCoT), which performs parameter-efficient in-domain training to enable self-guided navigational decisions.
arXiv Detail & Related papers (2024-03-12T07:27:02Z) - VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model [28.79971953667143]
VoroNav is a semantic exploration framework to extract exploratory paths and planning nodes from a semantic map constructed in real time.
By harnessing topological and semantic information, VoroNav designs text-based descriptions of paths and images that are readily interpretable by a large language model.
arXiv Detail & Related papers (2024-01-05T08:05:07Z) - Scaling Data Generation in Vision-and-Language Navigation [116.95534559103788]
We propose an effective paradigm for generating large-scale data for learning.
We apply 1200+ photo-realistic environments from the HM3D and Gibson datasets and synthesize 4.9 million instruction-trajectory pairs.
Thanks to our large-scale dataset, the performance of an existing agent can be pushed up (+11% absolute over the previous SoTA) to a new best single-run success rate of 80% on the R2R test split with simple imitation learning.
arXiv Detail & Related papers (2023-07-28T16:03:28Z) - Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigation-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego$^2$-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation.
arXiv Detail & Related papers (2023-07-23T14:01:05Z) - Waypoint Models for Instruction-guided Navigation in Continuous Environments [68.2912740006109]
We develop a class of language-conditioned waypoint prediction networks to examine this question.
We measure task performance and estimated execution time on a profiled LoCoBot robot.
Our models outperform prior work in VLN-CE and set a new state-of-the-art on the public leaderboard.
arXiv Detail & Related papers (2021-10-05T17:55:49Z)
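Several entries above report success rate (SR) and success weighted by path length (SPL). As a quick reference, the sketch below computes SPL according to the standard definition used in embodied-navigation benchmarks, SPL = (1/N) Σ_i S_i · l_i / max(p_i, l_i), where S_i is binary success, l_i the geodesic shortest-path length, and p_i the path length actually traveled; the episode values are invented for illustration.

```python
# Illustrative only: SPL (success weighted by path length) as commonly defined
# in embodied-navigation benchmarks; the episode data below are invented.
from dataclasses import dataclass

@dataclass
class Episode:
    success: bool        # did the agent stop close enough to the goal?
    shortest_len: float  # geodesic shortest-path length to the goal (meters)
    taken_len: float     # length of the path the agent actually traveled (meters)

def spl(episodes: list[Episode]) -> float:
    """Average of S_i * l_i / max(p_i, l_i) over all episodes."""
    total = sum(
        (ep.shortest_len / max(ep.taken_len, ep.shortest_len)) if ep.success else 0.0
        for ep in episodes
    )
    return total / len(episodes)

episodes = [
    Episode(success=True,  shortest_len=10.0, taken_len=12.0),  # efficient success
    Episode(success=True,  shortest_len=8.0,  taken_len=20.0),  # wasteful success
    Episode(success=False, shortest_len=15.0, taken_len=9.0),   # failure counts as 0
]
print(f"SR  = {sum(ep.success for ep in episodes) / len(episodes):.2f}")
print(f"SPL = {spl(episodes):.2f}")
```

SPL penalizes successful but wasteful trajectories, which is why it is typically reported alongside the raw success rate.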