Vision-Based Localization and LLM-based Navigation for Indoor Environments
- URL: http://arxiv.org/abs/2508.08120v1
- Date: Mon, 11 Aug 2025 15:59:09 GMT
- Title: Vision-Based Localization and LLM-based Navigation for Indoor Environments
- Authors: Keyan Rahimi, Md. Wasiul Haque, Sagar Dasgupta, Mizanur Rahman
- Abstract summary: This study presents an indoor localization and navigation approach that integrates vision-based localization with large language model (LLM)-based navigation. The model achieved high confidence and an accuracy of 96% across all tested waypoints, even under constrained viewing conditions. This research demonstrates the potential for scalable, infrastructure-free indoor navigation using off-the-shelf cameras and publicly available floor plans.
- Score: 4.58063394223487
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Indoor navigation remains a complex challenge due to the absence of reliable GPS signals and the architectural intricacies of large enclosed environments. This study presents an indoor localization and navigation approach that integrates vision-based localization with large language model (LLM)-based navigation. The localization system utilizes a ResNet-50 convolutional neural network fine-tuned through a two-stage process to identify the user's position using smartphone camera input. To complement localization, the navigation module employs an LLM, guided by a carefully crafted system prompt, to interpret preprocessed floor plan images and generate step-by-step directions. Experimental evaluation was conducted in a realistic office corridor with repetitive features and limited visibility to test localization robustness. The model achieved high confidence and an accuracy of 96% across all tested waypoints, even under constrained viewing conditions and short-duration queries. Navigation tests using ChatGPT on real building floor maps yielded an average instruction accuracy of 75%, with observed limitations in zero-shot reasoning and inference time. This research demonstrates the potential for scalable, infrastructure-free indoor navigation using off-the-shelf cameras and publicly available floor plans, particularly in resource-constrained settings like hospitals, airports, and educational institutions.
Related papers
- Floorplan2Guide: LLM-Guided Floorplan Parsing for BLV Indoor Navigation [4.3114959617830015]
We propose a novel navigation approach that transforms floor plans into navigable knowledge graphs and generates human-readable navigation instructions. Floorplan2Guide integrates a large language model (LLM) to extract spatial information from architectural layouts. Results indicate that few-shot learning improves navigation accuracy over zero-shot learning in both simulated and real-world evaluations.
arXiv Detail & Related papers (2025-12-13T04:49:26Z) - Self-Supervised Learning to Fly using Efficient Semantic Segmentation and Metric Depth Estimation for Low-Cost Autonomous UAVs [5.602128292727329]
This paper presents a vision-only autonomous flight system for small UAVs operating in controlled indoor environments. The system combines semantic segmentation with monocular depth estimation to enable obstacle avoidance, scene exploration, and autonomous safe landing operations. A key innovation is an adaptive scale factor algorithm that converts non-metric monocular depth predictions into accurate metric distance measurements.
arXiv Detail & Related papers (2025-10-18T19:35:17Z) - ActLoc: Learning to Localize on the Move via Active Viewpoint Selection [52.909507162638526]
ActLoc is an active viewpoint-aware planning framework for enhancing localization accuracy in general robot navigation tasks. At its core, ActLoc employs a large-scale attention-based model trained for viewpoint selection. ActLoc achieves state-of-the-art results on single-viewpoint selection and generalizes effectively to full-trajectory planning.
arXiv Detail & Related papers (2025-08-28T16:36:02Z) - NOVA: Navigation via Object-Centric Visual Autonomy for High-Speed Target Tracking in Unstructured GPS-Denied Environments [56.35569661650558]
We introduce NOVA, a fully onboard, object-centric framework that enables robust target tracking and collision-aware navigation. Rather than constructing a global map, NOVA formulates perception, estimation, and control entirely in the target's reference frame. We validate NOVA across challenging real-world scenarios, including urban mazes, forest trails, and repeated transitions through buildings with intermittent GPS loss.
arXiv Detail & Related papers (2025-06-23T14:28:30Z) - LLM-Guided Indoor Navigation with Multimodal Map Understanding [1.5325823985727567]
We explore the potential of a Large Language Model (LLM), i.e., ChatGPT, to generate context-aware navigation instructions from indoor map images. Our findings demonstrate the potential of LLMs for supporting personalized indoor navigation, with an average of 86.59% correct indications and a maximum of 97.14%. These results have key implications for AI-driven navigation and assistive technologies.
arXiv Detail & Related papers (2025-03-12T09:32:43Z) - Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation [64.84996994779443]
We propose a novel Affordances-Oriented Planner for continuous vision-language navigation (VLN) task.
Our AO-Planner integrates various foundation models to achieve affordances-oriented low-level motion planning and high-level decision-making.
Experiments on the challenging R2R-CE and RxR-CE datasets show that AO-Planner achieves state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2024-07-08T12:52:46Z) - NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning [97.88246428240872]
Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions. Recent research has highlighted the promising capacity of large language models (LLMs) in VLN by improving navigational reasoning accuracy and interpretability. This paper introduces a novel strategy called Navigational Chain-of-Thought (NavCoT), which performs parameter-efficient in-domain training to enable self-guided navigational decisions.
arXiv Detail & Related papers (2024-03-12T07:27:02Z) - Unsupervised Visual Odometry and Action Integration for PointGoal Navigation in Indoor Environment [14.363948775085534]
PointGoal navigation in indoor environments is a fundamental task for personal robots navigating to a specified point.
To improve PointGoal navigation accuracy without a GPS signal, we use visual odometry (VO) and propose a novel action integration module (AIM) trained in an unsupervised manner.
Experiments show that the proposed system achieves satisfactory results and outperforms the partially supervised learning algorithms on the popular Gibson dataset.
arXiv Detail & Related papers (2022-10-02T03:12:03Z) - UNav: An Infrastructure-Independent Vision-Based Navigation System for People with Blindness and Low Vision [4.128685217530067]
We propose a vision-based localization pipeline for navigation support for end-users with blindness and low vision.
Given a query image taken by an end-user on a mobile application, the pipeline leverages a visual place recognition (VPR) algorithm to find similar images in a reference image database.
A customized user interface projects a 3D reconstructed sparse map, built from a sequence of images, onto the corresponding a priori 2D floor plan.
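The visual place recognition (VPR) step this pipeline relies on can be sketched as a nearest-neighbor search over image descriptors. This is a minimal sketch under the assumption that each image is already summarized by a global feature vector (e.g. from a CNN); the descriptor extraction itself is abstracted away, and the function name is illustrative.

```python
# Minimal VPR retrieval sketch: rank reference images by cosine similarity
# between global descriptors. Descriptor extraction is assumed to happen
# upstream (e.g. via a CNN embedding of each image).
import numpy as np


def vpr_top_k(query: np.ndarray, database: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k reference images most similar to the query."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q  # cosine similarity between query and each reference image
    return np.argsort(-sims)[:k]
```

The retrieved reference images, whose poses are known from the 3D reconstruction, then anchor the query image to a location on the floor plan.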
arXiv Detail & Related papers (2022-09-22T22:21:37Z) - Real-time Outdoor Localization Using Radio Maps: A Deep Learning Approach [59.17191114000146]
We present LocUNet, a convolutional, end-to-end trained neural network (NN) for the localization task.
We show that LocUNet can localize users with state-of-the-art accuracy and enjoys high robustness to inaccuracies in the estimations of radio maps.
arXiv Detail & Related papers (2021-06-23T17:27:04Z) - Real-time Localization Using Radio Maps [59.17191114000146]
We present a simple yet effective method for localization based on pathloss.
In our approach, the user to be localized reports the received signal strength from a set of base stations with known locations.
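The pathloss-based scheme described above amounts to fingerprint matching: the reported signal strengths are compared against a radio map of expected values per location. The following is a hedged sketch of that idea; the grid representation and the Euclidean matching metric are illustrative assumptions, not the paper's method.

```python
# Fingerprint-style localization sketch: pick the radio-map grid cell whose
# expected RSS vector best matches the user's reported measurements.
import numpy as np


def localize(reported_rss: np.ndarray, radio_map: np.ndarray) -> int:
    """radio_map has shape (num_cells, num_base_stations): expected RSS (dBm)
    at each grid cell. Returns the index of the best-matching cell."""
    errors = np.linalg.norm(radio_map - reported_rss, axis=1)
    return int(np.argmin(errors))  # most consistent grid cell
```

In practice the radio map would be predicted or measured per base station, and the matching could be learned rather than hand-coded, as in the LocUNet work above.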
arXiv Detail & Related papers (2020-06-09T16:51:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.