Turn-by-Turn Indoor Navigation for the Visually Impaired
- URL: http://arxiv.org/abs/2410.19954v1
- Date: Fri, 25 Oct 2024 20:16:38 GMT
- Title: Turn-by-Turn Indoor Navigation for the Visually Impaired
- Authors: Santosh Srinivasaiah, Sai Kumar Nekkanti, Rohith Reddy Nedhunuri,
- Abstract summary: Navigating indoor environments presents significant challenges for visually impaired individuals.
This paper introduces a novel system that provides turn-by-turn navigation inside buildings using only a smartphone equipped with a camera.
Preliminary evaluations demonstrate the system's effectiveness in accurately guiding users through complex indoor spaces.
- Abstract: Navigating indoor environments presents significant challenges for visually impaired individuals due to complex layouts and the absence of GPS signals. This paper introduces a novel system that provides turn-by-turn navigation inside buildings using only a smartphone equipped with a camera, leveraging multimodal models, deep learning algorithms, and large language models (LLMs). The smartphone's camera captures real-time images of the surroundings, which are sent to a nearby Raspberry Pi capable of running on-device LLMs, multimodal models, and deep learning algorithms to detect and recognize architectural features, signage, and obstacles. The interpreted visual data is then translated into natural language instructions by an LLM running on the Raspberry Pi, and these instructions are sent back to the user as intuitive, context-aware audio guidance. This solution places minimal computational load on the user's device, preventing it from being overloaded and ensuring compatibility with all types of devices, including those incapable of running AI models. This approach also enables the client not only to run advanced models but also to ensure that the training data and other information never leave the building. Preliminary evaluations demonstrate the system's effectiveness in accurately guiding users through complex indoor spaces, highlighting its potential for widespread application.
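To make the described data flow concrete, the following is a minimal sketch of the phone-side loop implied by this architecture: the handset only captures frames and plays back audio, while all inference runs on the Raspberry Pi over the local network. The endpoint URL, port, JSON field name, and the libraries used (OpenCV, requests, pyttsx3) are illustrative assumptions, not the authors' actual implementation.
```python
# Hypothetical phone-side loop for the described architecture. The endpoint
# URL, port, JSON field name, and the use of OpenCV/requests/pyttsx3 are
# assumptions for illustration; the paper does not specify its client API.
import time

import cv2        # stands in for the phone camera API
import requests
import pyttsx3    # simple offline text-to-speech stand-in for the phone's TTS

PI_ENDPOINT = "http://raspberrypi.local:8000/navigate"  # assumed LAN address of the Pi


def main() -> None:
    camera = cv2.VideoCapture(0)
    tts = pyttsx3.init()
    try:
        while True:
            ok, frame = camera.read()
            if not ok:
                continue
            # Encode the frame as JPEG so only a small payload leaves the phone.
            ok, jpeg = cv2.imencode(".jpg", frame)
            if not ok:
                continue
            # The Pi runs the multimodal model and LLM and replies with a short
            # natural-language instruction.
            resp = requests.post(
                PI_ENDPOINT,
                files={"image": ("frame.jpg", jpeg.tobytes(), "image/jpeg")},
                timeout=5,
            )
            instruction = resp.json().get("instruction", "")
            if instruction:
                tts.say(instruction)   # e.g. "Turn left at the next corridor"
                tts.runAndWait()
            time.sleep(1.0)            # roughly one request per second
    finally:
        camera.release()


if __name__ == "__main__":
    main()
```
Because the phone only captures frames and plays back audio in this sketch, the heavy multimodal and language models can stay on the Raspberry Pi, which is what makes the approach compatible with devices that cannot run AI models and keeps image data inside the building's network.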
Related papers
- PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs [55.8550939439138]
Vision-Language Models (VLMs) have shown immense potential by integrating large language models with vision systems.
These models face challenges in the fundamental computer vision task of object localisation, due to their training on multimodal data containing mostly captions.
We introduce an input-agnostic Positional Insert (PIN), a learnable spatial prompt containing a minimal set of parameters that is slid inside the frozen VLM.
Our PIN module is trained with a simple next-token prediction task on synthetic data without requiring the introduction of new output heads (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-02-13T18:39:18Z)
- Floor extraction and door detection for visually impaired guidance [78.94595951597344]
Finding obstacle-free paths in unknown environments is a major navigation challenge for visually impaired people and autonomous robots.
New devices based on computer vision systems can help visually impaired people overcome the difficulties of navigating unknown environments safely.
This work proposes a combination of sensors and algorithms that can lead to a navigation system for visually impaired people.
arXiv Detail & Related papers (2024-01-30T14:38:43Z)
- Follow Anything: Open-set detection, tracking, and following in real-time [89.83421771766682]
We present a robotic system to detect, track, and follow any object in real-time.
Our approach, dubbed "follow anything" (FAn), is an open-vocabulary and multimodal model.
FAn can be deployed on a laptop with a lightweight (6-8 GB) graphics card, achieving a throughput of 6-20 frames per second.
arXiv Detail & Related papers (2023-08-10T17:57:06Z)
- On-device Training: A First Overview on Existing Systems [6.551096686706628]
Efforts have been made to deploy some models on resource-constrained devices as well.
This work aims to summarize and analyze state-of-the-art systems research that enables such on-device model training capabilities.
arXiv Detail & Related papers (2022-12-01T19:22:29Z)
- Efficient Single-Image Depth Estimation on Mobile Devices, Mobile AI & AIM 2022 Challenge: Report [108.88637766066759]
Deep learning-based single image depth estimation solutions can achieve real-time performance on IoT platforms and smartphones.
Models developed in the challenge are also compatible with any Android or Linux-based mobile devices.
arXiv Detail & Related papers (2022-11-07T22:20:07Z)
- LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action [76.71101507291473]
We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories.
We show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data.
arXiv Detail & Related papers (2022-07-10T10:41:50Z)
- ViT Cane: Visual Assistant for the Visually Impaired [0.0]
This paper proposes ViT Cane, which leverages a vision transformer model to detect obstacles in real time.
Our entire system consists of a Pi Camera Module v2, a Raspberry Pi 4B with 8 GB of RAM, and four motors.
Based on tactile feedback delivered through the four motors, the obstacle detection model is highly effective at helping the visually impaired navigate unknown terrain.
arXiv Detail & Related papers (2021-09-26T02:30:30Z)
- Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report [105.32612705754605]
We introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based depth estimation solution.
The proposed solutions can generate VGA resolution depth maps at up to 10 FPS on the Raspberry Pi 4 while achieving high fidelity results.
arXiv Detail & Related papers (2021-05-17T13:49:57Z)
- Movement Tracking by Optical Flow Assisted Inertial Navigation [18.67291804847956]
We show how a learning-based optical flow model can be combined with conventional inertial navigation.
We show how ideas from probabilistic deep learning can aid the robustness of the measurement updates.
The practical applicability is demonstrated on real-world data acquired by an iPad.
arXiv Detail & Related papers (2020-06-24T16:36:13Z)
- Visually Impaired Aid using Convolutional Neural Networks, Transfer Learning, and Particle Competition and Cooperation [0.0]
We propose the use of convolutional neural networks (CNNs), transfer learning, and semi-supervised learning (SSL) to build a framework aimed at aiding the visually impaired.
It has low computational costs and, therefore, may be implemented on current smartphones, without relying on any additional equipment.
arXiv Detail & Related papers (2020-05-09T16:11:48Z)
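As noted in the PIN entry above, the following is a minimal, hypothetical sketch of the idea that entry describes: a small learnable spatial prompt added to the frozen visual features of a VLM, with only the insert's parameters updated under a next-token prediction loss. The module names, tensor shapes, and the use of PyTorch are assumptions made for illustration, not the paper's actual implementation.
```python
# Hypothetical sketch of a learnable positional insert for a frozen VLM.
import torch
import torch.nn as nn


class PositionalInsert(nn.Module):
    """A learnable tensor with the same shape as the visual patch embeddings."""

    def __init__(self, num_patches: int, embed_dim: int):
        super().__init__()
        self.pin = nn.Parameter(torch.zeros(1, num_patches, embed_dim))

    def forward(self, patch_embeddings: torch.Tensor) -> torch.Tensor:
        # "Slide" the insert into the frozen visual features by simple addition.
        return patch_embeddings + self.pin


# Frozen stand-in for the VLM's vision encoder (a real VLM backbone would go here).
vision_encoder = nn.Linear(768, 768).requires_grad_(False)
pin = PositionalInsert(num_patches=196, embed_dim=768)  # the only trainable part

patches = torch.randn(2, 196, 768)            # dummy image patch features
visual_tokens = pin(vision_encoder(patches))  # PIN-augmented visual tokens

# The augmented tokens would be fed to the frozen language model, and only the
# insert's parameters are updated with the usual next-token prediction loss.
optimizer = torch.optim.Adam(pin.parameters(), lr=1e-3)
```
Because the backbone stays frozen, gradients flow only into the insert, matching the summary's claim of a minimal set of added parameters and no new output heads.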