AI-based Wearable Vision Assistance System for the Visually Impaired: Integrating Real-Time Object Recognition and Contextual Understanding Using Large Vision-Language Models
- URL: http://arxiv.org/abs/2412.20059v1
- Date: Sat, 28 Dec 2024 07:26:39 GMT
- Title: AI-based Wearable Vision Assistance System for the Visually Impaired: Integrating Real-Time Object Recognition and Contextual Understanding Using Large Vision-Language Models
- Authors: Mirza Samad Ahmed Baig, Syeda Anshrah Gillani, Shahid Munir Shah, Mahmoud Aljawarneh, Abdul Akbar Khan, Muhammad Hamzah Siddiqui
- Abstract summary: This paper introduces a novel wearable vision assistance system that uses artificial intelligence (AI) to deliver real-time feedback to the user through a sound beep mechanism.
The system provides detailed descriptions of objects in the user's environment using a large vision-language model (LVLM).
- Abstract: Visual impairment limits people's ability to live as independently as sighted people do. Visually impaired people face challenges in performing activities of daily living such as reading, writing, traveling, and participating in social gatherings. Many traditional approaches are available to help them; however, these are limited in obtaining the contextually rich environmental information necessary for independent living. To overcome this limitation, this paper introduces a novel wearable vision assistance system: a hat-mounted camera connected to a Raspberry Pi 4 Model B (8 GB RAM) that uses artificial intelligence (AI) to deliver real-time feedback to the user through a sound beep mechanism. A key feature of the system is a user-friendly procedure for recognizing new people or objects: a one-click process lets users add data on new individuals and objects for later detection, improving recognition accuracy over time. The system provides detailed descriptions of objects in the user's environment using a large vision-language model (LVLM). In addition, it incorporates a distance sensor that sounds a buzzer as soon as the user is about to collide with an object, helping to ensure safe navigation. A comprehensive evaluation compares the proposed AI-based solution against traditional support techniques. The comparative analysis shows that the proposed solution, with its innovative combination of hardware and AI (including LVLMs with IoT), is a significant advancement in assistive technology that addresses the major issues faced by the visually impaired community.
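The abstract does not name the distance sensor, the camera interface, or the LVLM, but the loop it describes (monitor distance, beep before a collision, describe the scene on demand) is simple to outline. Below is a minimal Python sketch under assumed hardware and models: an HC-SR04-style ultrasonic sensor driven through gpiozero, a GPIO buzzer and push button, a USB camera read via OpenCV, and a small BLIP image-captioning model standing in for the unnamed LVLM. Every pin number, threshold, and model choice here is an illustrative assumption, not the authors' implementation.

```python
# Sketch of the wearable's feedback loop under assumed hardware:
# HC-SR04 ultrasonic sensor (trigger GPIO 23, echo GPIO 24), buzzer on
# GPIO 17, push button on GPIO 27, USB camera at index 0. None of these
# details come from the paper.
import time

import cv2
from gpiozero import Button, Buzzer, DistanceSensor
from PIL import Image
from transformers import pipeline

COLLISION_THRESHOLD_M = 0.5  # beep when an obstacle is closer than this

sensor = DistanceSensor(echo=24, trigger=23, max_distance=2.0)
buzzer = Buzzer(17)
button = Button(27)
camera = cv2.VideoCapture(0)

# A small captioning model stands in for the paper's unnamed LVLM.
describe = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def describe_scene() -> None:
    """Capture one frame and announce a natural-language description."""
    ok, frame = camera.read()
    if not ok:
        return
    rgb = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    print(describe(rgb)[0]["generated_text"])  # a real device would speak this

button.when_pressed = describe_scene  # scene description on demand

while True:
    if sensor.distance < COLLISION_THRESHOLD_M:
        buzzer.beep(on_time=0.1, off_time=0.1, n=3, background=False)
    time.sleep(0.1)
```

On a Raspberry Pi 4, running even a small captioning model on-device is the weakest assumption in this sketch; in practice the frame would more likely be sent to a server or a hosted LVLM API for description.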
Related papers
- AIris: An AI-powered Wearable Assistive Device for the Visually Impaired [0.0]
We introduce AIris, an AI-powered wearable device that provides environmental awareness and interaction capabilities to visually impaired users.
We have created a functional prototype system that operates effectively in real-world conditions.
arXiv Detail & Related papers (2024-05-13T10:09:37Z)
- MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting [97.52388851329667]
We introduce Marking Open-world Keypoint Affordances (MOKA) to solve robotic manipulation tasks specified by free-form language instructions.
Central to our approach is a compact point-based representation of affordance, which bridges the VLM's predictions on observed images and the robot's actions in the physical world.
We evaluate and analyze MOKA's performance on various table-top manipulation tasks including tool use, deformable body manipulation, and object rearrangement.
arXiv Detail & Related papers (2024-03-05T18:08:45Z)
- GazeGPT: Augmenting Human Capabilities using Gaze-contingent Contextual AI for Smart Eyewear [30.71112461604336]
We introduce GazeGPT as a new user interaction paradigm for contextual AI.
GazeGPT uses eye tracking to help the large multimodal model (LMM) understand which object in the world-facing camera view the user is paying attention to.
We show that this gaze-contingent mechanism is a faster and more accurate pointing mechanism than alternatives.
arXiv Detail & Related papers (2024-01-30T18:02:44Z)
- Floor extraction and door detection for visually impaired guidance [78.94595951597344]
Finding obstacle-free paths in unknown environments is a big navigation issue for visually impaired people and autonomous robots.
New devices based on computer vision systems can help visually impaired people overcome the difficulties of navigating unknown environments safely.
This work proposes a combination of sensors and algorithms that can serve as the basis of a navigation system for visually impaired people.
arXiv Detail & Related papers (2024-01-30T14:38:43Z)
- Voila-A: Aligning Vision-Language Models with User's Gaze Attention [56.755993500556734]
We introduce gaze information as a proxy for human attention to guide Vision-Language Models (VLMs).
We propose a novel approach, Voila-A, for gaze alignment to enhance the interpretability and effectiveness of these models in real-world applications.
arXiv Detail & Related papers (2023-12-22T17:34:01Z)
- Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic Systems [67.01132165581667]
We propose to enable high-level reasoning in AI systems by integrating cognitive architectures with external neuro-symbolic components.
We illustrate a hybrid framework centered on ACT-R and discuss the role of generative models in recent and future applications.
arXiv Detail & Related papers (2023-11-13T21:20:17Z)
- DRISHTI: Visual Navigation Assistant for Visually Impaired [0.0]
Blind and visually impaired (BVI) people face challenges because they need manual support to obtain information about their environment.
In this work, we took our first step towards developing an affordable and high-performing eye wearable assistive device, DRISHTI.
arXiv Detail & Related papers (2023-03-13T20:10:44Z)
- ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement Learning [91.58711082348293]
Reinforcement learning from online user feedback on the system's performance is a natural way to adapt assistive teleoperation interfaces.
This approach tends to require a large amount of human-in-the-loop training data, especially when feedback is sparse.
We propose a hierarchical solution that learns efficiently from sparse user feedback.
arXiv Detail & Related papers (2022-02-05T02:01:19Z)
- VisBuddy -- A Smart Wearable Assistant for the Visually Challenged [0.0]
VisBuddy is a voice-based assistant, where the user can give voice commands to perform specific tasks.
It uses image captioning to describe the user's surroundings, optical character recognition (OCR) to read text in the user's view, object detection to search for and find objects in a room, and web scraping to give the user the latest news (a minimal OCR sketch follows this entry).
arXiv Detail & Related papers (2021-08-17T17:15:23Z)
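VisBuddy's text-reading feature is the easiest of its components to illustrate. Here is a minimal sketch assuming the Tesseract engine via pytesseract on a frame from the wearable camera; the summary does not say which OCR stack the authors actually used.

```python
# Minimal OCR sketch in the spirit of VisBuddy's text-reading feature.
# pytesseract/Tesseract is an assumed engine, not the authors' choice.
import cv2
import pytesseract

def read_text_in_view(frame) -> str:
    """Return the text visible in a BGR camera frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Otsu binarization usually improves Tesseract's accuracy on scene text.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary).strip()

if __name__ == "__main__":
    ok, frame = cv2.VideoCapture(0).read()
    if ok:
        print(read_text_in_view(frame))  # a voice assistant would speak this
```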
- AEGIS: A real-time multimodal augmented reality computer vision based system to assist facial expression recognition for individuals with autism spectrum disorder [93.0013343535411]
This paper presents the development of a multimodal augmented reality (AR) system that combines computer vision with deep convolutional neural networks (CNNs).
The proposed system, which we call AEGIS, is an assistive technology deployable on a variety of user devices, including tablets, smartphones, video conference systems, and smartglasses.
We leverage both spatial and temporal information in order to provide an accurate expression prediction, which is then converted into its corresponding visualization and drawn on top of the original video frame.
arXiv Detail & Related papers (2020-10-22T17:20:38Z)
- A Deep Learning based Wearable Healthcare IoT Device for AI-enabled Hearing Assistance Automation [6.283190933140046]
This research presents a novel AI-enabled Internet of Things (IoT) device that helps people who are deaf or hard of hearing communicate with others in conversation.
A server application leverages Google's online speech recognition service to convert the received conversations into text, which is then sent to a micro-display attached to the glasses so the wearer can read the conversation contents (a minimal sketch of this pipeline follows the entry).
arXiv Detail & Related papers (2020-05-16T19:42:16Z)
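The pipeline that entry describes (microphone audio, Google's online speech recognition, text on a display) can be sketched with the SpeechRecognition Python package, whose recognize_google call wraps the same Google web speech service; the display hookup and all device details here are assumptions, not the paper's implementation.

```python
# Minimal sketch of the hearing-assistance pipeline: capture speech,
# transcribe it with Google's online recognizer, and hand the text to a
# display callback. The SpeechRecognition package is an assumed stand-in
# for the paper's server application.
import speech_recognition as sr

def transcribe_forever(show_text):
    """Continuously transcribe microphone audio and display each utterance."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate once
        while True:
            audio = recognizer.listen(source)
            try:
                show_text(recognizer.recognize_google(audio))
            except sr.UnknownValueError:
                pass  # speech was unintelligible; skip this chunk
            except sr.RequestError as err:
                show_text(f"[recognition service unavailable: {err}]")

if __name__ == "__main__":
    # A real device would push each line to the glasses' micro-display.
    transcribe_forever(print)
```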
This list is automatically generated from the titles and abstracts of the papers on this site.