Related papers: AIris: An AI-powered Wearable Assistive Device for the Visually Impaired

Related papers

Probing the Gaps in ChatGPT Live Video Chat for Real-World Assistance for People who are Blind or Visually Impaired [10.648018999640758]
We present findings from an exploratory study with eight blind or visually impaired (BVI) participants.<n>Our findings indicate that current live video AI effectively provides guidance and answers for static visual scenes but falls short in delivering essential live descriptions required in dynamic situations.<n>We discuss implications for assistive video AI agents, including incorporating additional sensing capabilities for real-world use.
arXiv Detail & Related papers (2025-08-05T16:59:02Z)
Embodied AI Agents: Modeling the World [188.85697524284834]
This paper describes our research on AI agents embodied in visual, virtual or physical forms.<n>We propose that the development of world models is central to reasoning and planning of embodied AI agents.<n>We also propose to learn the mental world model of users to enable better human-agent collaboration.
arXiv Detail & Related papers (2025-06-27T16:05:34Z)
Casper: Inferring Diverse Intents for Assistive Teleoperation with Vision Language Models [50.19518681574399]
A central challenge in real-world assistive teleoperation is for the robot to infer a wide range of human intentions from user control inputs.<n>We introduce Casper, an assistive teleoperation system that leverages commonsense knowledge embedded in pre-trained visual language models.<n>We show that Casper improves task performance, reduces human cognitive load, and achieves higher user satisfaction than direct teleoperation and assistive teleoperation baselines.
arXiv Detail & Related papers (2025-06-17T17:06:43Z)
Emergent Active Perception and Dexterity of Simulated Humanoids from Visual Reinforcement Learning [69.71072181304066]
We introduce Perceptive Dexterous Control (PDC), a framework for vision-driven whole-body control with simulated humanoids.<n>PDC operates solely on egocentric vision for task specification, enabling object search, target placement, and skill selection through visual cues.<n>We show that training from scratch with reinforcement learning can produce emergent behaviors such as active search.
arXiv Detail & Related papers (2025-05-18T07:33:31Z)
A Human Digital Twin Architecture for Knowledge-based Interactions and Context-Aware Conversations [0.9580312063277943]
Recent developments in Artificial Intelligence (AI) and Machine Learning (ML) are creating new opportunities for Human-Autonomy Teaming (HAT) We present a real-time Human Digital Twin (HDT) architecture that integrates Large Language Models (LLMs) for knowledge reporting, answering, and recommendation, embodied in a visual interface. The HDT acts as a visually and behaviorally realistic team member, integrated throughout the mission lifecycle, from training to deployment to after-action review.
arXiv Detail & Related papers (2025-04-04T03:56:26Z)
Leveraging LLMs with Iterative Loop Structure for Enhanced Social Intelligence in Video Question Answering [13.775516653315103]
Social intelligence is essential for effective communication and adaptive responses. Current video-based methods for social intelligence rely on general video recognition or emotion recognition techniques. We propose the Looped Video Debating framework, which integrates Large Language Models with visual information.
arXiv Detail & Related papers (2025-03-27T06:14:21Z)
AI-based Wearable Vision Assistance System for the Visually Impaired: Integrating Real-Time Object Recognition and Contextual Understanding Using Large Vision-Language Models [0.0]
This paper introduces a novel wearable vision assistance system with artificial intelligence (AI) technology to deliver real-time feedback to a user through a sound beep mechanism. The system provides detailed descriptions of objects in the user's environment using a large vision language model (LVLM)
arXiv Detail & Related papers (2024-12-28T07:26:39Z)
The AI Interface: Designing for the Ideal Machine-Human Experience (Editorial) [1.8074330674710588]
This editorial introduces a Special Issue that explores the psychology of AI experience design. Papers in this collection highlight the complexities of trust, transparency, and emotional sensitivity in human-AI interaction. By findings from eight diverse studies, this editorial underscores the need for AI interfaces to balance efficiency with empathy.
arXiv Detail & Related papers (2024-11-29T15:17:32Z)
Heads Up eXperience (HUX): Always-On AI Companion for Human Computer Environment Interaction [0.5825410941577593]
Heads Up eXperience (HUX) is an AI system designed to bridge the gap between digital and human environments. By tracking the user's eye gaze, analyzing the surrounding environment, and interpreting verbal contexts, the system captures and enhances multi-modal data. Intended for deployment in smart glasses and extended reality headsets, HUX AI aims to become a personal and useful AI companion for daily life.
arXiv Detail & Related papers (2024-07-28T13:15:51Z)
GazeGPT: Augmenting Human Capabilities using Gaze-contingent Contextual AI for Smart Eyewear [30.71112461604336]
We introduce GazeGPT as a new user interaction paradigm for contextual AI. GazeGPT uses eye tracking to help the LMM understand which object in the world-facing camera view a user is paying attention to. We show that this gaze-contingent mechanism is a faster and more accurate pointing mechanism than alternatives.
arXiv Detail & Related papers (2024-01-30T18:02:44Z)
Agent AI: Surveying the Horizons of Multimodal Interaction [83.18367129924997]
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data. We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
arXiv Detail & Related papers (2024-01-07T19:11:18Z)
Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing [15.970078821894758]
We introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation. Robot Synesthesia is a novel point cloud-based tactile representation inspired by human tactile-visual synesthesia.
arXiv Detail & Related papers (2023-12-04T12:35:43Z)
Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic Systems [67.01132165581667]
We propose to enable high-level reasoning in AI systems by integrating cognitive architectures with external neuro-symbolic components. We illustrate a hybrid framework centered on ACT-R and we discuss the role of generative models in recent and future applications.
arXiv Detail & Related papers (2023-11-13T21:20:17Z)
Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with the environments in manipulation tasks. We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders. Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy-learning.
arXiv Detail & Related papers (2023-10-04T17:59:38Z)
See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation [49.925499720323806]
We study how visual, auditory, and tactile perception can jointly help robots to solve complex manipulation tasks. We build a robot system that can see with a camera, hear with a contact microphone, and feel with a vision-based tactile sensor.
arXiv Detail & Related papers (2022-12-07T18:55:53Z)
The Gesture Authoring Space: Authoring Customised Hand Gestures for Grasping Virtual Objects in Immersive Virtual Environments [81.5101473684021]
This work proposes a hand gesture authoring tool for object specific grab gestures allowing virtual objects to be grabbed as in the real world. The presented solution uses template matching for gesture recognition and requires no technical knowledge to design and create custom tailored hand gestures. The study showed that gestures created with the proposed approach are perceived by users as a more natural input modality than the others.
arXiv Detail & Related papers (2022-07-03T18:33:33Z)
AEGIS: A real-time multimodal augmented reality computer vision based system to assist facial expression recognition for individuals with autism spectrum disorder [93.0013343535411]
This paper presents the development of a multimodal augmented reality (AR) system which combines the use of computer vision and deep convolutional neural networks (CNN) The proposed system, which we call AEGIS, is an assistive technology deployable on a variety of user devices including tablets, smartphones, video conference systems, or smartglasses. We leverage both spatial and temporal information in order to provide an accurate expression prediction, which is then converted into its corresponding visualization and drawn on top of the original video frame.
arXiv Detail & Related papers (2020-10-22T17:20:38Z)
A Deep Learning based Wearable Healthcare IoT Device for AI-enabled Hearing Assistance Automation [6.283190933140046]
This research presents a novel AI-enabled Internet of Things (IoT) device capable of assisting those who suffer from impairment of hearing or deafness to communicate with others in conversations. A server application is created that leverages Google's online speech recognition service to convert the received conversations into texts, then deployed to a micro-display attached to the glasses to display the conversation contents to deaf people.
arXiv Detail & Related papers (2020-05-16T19:42:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.