Eye Gaze as a Signal for Conveying User Attention in Contextual AI Systems
- URL: http://arxiv.org/abs/2501.13878v3
- Date: Sat, 12 Apr 2025 15:42:34 GMT
- Title: Eye Gaze as a Signal for Conveying User Attention in Contextual AI Systems
- Authors: Ethan Wilson, Naveen Sendhilnathan, Charlie S. Burlingham, Yusuf Mansour, Robert Cavin, Sai Deep Tetali, Ajoy Savio Fernandes, Michael J. Proulx
- Abstract summary: Multimodal AI systems rely on explicit communication channels between the user and system. We explore the potential of wearable eye tracking to convey signals about user attention.
- Score: 6.910103624072253
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advanced multimodal AI agents can now collaborate with users to solve challenges in the world. Yet, these emerging contextual AI systems rely on explicit communication channels between the user and system. We hypothesize that implicit communication of the user's interests and intent would reduce friction and improve user experience when collaborating with AI agents. In this work, we explore the potential of wearable eye tracking to convey signals about user attention. We measure the eye tracking signal quality requirements to effectively map gaze traces to physical objects, then conduct experiments that provide visual scanpath history as additional context when querying vision language models. Our results show that eye tracking provides high value as a user attention signal and can convey important context about the user's current task and interests, improving understanding of contextual AI agents.
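The abstract describes mapping gaze traces to physical objects and supplying visual scanpath history as extra context when querying a vision language model. A minimal sketch of that idea follows; the object labels, error tolerance, and prompt format are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical sketch: map a gaze trace onto detected objects, then
# condense the fixation history into context for a VLM query.
# All names and thresholds here are assumptions for illustration.

from dataclasses import dataclass


@dataclass
class DetectedObject:
    label: str
    box: tuple  # (x_min, y_min, x_max, y_max) in image pixels


def object_under_gaze(gaze_xy, objects, max_error_px=30):
    """Return the label of the object whose box contains (or is nearest to)
    the gaze point, tolerating up to max_error_px of tracker error."""
    x, y = gaze_xy
    best, best_dist = None, max_error_px
    for obj in objects:
        x0, y0, x1, y1 = obj.box
        # Distance from the gaze point to the box edge (0 if inside).
        dx = max(x0 - x, 0, x - x1)
        dy = max(y0 - y, 0, y - y1)
        dist = (dx ** 2 + dy ** 2) ** 0.5
        if dist <= best_dist:
            best, best_dist = obj.label, dist
    return best


def scanpath_prompt(gaze_trace, objects, question):
    """Condense a gaze trace into an ordered, de-duplicated list of
    fixated objects and prepend it to the question sent to the VLM."""
    history = []
    for pt in gaze_trace:
        label = object_under_gaze(pt, objects)
        if label and (not history or history[-1] != label):
            history.append(label)
    context = " -> ".join(history) if history else "(no objects fixated)"
    return f"User recently looked at: {context}.\n{question}"


objects = [DetectedObject("mug", (100, 100, 160, 180)),
           DetectedObject("laptop", (300, 80, 560, 300))]
trace = [(120, 140), (125, 150), (400, 200), (410, 210)]
print(scanpath_prompt(trace, objects, "What is the user doing?"))
```

The de-duplication step mirrors the intuition that a scanpath's object sequence, not raw samples, is what conveys attention to the model.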
Related papers
- YETI (YET to Intervene) Proactive Interventions by Multimodal AI Agents in Augmented Reality Tasks [16.443149180969776]
Augmented Reality (AR) head-worn devices can uniquely improve the user experience of solving procedural day-to-day tasks. Such AR capabilities can help AI Agents see and listen to actions that users take, which can relate to multimodal capabilities of human users. Proactivity of AI Agents, on the other hand, can help the human user detect and correct any mistakes in agent-observed tasks.
arXiv Detail & Related papers (2025-01-16T08:06:02Z) - AI-based Wearable Vision Assistance System for the Visually Impaired: Integrating Real-Time Object Recognition and Contextual Understanding Using Large Vision-Language Models [0.0]
This paper introduces a novel wearable vision assistance system with artificial intelligence (AI) technology to deliver real-time feedback to a user through a sound beep mechanism.
The system provides detailed descriptions of objects in the user's environment using a large vision language model (LVLM).
arXiv Detail & Related papers (2024-12-28T07:26:39Z) - Challenges in Human-Agent Communication [55.53932430345333]
We identify and analyze twelve key communication challenges that these systems pose.
These include challenges in conveying information from the agent to the user, challenges in enabling the user to convey information to the agent, and overarching challenges that need to be considered across all human-agent communication.
Our findings serve as an urgent call for new design patterns, principles, and guidelines to support transparency and control in these systems.
arXiv Detail & Related papers (2024-11-28T01:21:26Z) - I-MPN: Inductive Message Passing Network for Efficient Human-in-the-Loop Annotation of Mobile Eye Tracking Data [4.487146086221174]
We present a novel human-centered learning algorithm designed for automated object recognition within mobile eye-tracking settings.
Our approach seamlessly integrates an object detector with a spatial relation-aware inductive message-passing network (I-MPN), harnessing node profile information and capturing object correlations.
arXiv Detail & Related papers (2024-06-10T13:08:31Z) - Modeling User Preferences via Brain-Computer Interfacing [54.3727087164445]
We use Brain-Computer Interfacing technology to infer users' preferences, their attentional correlates towards visual content, and their associations with affective experience.
We link these to relevant applications, such as information retrieval, personalized steering of generative models, and crowdsourcing population estimates of affective experiences.
arXiv Detail & Related papers (2024-05-15T20:41:46Z) - AIris: An AI-powered Wearable Assistive Device for the Visually Impaired [0.0]
We introduce AIris, an AI-powered wearable device that provides environmental awareness and interaction capabilities to visually impaired users.
We have created a functional prototype system that operates effectively in real-world conditions.
arXiv Detail & Related papers (2024-05-13T10:09:37Z) - GazeGPT: Augmenting Human Capabilities using Gaze-contingent Contextual AI for Smart Eyewear [30.71112461604336]
We introduce GazeGPT as a new user interaction paradigm for contextual AI.
GazeGPT uses eye tracking to help the LMM understand which object in the world-facing camera view a user is paying attention to.
We show that this gaze-contingent mechanism is a faster and more accurate pointing mechanism than alternatives.
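A gaze-contingent query of this kind can be pictured as cropping the world-facing camera frame around the current fixation before asking the multimodal model about it. The sketch below is a hedged illustration of that step; the crop size and function name are assumptions, not GazeGPT's actual implementation.

```python
# Hypothetical sketch: select the region of the world-facing camera
# frame that the user is fixating, clamped to the frame bounds, so the
# model reasons only about the attended object. Sizes are illustrative.

def gaze_crop(frame_w, frame_h, gaze_xy, crop=224):
    """Return a crop box of size `crop` centered on the gaze point,
    shifted as needed so it stays fully inside the frame."""
    half = crop // 2
    x = min(max(gaze_xy[0] - half, 0), frame_w - crop)
    y = min(max(gaze_xy[1] - half, 0), frame_h - crop)
    return (x, y, x + crop, y + crop)


# A fixation near the frame's top-right corner still yields a full-size box.
print(gaze_crop(1280, 720, (1250, 30)))
```

Clamping rather than shrinking the box keeps the model's input resolution constant regardless of where the user looks.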
arXiv Detail & Related papers (2024-01-30T18:02:44Z) - Agent AI: Surveying the Horizons of Multimodal Interaction [83.18367129924997]
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data.
We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
arXiv Detail & Related papers (2024-01-07T19:11:18Z) - Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models [56.257840490146]
ConCue is a novel approach for improving visual feature extraction in HOI detection.
We develop a transformer-based feature extraction module with a multi-tower architecture that integrates contextual cues into both instance and interaction detectors.
arXiv Detail & Related papers (2023-11-26T09:11:32Z) - Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic Systems [67.01132165581667]
We propose to enable high-level reasoning in AI systems by integrating cognitive architectures with external neuro-symbolic components.
We illustrate a hybrid framework centered on ACT-R and we discuss the role of generative models in recent and future applications.
arXiv Detail & Related papers (2023-11-13T21:20:17Z) - Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection [57.13665112065285]
Human-Object Interaction (HOI) detection is a challenging computer vision task.
We present a framework that enhances HOI detection by incorporating structured text knowledge.
arXiv Detail & Related papers (2023-07-25T14:20:52Z) - What do navigation agents learn about their environment? [39.74076893981299]
We introduce the Interpretability System for Embodied agEnts (iSEE) for Point Goal and Object Goal navigation agents.
We use iSEE to probe the dynamic representations produced by these agents for the presence of information about the agent as well as the environment.
arXiv Detail & Related papers (2022-06-17T01:33:43Z) - Do Pedestrians Pay Attention? Eye Contact Detection in the Wild [75.54077277681353]
In urban environments, humans rely on eye contact for fast and efficient communication with nearby people.
In this paper, we focus on eye contact detection in the wild, i.e., real-world scenarios for autonomous vehicles with no control over the environment or the distance of pedestrians.
We introduce a model that leverages semantic keypoints to detect eye contact and show that this high-level representation achieves state-of-the-art results on the publicly-available dataset JAAD.
To study domain adaptation, we create LOOK: a large-scale dataset for eye contact detection in the wild, which focuses on diverse and un
arXiv Detail & Related papers (2021-12-08T10:21:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.