Eye Gaze as a Signal for Conveying User Attention in Contextual AI Systems
- URL: http://arxiv.org/abs/2501.13878v1
- Date: Thu, 23 Jan 2025 17:51:54 GMT
- Title: Eye Gaze as a Signal for Conveying User Attention in Contextual AI Systems
- Authors: Ethan Wilson, Naveen Sendhilnathan, Charlie S. Burlingham, Yusuf Mansour, Robert Cavin, Sai Deep Tetali, Ajoy Savio Fernandes, Michael J. Proulx
- Abstract summary: Multimodal AI agents can now collaborate with users to solve challenges in the world.
We explore eye tracking's role in such interaction to convey a user's attention relative to the physical environment.
- Score: 6.910103624072253
- Abstract: Advanced multimodal AI agents can now collaborate with users to solve challenges in the world. We explore eye tracking's role in such interaction to convey a user's attention relative to the physical environment. We hypothesize that this knowledge improves contextual understanding for AI agents. By observing hours of human-object interactions, we first measure the relationship between an eye tracker's signal quality and its ability to reliably place gaze on nearby physical objects. We then conduct experiments that relay the user's scanpath history as additional context when querying multimodal agents. Our results show that eye tracking provides high value as a user attention signal and can convey information about the user's current task and interests to the agent.
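As a rough illustration of the scanpath-as-context idea, the sketch below renders recent fixations as text and prepends them to a user query. The `Fixation` structure, dwell-time filter, and prompt format are illustrative assumptions, not the authors' pipeline.

```python
# Hypothetical sketch: relay scanpath history as extra context for a
# multimodal agent query. All names and thresholds are assumptions.
from dataclasses import dataclass

@dataclass
class Fixation:
    label: str        # object the gaze landed on (e.g., from a detector)
    duration_ms: int  # dwell time on that object

def scanpath_to_context(fixations, min_dwell_ms=100):
    """Render recent fixations as a text block the agent can condition on."""
    salient = [f for f in fixations if f.duration_ms >= min_dwell_ms]
    lines = [f"- {f.label} ({f.duration_ms} ms)" for f in salient]
    return "The user recently looked at:\n" + "\n".join(lines)

def build_prompt(user_question, fixations):
    return f"{scanpath_to_context(fixations)}\n\nUser: {user_question}"

if __name__ == "__main__":
    history = [Fixation("coffee grinder", 850),
               Fixation("kettle", 420),
               Fixation("window", 60)]  # brief glance, filtered out
    print(build_prompt("What should I do next?", history))
```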
Related papers
- YETI (YET to Intervene) Proactive Interventions by Multimodal AI Agents in Augmented Reality Tasks [16.443149180969776]
Augmented Reality (AR) head-worn devices can uniquely improve the user experience of solving procedural day-to-day tasks.
Such AR capabilities can help AI Agents see and hear the actions a user takes, paralleling the multimodal capabilities of human users.
Proactivity of AI Agents, on the other hand, can help the human user detect and correct any mistakes in agent-observed tasks.
arXiv Detail & Related papers (2025-01-16T08:06:02Z)
- Challenges in Human-Agent Communication [55.53932430345333]
We identify and analyze twelve key communication challenges that these systems pose.
These include challenges in conveying information from the agent to the user, challenges in enabling the user to convey information to the agent, and overarching challenges that need to be considered across all human-agent communication.
Our findings serve as an urgent call for new design patterns, principles, and guidelines to support transparency and control in these systems.
arXiv Detail & Related papers (2024-11-28T01:21:26Z)
- I-MPN: Inductive Message Passing Network for Efficient Human-in-the-Loop Annotation of Mobile Eye Tracking Data [4.487146086221174]
We present a novel human-centered learning algorithm designed for automated object recognition within mobile eye-tracking settings.
Our approach seamlessly integrates an object detector with a spatial relation-aware inductive message-passing network (I-MPN), harnessing node profile information and capturing object correlations.
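A minimal sketch of one message-passing step over a graph of detected objects, in the spirit of I-MPN: each node updates from its own features plus mean-aggregated neighbor features. The feature dimensions, aggregation choice, and random weights are placeholder assumptions, not the paper's architecture.

```python
# Assumed toy example of inductive message passing over a scene graph.
import numpy as np

def message_passing_step(node_feats, adjacency, w_self, w_neigh):
    """Update each node from itself plus mean-aggregated neighbours."""
    # Avoid divide-by-zero for isolated nodes
    deg = adjacency.sum(axis=1, keepdims=True).clip(min=1)
    neigh = (adjacency @ node_feats) / deg
    return np.maximum(node_feats @ w_self + neigh @ w_neigh, 0.0)  # ReLU

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))   # 4 detected objects, 8-dim node profiles
adj = np.array([[0, 1, 1, 0],     # spatial-relation graph (symmetric)
                [1, 0, 0, 1],
                [1, 0, 0, 0],
                [0, 1, 0, 0]], float)
updated = message_passing_step(feats, adj,
                               rng.normal(size=(8, 8)),
                               rng.normal(size=(8, 8)))
print(updated.shape)  # (4, 8)
```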
arXiv Detail & Related papers (2024-06-10T13:08:31Z)
- Modeling User Preferences via Brain-Computer Interfacing [54.3727087164445]
We use Brain-Computer Interfacing technology to infer users' preferences, their attentional correlates towards visual content, and their associations with affective experience.
We link these to relevant applications, such as information retrieval, personalized steering of generative models, and crowdsourcing population estimates of affective experiences.
arXiv Detail & Related papers (2024-05-15T20:41:46Z)
- GazeGPT: Augmenting Human Capabilities using Gaze-contingent Contextual AI for Smart Eyewear [30.71112461604336]
We introduce GazeGPT as a new user interaction paradigm for contextual AI.
GazeGPT uses eye tracking to help the large multimodal model (LMM) understand which object in the world-facing camera view a user is paying attention to.
We show that this gaze-contingent mechanism is a faster and more accurate pointing mechanism than alternatives.
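A hedged sketch of the gaze-contingent selection idea: crop the world-facing camera frame around the current gaze point so the model reasons only about the attended region. The crop size and clamping policy are assumptions, not GazeGPT's implementation.

```python
# Assumed sketch: select the attended image region from a gaze estimate.
import numpy as np

def gaze_crop(frame, gaze_xy, half_size=112):
    """Return a square crop centred on the gaze point, clamped to the frame."""
    h, w = frame.shape[:2]
    cx = int(np.clip(gaze_xy[0], half_size, w - half_size))
    cy = int(np.clip(gaze_xy[1], half_size, h - half_size))
    return frame[cy - half_size:cy + half_size, cx - half_size:cx + half_size]

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder camera frame
crop = gaze_crop(frame, gaze_xy=(600, 30))       # gaze near the top-right corner
print(crop.shape)                                # (224, 224, 3)
```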
arXiv Detail & Related papers (2024-01-30T18:02:44Z)
- Agent AI: Surveying the Horizons of Multimodal Interaction [83.18367129924997]
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data.
We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
arXiv Detail & Related papers (2024-01-07T19:11:18Z)
- Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models [56.257840490146]
ConCue is a novel approach for improving visual feature extraction in human-object interaction (HOI) detection.
We develop a transformer-based feature extraction module with a multi-tower architecture that integrates contextual cues into both instance and interaction detectors.
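A loose sketch of the multi-tower idea: one tower embeds a region's visual feature, another embeds a contextual-cue vector, and a toy head scores the fused result. The dimensions and single-layer linear towers are illustrative assumptions, not ConCue itself.

```python
# Assumed toy example of multi-tower feature fusion for a detector head.
import numpy as np

rng = np.random.default_rng(1)

def tower(x, weight):
    """One placeholder tower: a single linear layer with ReLU."""
    return np.maximum(x @ weight, 0.0)

visual = rng.normal(size=(1, 256))  # region feature from a detector
cue = rng.normal(size=(1, 128))     # embedded contextual cue (e.g., from a VLM)

fused = np.concatenate([tower(visual, rng.normal(size=(256, 64))),
                        tower(cue, rng.normal(size=(128, 64)))], axis=1)
score = fused @ rng.normal(size=(128, 1))  # toy interaction-detector head
print(score.shape)  # (1, 1)
```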
arXiv Detail & Related papers (2023-11-26T09:11:32Z)
- What do navigation agents learn about their environment? [39.74076893981299]
We introduce the Interpretability System for Embodied agEnts (iSEE) for Point Goal and Object Goal navigation agents.
We use iSEE to probe the dynamic representations produced by these agents for the presence of information about the agent as well as the environment.
arXiv Detail & Related papers (2022-06-17T01:33:43Z)
- Do Pedestrians Pay Attention? Eye Contact Detection in the Wild [75.54077277681353]
In urban environments, humans rely on eye contact for fast and efficient communication with nearby people.
In this paper, we focus on eye contact detection in the wild, i.e., real-world scenarios for autonomous vehicles with no control over the environment or the distance of pedestrians.
We introduce a model that leverages semantic keypoints to detect eye contact and show that this high-level representation achieves state-of-the-art results on the publicly available JAAD dataset.
To study domain adaptation, we create LOOK: a large-scale dataset for eye contact detection in the wild, which focuses on diverse and unconstrained scenarios for real-world generalization.
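A simple sketch of the keypoint-based formulation: score eye contact from a pedestrian's 2D semantic keypoints rather than raw pixels. The keypoint layout, logistic head, and random weights are stand-ins for a trained model, not the paper's method.

```python
# Assumed toy example: eye-contact probability from 2D pose keypoints.
import numpy as np

def eye_contact_score(keypoints_xy, w, b):
    """Map flattened (x, y) keypoints to an eye-contact probability."""
    x = keypoints_xy.reshape(-1)         # e.g., 17 COCO-style keypoints -> 34 dims
    logit = x @ w + b
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid

rng = np.random.default_rng(2)
pose = rng.uniform(0, 1, size=(17, 2))   # normalized keypoint coordinates
print(eye_contact_score(pose, rng.normal(size=34), 0.0))
```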
arXiv Detail & Related papers (2021-12-08T10:21:28Z)
- MutualEyeContact: A conversation analysis tool with focus on eye contact [69.17395873398196]
MutualEyeContact can help scientists to understand the importance of (mutual) eye contact in social interactions.
We combine state-of-the-art eye tracking with face recognition based on machine learning and provide a tool for analysis and visualization of social interaction sessions.
arXiv Detail & Related papers (2021-07-09T15:05:53Z)