GazeGPT: Augmenting Human Capabilities using Gaze-contingent Contextual
AI for Smart Eyewear
- URL: http://arxiv.org/abs/2401.17217v2
- Date: Wed, 31 Jan 2024 05:21:13 GMT
- Title: GazeGPT: Augmenting Human Capabilities using Gaze-contingent Contextual
AI for Smart Eyewear
- Authors: Robert Konrad, Nitish Padmanaban, J. Gabriel Buckmaster, Kevin C.
Boyle, Gordon Wetzstein
- Abstract summary: We introduce GazeGPT as a new user interaction paradigm for contextual AI.
GazeGPT uses eye tracking to help the LMM understand which object in the world-facing camera view a user is paying attention to.
We show that this gaze-contingent mechanism is a faster and more accurate pointing mechanism than alternatives.
- Score: 30.71112461604336
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multimodal large language models (LMMs) excel in world knowledge and
problem-solving abilities. Through the use of a world-facing camera and
contextual AI, emerging smart accessories aim to provide a seamless interface
between humans and LMMs. Yet, these wearable computing systems lack an
understanding of the user's attention. We introduce GazeGPT as a new user
interaction paradigm for contextual AI. GazeGPT uses eye tracking to help the
LMM understand which object in the world-facing camera view a user is paying
attention to. Using extensive user evaluations, we show that this
gaze-contingent mechanism is a faster and more accurate pointing mechanism than
alternatives; that it augments human capabilities by significantly improving
their accuracy in a dog-breed classification task; and that it is consistently
ranked as more natural than head- or body-driven selection mechanisms for
contextual AI. Moreover, we prototype a variety of application scenarios that
suggest GazeGPT could be of significant value to users as part of future
AI-driven personal assistants.
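The gaze-contingent mechanism can be made concrete with a short sketch: crop the world-facing camera frame around the tracked gaze point and query a multimodal model about that region only. This is not the authors' implementation; the normalized gaze-coordinate convention, the crop fraction, and the use of OpenAI's vision-capable chat API as the LMM backend are all illustrative assumptions.

```python
import base64
import io

from openai import OpenAI
from PIL import Image

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CROP_FRACTION = 0.25  # illustrative: attend to a window ~25% of the frame


def gaze_crop(frame: Image.Image, gaze_xy: tuple[float, float]) -> Image.Image:
    """Crop the world-camera frame around a normalized (x, y) gaze point."""
    w, h = frame.size
    cw, ch = int(w * CROP_FRACTION), int(h * CROP_FRACTION)
    cx, cy = int(gaze_xy[0] * w), int(gaze_xy[1] * h)
    # Clamp the window so it stays fully inside the frame.
    left = min(max(cx - cw // 2, 0), w - cw)
    top = min(max(cy - ch // 2, 0), h - ch)
    return frame.crop((left, top, left + cw, top + ch))


def ask_about_gazed_object(frame: Image.Image,
                           gaze_xy: tuple[float, float],
                           question: str) -> str:
    """Send only the attended region to the LMM, grounding its answer in
    the object the user is looking at rather than the whole scene."""
    buf = io.BytesIO()
    gaze_crop(frame, gaze_xy).convert("RGB").save(buf, format="JPEG")
    b64 = base64.b64encode(buf.getvalue()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable chat model would do
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```

For instance, `ask_about_gazed_object(frame, (0.6, 0.4), "What dog breed is this?")` mirrors the dog-breed classification task from the user study.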
Related papers
- Heads Up eXperience (HUX): Always-On AI Companion for Human Computer Environment Interaction [0.5825410941577593]
Heads Up eXperience (HUX) is an AI system designed to bridge the gap between digital and human environments.
By tracking the user's eye gaze, analyzing the surrounding environment, and interpreting verbal contexts, the system captures and enhances multi-modal data.
Intended for deployment in smart glasses and extended reality headsets, HUX AI aims to become a personal and useful AI companion for daily life.
arXiv Detail & Related papers (2024-07-28T13:15:51Z)
- AIris: An AI-powered Wearable Assistive Device for the Visually Impaired [0.0]
We introduce AIris, an AI-powered wearable device that provides environmental awareness and interaction capabilities to visually impaired users.
We have created a functional prototype system that operates effectively in real-world conditions.
arXiv Detail & Related papers (2024-05-13T10:09:37Z)
- MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting [97.52388851329667]
We introduce Marking Open-world Keypoint Affordances (MOKA) to solve robotic manipulation tasks specified by free-form language instructions.
Central to our approach is a compact point-based representation of affordance, which bridges the VLM's predictions on observed images and the robot's actions in the physical world; a sketch of such a representation follows this entry.
We evaluate and analyze MOKA's performance on various table-top manipulation tasks including tool use, deformable body manipulation, and object rearrangement.
arXiv Detail & Related papers (2024-03-05T18:08:45Z)
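The point-based affordance representation mentioned above can be illustrated with a minimal sketch. The field names and the pixel-to-action mapping below are assumptions for illustration, not MOKA's actual interface.

```python
# A minimal sketch of a point-based affordance representation in the
# spirit of MOKA (not the authors' code). A VLM is prompted to choose
# marked candidate points in the image; the chosen pixel coordinates
# are then lifted to robot actions. All field names are illustrative.
from dataclasses import dataclass


@dataclass
class PointAffordance:
    grasp_point: tuple[int, int]      # pixel (u, v) where the gripper grasps
    function_point: tuple[int, int]   # pixel where the tool contacts the target
    motion_direction: tuple[float, float]  # unit vector for post-grasp motion


def to_robot_action(aff: PointAffordance, depth_lookup) -> dict:
    """Lift 2D keypoints to a 3D action using a depth map (hypothetical
    `depth_lookup(u, v) -> (x, y, z)` camera-to-world helper)."""
    return {
        "grasp_xyz": depth_lookup(*aff.grasp_point),
        "contact_xyz": depth_lookup(*aff.function_point),
        "move_dir": aff.motion_direction,
    }
```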
- CoCo-Agent: A Comprehensive Cognitive MLLM Agent for Smartphone GUI Automation [61.68049335444254]
Multimodal large language models (MLLMs) have shown remarkable potential as human-like autonomous language agents to interact with real-world environments.
We propose a Comprehensive Cognitive LLM Agent, CoCo-Agent, with two novel approaches: comprehensive environment perception (CEP) and conditional action prediction (CAP).
With our technical design, our agent achieves new state-of-the-art performance on AITW and META-GUI benchmarks, showing promising abilities in realistic scenarios.
arXiv Detail & Related papers (2024-02-19T08:29:03Z)
- Voila-A: Aligning Vision-Language Models with User's Gaze Attention [56.755993500556734]
We introduce gaze information as a proxy for human attention to guide Vision-Language Models (VLMs).
We propose a novel approach, Voila-A, for gaze alignment to enhance the interpretability and effectiveness of these models in real-world applications.
arXiv Detail & Related papers (2023-12-22T17:34:01Z)
- Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic Systems [67.01132165581667]
We propose to enable high-level reasoning in AI systems by integrating cognitive architectures with external neuro-symbolic components.
We illustrate a hybrid framework centered on ACT-R and discuss the role of generative models in recent and future applications.
arXiv Detail & Related papers (2023-11-13T21:20:17Z)
- Large Language Models Empowered Autonomous Edge AI for Connected Intelligence [51.269276328087855]
Edge artificial intelligence (Edge AI) is a promising solution to achieve connected intelligence.
This article presents a vision of autonomous edge AI systems that automatically organize, adapt, and optimize themselves to meet users' diverse requirements.
arXiv Detail & Related papers (2023-07-06T05:16:55Z)
- DetGPT: Detect What You Need via Reasoning [33.00345609506097]
We introduce a new paradigm for object detection that we call reasoning-based object detection.
Unlike conventional object detection methods that rely on specific object names, our approach enables users to interact with the system using natural language instructions.
Our proposed method, called DetGPT, leverages state-of-the-art multi-modal models and open-vocabulary object detectors; a simplified sketch of this two-stage pattern follows this entry.
arXiv Detail & Related papers (2023-05-23T15:37:28Z)
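A deliberately simplified sketch of that two-stage pattern: an LLM reasons from the user's free-form instruction to concrete object names, and an open-vocabulary detector localizes them. The prompt wording and the `open_vocab_detect` stub are assumptions, not DetGPT's actual pipeline.

```python
# Simplified sketch of reasoning-based detection (not the DetGPT code).
# Stage 1: an LLM maps a free-form instruction to target object names.
# Stage 2: an open-vocabulary detector localizes those names; it is
# stubbed here, where a real system would run a grounded detector.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def instruction_to_targets(instruction: str) -> list[str]:
    """Reason from an instruction to the object categories that satisfy it."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": ("List the object categories, comma-separated, that "
                        f"would satisfy this request: {instruction}"),
        }],
    ).choices[0].message.content
    return [name.strip() for name in reply.split(",") if name.strip()]


def open_vocab_detect(image_path: str, targets: list[str]) -> list[dict]:
    """Stub: a real implementation would run an open-vocabulary detector
    and return a bounding box for each target name found in the image."""
    return [{"label": t, "box": None} for t in targets]


# e.g. instruction_to_targets("I want a cold beverage") -> ["refrigerator", ...]
```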
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face [85.25054021362232]
Large language models (LLMs) have exhibited exceptional abilities in language understanding, generation, interaction, and reasoning.
An LLM can act as a controller that manages existing AI models to solve complicated AI tasks.
We present HuggingGPT, an LLM-powered agent that connects various AI models in machine learning communities; a simplified sketch of this controller pattern follows this entry.
arXiv Detail & Related papers (2023-03-30T17:48:28Z)
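The controller pattern can be reduced to a deliberately small sketch: the LLM picks a specialist model from a registry and the host code dispatches to it. The registry entries, the single-step plan, and the JSON prompt are illustrative assumptions; the real system performs multi-stage task planning, model selection, execution, and response generation over Hugging Face models.

```python
# A much-simplified sketch of an LLM-as-controller loop (not HuggingGPT
# itself). The LLM selects a task from a registry of specialist models
# (stubbed as lambdas) and the host dispatches the extracted input to it.
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

MODEL_REGISTRY = {  # hypothetical registry; real entries would call models
    "image-classification": lambda x: f"[classified: {x}]",
    "object-detection": lambda x: f"[detected objects in: {x}]",
    "text-to-speech": lambda x: f"[spoken: {x}]",
}


def plan_and_dispatch(user_request: str) -> str:
    """Ask the LLM which registered model fits the request, then call it."""
    plan = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},  # constrain output to JSON
        messages=[{
            "role": "user",
            "content": (
                f"Choose the best task from {list(MODEL_REGISTRY)} for the "
                "request below and extract the input to pass to it. "
                'Reply as JSON {"task": ..., "input": ...}. '
                f"Request: {user_request}"
            ),
        }],
    ).choices[0].message.content
    step = json.loads(plan)
    return MODEL_REGISTRY[step["task"]](step["input"])
```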
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality or accuracy of this information and is not responsible for any consequences of its use.