Related papers: Decoding Attention from Gaze: A Benchmark Dataset and End-to-End Models

Decoding Attention from Gaze: A Benchmark Dataset and End-to-End Models

URL: http://arxiv.org/abs/2211.10966v1
Date: Sun, 20 Nov 2022 12:24:57 GMT
Title: Decoding Attention from Gaze: A Benchmark Dataset and End-to-End Models
Authors: Karan Uppal, Jaeah Kim, Shashank Singh
Abstract summary: Eye-tracking has potential to provide rich behavioral data about human cognition in ecologically valid environments. This paper studies using computer vision tools for "attention decoding", the task of assessing the locus of a participant's overt visual attention over time.
Score: 6.642042615005632
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Eye-tracking has potential to provide rich behavioral data about human cognition in ecologically valid environments. However, analyzing this rich data is often challenging. Most automated analyses are specific to simplistic artificial visual stimuli with well-separated, static regions of interest, while most analyses in the context of complex visual stimuli, such as most natural scenes, rely on laborious and time-consuming manual annotation. This paper studies using computer vision tools for "attention decoding", the task of assessing the locus of a participant's overt visual attention over time. We provide a publicly available Multiple Object Eye-Tracking (MOET) dataset, consisting of gaze data from participants tracking specific objects, annotated with labels and bounding boxes, in crowded real-world videos, for training and evaluating attention decoding algorithms. We also propose two end-to-end deep learning models for attention decoding and compare these to state-of-the-art heuristic methods.

Related papers

Eyes on Target: Gaze-Aware Object Detection in Egocentric Video [1.3320917259299652]
We propose Eyes on Target, a novel depth-aware and gaze-guided object detection framework for egocentric videos.<n>Our approach injects gaze-derived features into the attention mechanism of a Vision Transformer (ViT), effectively biasing spatial feature selection toward human-attended regions.<n>We validate our method on an egocentric simulator dataset where human visual attention is critical for task assessment.
arXiv Detail & Related papers (2025-11-03T05:21:58Z)
From Sight to Insight: Unleashing Eye-Tracking in Weakly Supervised Video Salient Object Detection [60.11169426478452]
This paper aims to introduce fixation information to assist the detection of salient objects under weak supervision.<n>We propose a Position and Semantic Embedding (PSE) module to provide location and semantic guidance during the feature learning process.<n>An Intra-Inter Mixed Contrastive (MCII) model improves thetemporal modeling capabilities under weak supervision.
arXiv Detail & Related papers (2025-06-30T05:01:40Z)
Gaze2AOI: Open Source Deep-learning Based System for Automatic Area of Interest Annotation with Eye Tracking Data [0.0]
We present a novel method to enhance the analysis of user behaviour and attention by augmenting video streams with automatically annotating and labelling areas of interest (AOIs) The tool provides key features such as time to first fixation, dwell time, and frequency of AOI revisits.
arXiv Detail & Related papers (2024-11-20T14:17:23Z)
Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models [0.65268245109828]
We introduce the notion of contextual diversity for active learning CDAL. We propose a data repair algorithm to curate contextually fair data to reduce model bias. We are working on developing image retrieval system for wildlife camera trap images and reliable warning system for poor quality rural roads.
arXiv Detail & Related papers (2024-11-04T09:43:33Z)
I-MPN: Inductive Message Passing Network for Efficient Human-in-the-Loop Annotation of Mobile Eye Tracking Data [4.487146086221174]
We present a novel human-centered learning algorithm designed for automated object recognition within mobile eye-tracking settings. Our approach seamlessly integrates an object detector with a spatial relation-aware inductive message-passing network (I-MPN), harnessing node profile information and capturing object correlations.
arXiv Detail & Related papers (2024-06-10T13:08:31Z)
Semantic-Based Active Perception for Humanoid Visual Tasks with Foveal Sensors [49.99728312519117]
The aim of this work is to establish how accurately a recent semantic-based active perception model is able to complete visual tasks that are regularly performed by humans. This model exploits the ability of current object detectors to localize and classify a large number of object classes and to update a semantic description of a scene across multiple fixations. In the task of scene exploration, the semantic-based method demonstrates superior performance compared to the traditional saliency-based model.
arXiv Detail & Related papers (2024-04-16T18:15:57Z)
A Deep Learning Approach for the Segmentation of Electroencephalography Data in Eye Tracking Applications [56.458448869572294]
We introduce DETRtime, a novel framework for time-series segmentation of EEG data. Our end-to-end deep learning-based framework brings advances in Computer Vision to the forefront. Our model generalizes well in the task of EEG sleep stage segmentation.
arXiv Detail & Related papers (2022-06-17T10:17:24Z)
Finding Facial Forgery Artifacts with Parts-Based Detectors [73.08584805913813]
We design a series of forgery detection systems that each focus on one individual part of the face. We use these detectors to perform detailed empirical analysis on the FaceForensics++, Celeb-DF, and Facebook Deepfake Detection Challenge datasets.
arXiv Detail & Related papers (2021-09-21T16:18:45Z)
REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named regrad to sustain the modeling of relationships among objects and grasps. Our dataset is collected in both forms of 2D images and 3D point clouds. Users are free to import their own object models for the generation of as many data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z)
Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation. We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data. Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z)
What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations. Our experiments show that our "muscly-supervised" representation outperforms a visual-only state-of-the-art method MoCo.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
Co-training for On-board Deep Object Detection [0.0]
Best performing deep vision-based object detectors are trained in a supervised manner by relying on human-labeled bounding boxes. Co-training is a semi-supervised learning method for self-labeling objects in unlabeled images. We show how co-training is a paradigm worth to pursue for alleviating object labeling, working both alone and together with task-agnostic domain adaptation.
arXiv Detail & Related papers (2020-08-12T19:08:59Z)
Spatio-Temporal Graph for Video Captioning with Knowledge Distillation [50.034189314258356]
We propose a graph model for video captioning that exploits object interactions in space and time. Our model builds interpretable links and is able to provide explicit visual grounding. To avoid correlations caused by the variable number of objects, we propose an object-aware knowledge distillation mechanism.
arXiv Detail & Related papers (2020-03-31T03:58:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.