Decoding Attention from Gaze: A Benchmark Dataset and End-to-End Models
- URL: http://arxiv.org/abs/2211.10966v1
- Date: Sun, 20 Nov 2022 12:24:57 GMT
- Title: Decoding Attention from Gaze: A Benchmark Dataset and End-to-End Models
- Authors: Karan Uppal, Jaeah Kim, Shashank Singh
- Abstract summary: Eye-tracking has potential to provide rich behavioral data about human cognition in ecologically valid environments.
This paper studies using computer vision tools for "attention decoding", the task of assessing the locus of a participant's overt visual attention over time.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Eye-tracking has potential to provide rich behavioral data about human
cognition in ecologically valid environments. However, analyzing this rich data
is often challenging. Most automated analyses are specific to simplistic
artificial visual stimuli with well-separated, static regions of interest,
while most analyses in the context of complex visual stimuli, such as most
natural scenes, rely on laborious and time-consuming manual annotation. This
paper studies using computer vision tools for "attention decoding", the task of
assessing the locus of a participant's overt visual attention over time. We
provide a publicly available Multiple Object Eye-Tracking (MOET) dataset,
consisting of gaze data from participants tracking specific objects, annotated
with labels and bounding boxes, in crowded real-world videos, for training and
evaluating attention decoding algorithms. We also propose two end-to-end deep
learning models for attention decoding and compare these to state-of-the-art
heuristic methods.
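As a rough illustration of the task described above, the following is a minimal sketch of a simple geometric heuristic for attention decoding. It is not the paper's end-to-end model, nor necessarily one of the heuristic baselines it evaluates. It assumes each video frame comes with one gaze point and a set of labeled bounding boxes in the same pixel coordinates. The names `Box`, `decode_frame`, and `decode_sequence` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """A labeled, axis-aligned bounding box (hypothetical container for this sketch)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def center(self) -> Tuple[float, float]:
        return ((self.x_min + self.x_max) / 2.0, (self.y_min + self.y_max) / 2.0)

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def decode_frame(gaze: Tuple[float, float], boxes: Sequence[Box]) -> Optional[str]:
    """Guess the attended object for one frame.

    Assumed heuristic: prefer boxes that contain the gaze point; among the
    candidates, pick the box whose center is closest to the gaze point.
    Returns None when the frame has no boxes.
    """
    if not boxes:
        return None
    gx, gy = gaze
    inside = [b for b in boxes if b.contains(gx, gy)]
    candidates = inside or list(boxes)  # fall back to all boxes if none contains the gaze

    def squared_distance(b: Box) -> float:
        cx, cy = b.center()
        return (cx - gx) ** 2 + (cy - gy) ** 2

    return min(candidates, key=squared_distance).label


def decode_sequence(
    gaze_points: Sequence[Tuple[float, float]],
    boxes_per_frame: Sequence[Sequence[Box]],
) -> List[Optional[str]]:
    """Apply the per-frame heuristic independently to every frame."""
    return [decode_frame(g, bs) for g, bs in zip(gaze_points, boxes_per_frame)]
```

A per-frame rule like this ignores temporal context and gaze noise, which is one reason learned models that consume the video and gaze jointly may do better in crowded scenes.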
Related papers
- Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models [0.65268245109828]
We introduce the notion of contextual diversity for active learning (CDAL).
We propose a data repair algorithm to curate contextually fair data to reduce model bias.
We are working on developing an image retrieval system for wildlife camera trap images and a reliable warning system for poor-quality rural roads.
arXiv Detail & Related papers (2024-11-04T09:43:33Z)
- I-MPN: Inductive Message Passing Network for Efficient Human-in-the-Loop Annotation of Mobile Eye Tracking Data [4.487146086221174]
We present a novel human-centered learning algorithm designed for automated object recognition within mobile eye-tracking settings.
Our approach seamlessly integrates an object detector with a spatial relation-aware inductive message-passing network (I-MPN), harnessing node profile information and capturing object correlations.
arXiv Detail & Related papers (2024-06-10T13:08:31Z)
- Semantic-Based Active Perception for Humanoid Visual Tasks with Foveal Sensors [49.99728312519117]
The aim of this work is to establish how accurately a recent semantic-based active perception model is able to complete visual tasks that are regularly performed by humans.
This model exploits the ability of current object detectors to localize and classify a large number of object classes and to update a semantic description of a scene across multiple fixations.
In the task of scene exploration, the semantic-based method demonstrates superior performance compared to the traditional saliency-based model.
arXiv Detail & Related papers (2024-04-16T18:15:57Z)
- A Deep Learning Approach for the Segmentation of Electroencephalography Data in Eye Tracking Applications [56.458448869572294]
We introduce DETRtime, a novel framework for time-series segmentation of EEG data.
Our end-to-end deep learning-based framework brings advances in Computer Vision to the forefront.
Our model generalizes well in the task of EEG sleep stage segmentation.
arXiv Detail & Related papers (2022-06-17T10:17:24Z)
- Finding Facial Forgery Artifacts with Parts-Based Detectors [73.08584805913813]
We design a series of forgery detection systems that each focus on one individual part of the face.
We use these detectors to perform detailed empirical analysis on the FaceForensics++, Celeb-DF, and Facebook Deepfake Detection Challenge datasets.
arXiv Detail & Related papers (2021-09-21T16:18:45Z)
- Visualization Techniques to Enhance Automated Event Extraction [0.0]
This case study seeks to identify potential triggers of state-led mass killings from news articles using NLP.
We demonstrate how visualizations can aid in each stage, from exploratory analysis of raw data, to machine learning training analysis, and finally post-inference validation.
arXiv Detail & Related papers (2021-06-11T19:24:54Z)
- Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms a visual-only state-of-the-art method MoCo.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
- Co-training for On-board Deep Object Detection [0.0]
Best performing deep vision-based object detectors are trained in a supervised manner by relying on human-labeled bounding boxes.
Co-training is a semi-supervised learning method for self-labeling objects in unlabeled images.
We show that co-training is a paradigm worth pursuing for alleviating object labeling, working both alone and together with task-agnostic domain adaptation.
arXiv Detail & Related papers (2020-08-12T19:08:59Z)
- Spatio-Temporal Graph for Video Captioning with Knowledge Distillation [50.034189314258356]
We propose a graph model for video captioning that exploits object interactions in space and time.
Our model builds interpretable links and is able to provide explicit visual grounding.
To avoid correlations caused by the variable number of objects, we propose an object-aware knowledge distillation mechanism.
arXiv Detail & Related papers (2020-03-31T03:58:11Z)