STARE: Predicting Decision Making Based on Spatio-Temporal Eye Movements
- URL: http://arxiv.org/abs/2508.04148v1
- Date: Wed, 06 Aug 2025 07:20:31 GMT
- Title: STARE: Predicting Decision Making Based on Spatio-Temporal Eye Movements
- Authors: Moshe Unger, Alexander Tuzhilin, Michel Wedel
- Abstract summary: The present work proposes a Deep Learning architecture for the prediction of various consumer choice behaviors from time series of raw gaze or eye fixations on images of the decision environment. We compare STARE with several state-of-the-art alternatives on multiple datasets with the purpose of predicting consumer choice behaviors from eye movements.
- Score: 49.906485205551746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The present work proposes a Deep Learning architecture for the prediction of various consumer choice behaviors from time series of raw gaze or eye fixations on images of the decision environment, for which currently no foundational models are available. The architecture, called STARE (Spatio-Temporal Attention Representation for Eye Tracking), uses a new tokenization strategy, which involves mapping the x- and y-pixel coordinates of eye-movement time series onto predefined, contiguous Regions of Interest. That tokenization makes the spatio-temporal eye-movement data available to Chronos, a time-series foundation model based on the T5 architecture, to which co-attention and/or cross-attention is added to capture directional and/or interocular influences of eye movements. We compare STARE with several state-of-the-art alternatives on multiple datasets with the purpose of predicting consumer choice behaviors from eye movements. We thus make a first step towards developing and testing DL architectures that represent visual attention dynamics rooted in the neurophysiology of eye movements.
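To make the tokenization idea concrete, the sketch below maps raw (x, y) gaze coordinates onto a uniform grid of cells and emits one integer token per time step. This is only an illustrative assumption: the paper's Regions of Interest are predefined and contiguous but need not form a uniform grid, and the function name `roi_tokenize` and grid size are hypothetical choices, not the authors' implementation.

```python
import numpy as np

def roi_tokenize(gaze_xy, img_w, img_h, grid_w=8, grid_h=8):
    """Map raw (x, y) gaze coordinates to ROI token ids on a uniform grid.

    gaze_xy : sequence of (x, y) pixel coordinates over time, shape (T, 2).
    Returns an integer array of shape (T,) with one token per time step.
    NOTE: the uniform grid is an illustrative assumption; the paper uses
    its own set of predefined, contiguous Regions of Interest.
    """
    xy = np.asarray(gaze_xy, dtype=float)
    # Clip to image bounds so off-screen samples still map to a valid cell.
    x = np.clip(xy[:, 0], 0, img_w - 1)
    y = np.clip(xy[:, 1], 0, img_h - 1)
    # Discretize each coordinate into its grid-cell index.
    col = (x / img_w * grid_w).astype(int)
    row = (y / img_h * grid_h).astype(int)
    # One token id per cell, enumerated row-major; the resulting token
    # sequence can then be fed to a time-series model such as Chronos.
    return row * grid_w + col

# Example: a short fixation sequence on a 1280x1024 stimulus image.
tokens = roi_tokenize([(100, 80), (640, 512), (1200, 1000)], 1280, 1024)
print(tokens)  # array([ 0, 36, 63]) for an 8x8 grid
```

The design choice here is that discretizing gaze positions into a small ROI vocabulary turns continuous eye-movement traces into token sequences, which is what allows a pretrained time-series foundation model to consume them.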
Related papers
- Probing Fine-Grained Action Understanding and Cross-View Generalization of Foundation Models [13.972809192907931]
Foundation models (FMs) are large neural networks trained on broad datasets.
Human activity recognition in video has advanced with FMs, driven by competition among different architectures.
This paper empirically evaluates how perspective changes affect different FMs in fine-grained human activity recognition.
arXiv Detail & Related papers (2024-07-22T12:59:57Z)
- CLERA: A Unified Model for Joint Cognitive Load and Eye Region Analysis in the Wild [18.79132232751083]
Real-time analysis of the dynamics of the eye region allows us to monitor humans' visual attention allocation and estimate their mental state.
We propose CLERA, which achieves precise keypoint detection and temporal tracking in a joint-learning framework.
We also introduce a large-scale dataset of 30k human faces with joint pupil, eye-openness, and landmark annotation.
arXiv Detail & Related papers (2023-06-26T21:20:23Z)
- TempSAL -- Uncovering Temporal Information for Deep Saliency Prediction [64.63645677568384]
We introduce a novel saliency prediction model that learns to output saliency maps in sequential time intervals.
Our approach locally modulates the saliency predictions by combining the learned temporal maps.
Our code will be publicly available on GitHub.
arXiv Detail & Related papers (2023-01-05T22:10:16Z)
- A Deep Learning Approach for the Segmentation of Electroencephalography Data in Eye Tracking Applications [56.458448869572294]
We introduce DETRtime, a novel framework for time-series segmentation of EEG data.
Our end-to-end deep learning-based framework brings advances in Computer Vision to the forefront.
Our model generalizes well in the task of EEG sleep stage segmentation.
arXiv Detail & Related papers (2022-06-17T10:17:24Z)
- Individual Topology Structure of Eye Movement Trajectories [6.09170287691728]
We propose applying a new class of features to the quantitative analysis of the structure of personal eye movement trajectories.
We experimentally demonstrate the competitiveness of the new class of features with the traditional ones and their significant synergy.
arXiv Detail & Related papers (2022-05-21T20:30:45Z)
- Revisiting spatio-temporal layouts for compositional action recognition [63.04778884595353]
We take an object-centric approach to action recognition.
The main focus of this paper is compositional/few-shot action recognition.
We demonstrate how to improve the performance of appearance-based models by fusion with layout-based models.
arXiv Detail & Related papers (2021-11-02T23:04:39Z)
- Bayesian Eye Tracking [63.21413628808946]
Model-based eye tracking is susceptible to eye feature detection errors.
We propose a Bayesian framework for model-based eye tracking.
Compared to state-of-the-art model-based and learning-based methods, the proposed framework demonstrates significant improvement in generalization capability.
arXiv Detail & Related papers (2021-06-25T02:08:03Z)
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
- Spatio-Temporal Analysis of Facial Actions using Lifecycle-Aware Capsule Networks [12.552355581481994]
AULA-Caps learns across contiguous frames by focusing on relevant spatio-temporal segments in the sequence.
The learnt feature capsules are routed together such that the model learns to selectively focus on spatial or temporal information depending upon the AU lifecycle.
The proposed model is evaluated on the commonly used BP4D and GFT benchmark datasets.
arXiv Detail & Related papers (2020-11-17T18:36:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.