Beyond Average: Individualized Visual Scanpath Prediction
- URL: http://arxiv.org/abs/2404.12235v2
- Date: Fri, 19 Apr 2024 02:42:24 GMT
- Title: Beyond Average: Individualized Visual Scanpath Prediction
- Authors: Xianyu Chen, Ming Jiang, Qi Zhao
- Abstract summary: Individualized scanpath prediction (ISP) aims to accurately predict how different individuals shift their attention in diverse visual tasks.
The proposed ISP method features an observer encoder that characterizes and integrates an observer's unique attention traits, an observer-centric feature integration approach, and an adaptive fixation prioritization mechanism.
Our method is generally applicable to different datasets, model architectures, and visual tasks, offering a comprehensive tool for transforming general scanpath models into individualized ones.
- Score: 20.384132849805003
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding how attention varies across individuals has significant scientific and societal impacts. However, existing visual scanpath models treat attention uniformly, neglecting individual differences. To bridge this gap, this paper focuses on individualized scanpath prediction (ISP), a new attention modeling task that aims to accurately predict how different individuals shift their attention in diverse visual tasks. It proposes an ISP method featuring three novel technical components: (1) an observer encoder to characterize and integrate an observer's unique attention traits, (2) an observer-centric feature integration approach that holistically combines visual features, task guidance, and observer-specific characteristics, and (3) an adaptive fixation prioritization mechanism that refines scanpath predictions by dynamically prioritizing semantic feature maps based on individual observers' attention traits. These novel components allow scanpath models to effectively address the attention variations across different observers. Our method is generally applicable to different datasets, model architectures, and visual tasks, offering a comprehensive tool for transforming general scanpath models into individualized ones. Comprehensive evaluations using value-based and ranking-based metrics verify the method's effectiveness and generalizability.
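The abstract outlines three architectural components without implementation details. As a rough, hypothetical illustration only (not the authors' implementation; all module names, shapes, and design choices below are assumptions), the following PyTorch sketch shows one way an observer embedding could condition a generic scanpath model:

```python
# Hypothetical sketch of the three ISP components described in the abstract.
# All names, shapes, and design choices are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class ObserverEncoder(nn.Module):
    """Maps an observer ID to a learned attention-trait embedding (assumed design)."""

    def __init__(self, num_observers: int, dim: int = 128):
        super().__init__()
        self.embedding = nn.Embedding(num_observers, dim)

    def forward(self, observer_id: torch.Tensor) -> torch.Tensor:
        return self.embedding(observer_id)  # (B, dim)


class ObserverCentricFusion(nn.Module):
    """Combines visual features, task guidance, and observer traits into one representation."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(3 * dim, dim)

    def forward(self, visual: torch.Tensor, task: torch.Tensor, observer: torch.Tensor) -> torch.Tensor:
        # visual: (B, N, dim) spatial tokens; task, observer: (B, dim) global vectors.
        b, n, d = visual.shape
        task = task.unsqueeze(1).expand(b, n, d)
        observer = observer.unsqueeze(1).expand(b, n, d)
        return self.proj(torch.cat([visual, task, observer], dim=-1))  # (B, N, dim)


class AdaptiveFixationPrioritization(nn.Module):
    """Re-weights semantic feature maps per observer before fixation decoding (assumed design)."""

    def __init__(self, dim: int = 128, num_maps: int = 16):
        super().__init__()
        self.scorer = nn.Linear(dim, num_maps)

    def forward(self, semantic_maps: torch.Tensor, observer: torch.Tensor) -> torch.Tensor:
        # semantic_maps: (B, num_maps, H, W); observer: (B, dim).
        weights = torch.softmax(self.scorer(observer), dim=-1)  # (B, num_maps)
        return semantic_maps * weights[:, :, None, None]


if __name__ == "__main__":
    obs = ObserverEncoder(num_observers=15)(torch.tensor([3, 7]))  # two observers
    fused = ObserverCentricFusion()(torch.randn(2, 49, 128), torch.randn(2, 128), obs)
    maps = AdaptiveFixationPrioritization()(torch.randn(2, 16, 20, 32), obs)
    print(fused.shape, maps.shape)  # torch.Size([2, 49, 128]) torch.Size([2, 16, 20, 32])
```

In this reading, a generic scanpath decoder would consume the fused features and re-weighted semantic maps to emit per-observer fixation sequences.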
Related papers
- GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths [20.384132849805003]
We introduce GazeXplain, a novel study of visual scanpath prediction and explanation.
This involves annotating natural-language explanations for fixations across eye-tracking datasets.
Experiments on diverse eye-tracking datasets demonstrate the effectiveness of GazeXplain in both scanpath prediction and explanation.
arXiv Detail & Related papers (2024-08-05T19:11:46Z) - Unified Dynamic Scanpath Predictors Outperform Individually Trained Neural Models [18.327960366321655]
We develop a deep learning-based social cue integration model for saliency prediction to predict scanpaths in videos.
We evaluate our approach on gaze data recorded from dynamic social scenes under a free-viewing condition.
Results indicate that a single unified model, trained on all observers' scanpaths, performs on par with or better than individually trained models.
arXiv Detail & Related papers (2024-05-05T13:15:11Z) - Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z) - Top-Down Visual Attention from Analysis by Synthesis [87.47527557366593]
We consider top-down attention from a classic Analysis-by-Synthesis (AbS) perspective of vision.
We propose the Analysis-by-Synthesis Vision Transformer (AbSViT), a top-down modulated ViT model that variationally approximates AbS and achieves controllable top-down attention.
arXiv Detail & Related papers (2023-03-23T05:17:05Z) - An Inter-observer consistent deep adversarial training for visual scanpath prediction [66.46953851227454]
We propose an inter-observer consistent adversarial training approach for scanpath prediction through a lightweight deep neural network.
We show the competitiveness of our approach compared with state-of-the-art methods.
arXiv Detail & Related papers (2022-11-14T13:22:29Z) - Self-Attention Neural Bag-of-Features [103.70855797025689]
We build on the recently introduced 2D-Attention and reformulate the attention learning methodology.
We propose a joint feature-temporal attention mechanism that learns a joint 2D attention mask highlighting relevant information.
arXiv Detail & Related papers (2022-01-26T17:54:14Z) - Scanpath Prediction on Information Visualisations [19.591855190022667]
We propose a model that learns to predict visual saliency and scanpaths on information visualisations.
We present in-depth analyses of gaze behaviour for different information visualisation elements on the popular MASSVIS dataset.
arXiv Detail & Related papers (2021-12-04T13:59:52Z) - Alignment Attention by Matching Key and Query Distributions [48.93793773929006]
This paper introduces alignment attention that explicitly encourages self-attention to match the distributions of the key and query within each head.
It is simple to convert any models with self-attention, including pre-trained ones, to the proposed alignment attention.
On a variety of language understanding tasks, we show the effectiveness of our method in terms of accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
arXiv Detail & Related papers (2021-10-25T00:54:57Z) - SparseBERT: Rethinking the Importance Analysis in Self-attention [107.68072039537311]
Transformer-based models are popular for natural language processing (NLP) tasks due to their powerful capacity.
Visualizing the attention maps of a pre-trained model is one direct way to understand the self-attention mechanism.
We propose a Differentiable Attention Mask (DAM) algorithm, which can also be applied to guide the design of SparseBERT.
arXiv Detail & Related papers (2021-02-25T14:13:44Z) - Classifying Eye-Tracking Data Using Saliency Maps [8.524684315458245]
This paper proposes a novel visual-saliency-based feature extraction method for automatic and quantitative classification of eye-tracking data.
Comparing saliency amplitudes and the similarity and dissimilarity of saliency maps with the corresponding eye-fixation maps adds an extra dimension of information, which is used to generate discriminative features for classifying the eye-tracking data (a minimal sketch of this kind of feature construction follows the list).
arXiv Detail & Related papers (2020-10-24T15:18:07Z)
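The feature construction described in the last entry (comparing a saliency map against an eye-fixation map) can be illustrated with a minimal sketch. The specific measures below (amplitude statistics, Pearson correlation, and KL divergence) are common saliency-evaluation metrics chosen here for illustration; they are assumptions, not necessarily the ones used in that paper:

```python
# Hypothetical sketch: turn a (saliency map, fixation map) pair into a small feature
# vector for a downstream classifier. Metric choices are illustrative assumptions.
import numpy as np


def saliency_features(saliency: np.ndarray, fixations: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Return [max amplitude, mean amplitude, Pearson CC, KL divergence] for one stimulus."""
    s = saliency.astype(np.float64)
    f = fixations.astype(np.float64)

    # Amplitude statistics of the predicted saliency map.
    amp_max, amp_mean = s.max(), s.mean()

    # Pearson correlation coefficient: similarity between the two maps.
    cc = np.corrcoef(s.ravel(), f.ravel())[0, 1]

    # KL divergence: dissimilarity between the maps treated as distributions.
    p = f.ravel() / (f.sum() + eps)
    q = s.ravel() / (s.sum() + eps)
    kl = float(np.sum(p * np.log((p + eps) / (q + eps))))

    return np.array([amp_max, amp_mean, cc, kl])
```

Per-observer feature vectors of this kind could then be fed to any standard classifier (e.g., an SVM) to categorize the eye-tracking data.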