Visual Attention Graph
- URL: http://arxiv.org/abs/2503.08531v1
- Date: Tue, 11 Mar 2025 15:22:44 GMT
- Title: Visual Attention Graph
- Authors: Kai-Fu Yang, Yong-Jie Li,
- Abstract summary: We propose a new attention representation, called Attention Graph, to simultaneously code the visual saliency and scanpath.<n>In the attention graph, the semantic-based scanpath is defined by the path on the graph, while saliency of objects can be obtained by computing fixation density on each node.
- Score: 21.860357478331107
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Visual attention plays a critical role when our visual system executes active visual tasks by interacting with the physical scene. However, how to encode the visual object relationship in the psychological world of our brain deserves to be explored. In the field of computer vision, predicting visual fixations or scanpaths is a usual way to explore the visual attention and behaviors of human observers when viewing a scene. Most existing methods encode visual attention using individual fixations or scanpaths based on the raw gaze shift data collected from human observers. This may not capture the common attention pattern well, because without considering the semantic information of the viewed scene, raw gaze shift data alone contain high inter- and intra-observer variability. To address this issue, we propose a new attention representation, called Attention Graph, to simultaneously code the visual saliency and scanpath in a graph-based representation and better reveal the common attention behavior of human observers. In the attention graph, the semantic-based scanpath is defined by the path on the graph, while saliency of objects can be obtained by computing fixation density on each node. Systemic experiments demonstrate that the proposed attention graph combined with our new evaluation metrics provides a better benchmark for evaluating attention prediction methods. Meanwhile, extra experiments demonstrate the promising potentials of the proposed attention graph in assessing human cognitive states, such as autism spectrum disorder screening and age classification.
Related papers
- GEM: Context-Aware Gaze EstiMation with Visual Search Behavior Matching for Chest Radiograph [32.1234295417225]
We propose a context-aware Gaze EstiMation (GEM) network that utilizes eye gaze data collected from radiologists to simulate their visual search behavior patterns.
It consists of a context-awareness module, visual behavior graph construction, and visual behavior matching.
Experiments on four publicly available datasets demonstrate the superiority of GEM over existing methods.
arXiv Detail & Related papers (2024-08-10T09:46:25Z) - GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths [20.384132849805003]
We introduce GazeXplain, a novel study of visual scanpath prediction and explanation.
This involves annotating natural-language explanations for fixations across eye-tracking datasets.
Experiments on diverse eye-tracking datasets demonstrate the effectiveness of GazeXplain in both scanpath prediction and explanation.
arXiv Detail & Related papers (2024-08-05T19:11:46Z) - OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction [0.2796197251957245]
This paper introduces the Object-level Attention Transformer (OAT)
OAT predicts human scanpaths as they search for a target object within a cluttered scene of distractors.
We evaluate OAT on the Amazon book cover dataset and a new dataset for visual search that we collected.
arXiv Detail & Related papers (2024-07-18T09:33:17Z) - Simulating Human Gaze with Neural Visual Attention [44.65733084492857]
We propose the Neural Visual Attention (NeVA) algorithm to integrate guidance of any downstream visual task into attention modeling.
We observe that biologically constrained neural networks generate human-like scanpaths without being trained for this objective.
arXiv Detail & Related papers (2022-11-22T09:02:09Z) - An Inter-observer consistent deep adversarial training for visual
scanpath prediction [66.46953851227454]
We propose an inter-observer consistent adversarial training approach for scanpath prediction through a lightweight deep neural network.
We show the competitiveness of our approach in regard to state-of-the-art methods.
arXiv Detail & Related papers (2022-11-14T13:22:29Z) - Behind the Machine's Gaze: Biologically Constrained Neural Networks
Exhibit Human-like Visual Attention [40.878963450471026]
We propose the Neural Visual Attention (NeVA) algorithm to generate visual scanpaths in a top-down manner.
We show that the proposed method outperforms state-of-the-art unsupervised human attention models in terms of similarity to human scanpaths.
arXiv Detail & Related papers (2022-04-19T18:57:47Z) - Leveraging Human Selective Attention for Medical Image Analysis with
Limited Training Data [72.1187887376849]
The selective attention mechanism helps the cognition system focus on task-relevant visual clues by ignoring the presence of distractors.
We propose a framework to leverage gaze for medical image analysis tasks with small training data.
Our method is demonstrated to achieve superior performance on both 3D tumor segmentation and 2D chest X-ray classification tasks.
arXiv Detail & Related papers (2021-12-02T07:55:25Z) - Attention Mechanisms in Computer Vision: A Survey [75.6074182122423]
We provide a comprehensive review of various attention mechanisms in computer vision.
We categorize them according to approach, such as channel attention, spatial attention, temporal attention and branch attention.
We suggest future directions for attention mechanism research.
arXiv Detail & Related papers (2021-11-15T09:18:40Z) - Counterfactual Attention Learning for Fine-Grained Visual Categorization
and Re-identification [101.49122450005869]
We present a counterfactual attention learning method to learn more effective attention based on causal inference.
Specifically, we analyze the effect of the learned visual attention on network prediction.
We evaluate our method on a wide range of fine-grained recognition tasks.
arXiv Detail & Related papers (2021-08-19T14:53:40Z) - Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z) - What Can You Learn from Your Muscles? Learning Visual Representation
from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms a visual-only state-of-the-art method MoCo.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.