GASP: Gated Attention For Saliency Prediction
- URL: http://arxiv.org/abs/2206.04590v1
- Date: Thu, 9 Jun 2022 16:14:09 GMT
- Title: GASP: Gated Attention For Saliency Prediction
- Authors: Fares Abawi, Tom Weber and Stefan Wermter
- Abstract summary: We present a neural model for integrating social cues and weighting their influences.
We show that gaze direction and affective representations contribute a prediction-to-ground-truth correspondence improvement of at least 5% compared to dynamic saliency models.
- Score: 18.963277212703005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Saliency prediction refers to the computational task of modeling overt
attention. Social cues greatly influence our attention, consequently altering
our eye movements and behavior. To emphasize the efficacy of such features, we
present a neural model for integrating social cues and weighting their
influences. Our model consists of two stages. During the first stage, we detect
two social cues by following gaze, estimating gaze direction, and recognizing
affect. These features are then transformed into spatiotemporal maps through
image processing operations. The transformed representations are propagated to
the second stage (GASP) where we explore various techniques of late fusion for
integrating social cues and introduce two sub-networks for directing attention
to relevant stimuli. Our experiments indicate that fusion approaches achieve
better results for static integration methods, whereas non-fusion approaches,
for which the influence of each modality is unknown, result in better outcomes
when coupled with recurrent models for dynamic saliency prediction. We show
that gaze direction and affective representations contribute a
prediction-to-ground-truth correspondence improvement of at least 5% compared to dynamic
saliency models without social cues. Furthermore, affective representations
improve GASP, supporting the necessity of considering affect-biased attention
in predicting saliency.
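To make the gating idea concrete, below is a minimal PyTorch sketch of late fusion in which each social-cue map (e.g., gaze following, gaze direction, affect) is weighted by a learned spatial gate before being combined. The module name, the 1x1-convolution gates, and all dimensions are illustrative assumptions, not the authors' GASP implementation.

```python
# A minimal sketch of gated late fusion over social-cue maps, assuming
# PyTorch. GatedCueFusion, the 1x1-convolution gates, and all dimensions
# are illustrative; this is not the authors' implementation.
import torch
import torch.nn as nn

class GatedCueFusion(nn.Module):
    """Weights each spatiotemporal cue map by a learned gate before fusion."""
    def __init__(self, num_cues: int, channels: int):
        super().__init__()
        # One 1x1 convolution per cue produces a spatial gate in [0, 1].
        self.gates = nn.ModuleList(
            [nn.Conv2d(channels, 1, kernel_size=1) for _ in range(num_cues)]
        )
        self.fuse = nn.Conv2d(num_cues * channels, channels, kernel_size=1)

    def forward(self, cue_maps):  # list of (B, C, H, W) tensors
        gated = [torch.sigmoid(g(m)) * m for g, m in zip(self.gates, cue_maps)]
        return self.fuse(torch.cat(gated, dim=1))

# Usage: fuse gaze-following, gaze-direction, and affect maps.
fusion = GatedCueFusion(num_cues=3, channels=16)
maps = [torch.randn(1, 16, 60, 80) for _ in range(3)]
features = fusion(maps)  # (1, 16, 60, 80)
```

Because each gate is computed from its own cue map, the network can suppress a modality wherever it is uninformative, which is one simple way to realize the weighted-influence integration the abstract describes.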
Related papers
- Human Scanpath Prediction in Target-Present Visual Search with Semantic-Foveal Bayesian Attention [49.99728312519117]
SemBA-FAST is a top-down framework designed for predicting human visual attention in target-present visual search.
We evaluate SemBA-FAST on the COCO-Search18 benchmark dataset, comparing its performance against other scanpath prediction models.
These findings provide valuable insights into the capabilities of semantic-foveal probabilistic frameworks for human-like attention modelling.
arXiv Detail & Related papers (2025-07-24T15:19:23Z) - Dynamic Programming Techniques for Enhancing Cognitive Representation in Knowledge Tracing [125.75923987618977]
We propose the Cognitive Representation Dynamic Programming based Knowledge Tracing (CRDP-KT) model.
It uses a dynamic programming algorithm to optimize cognitive representations based on the difficulty of the questions and the performance intervals between them.
This provides more accurate and systematic input features for subsequent model training, thereby minimizing distortion in the simulation of cognitive states.
arXiv Detail & Related papers (2025-06-03T14:44:48Z) - An Active Inference Model of Covert and Overt Visual Attention [0.0]
This paper introduces a model of covert and overt visual attention through the framework of active inference.
The model determines visual sensory precisions based on both current environmental beliefs and sensory input.
arXiv Detail & Related papers (2025-05-06T09:26:00Z) - A Meaningful Perturbation Metric for Evaluating Explainability Methods [55.09730499143998]
We introduce a novel approach that harnesses image generation models to perform targeted perturbations.
Specifically, we focus on inpainting only the high-relevance pixels of an input image to modify the model's predictions while preserving image fidelity.
This is in contrast to existing approaches, which often produce out-of-distribution modifications, leading to unreliable results.
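The procedure summarized above lends itself to a short sketch: rank pixels by relevance, inpaint the most relevant fraction with a generative model, and measure the drop in the classifier's confidence. The `inpaint` callable and the assumption that `model` returns a 1-D probability vector are illustrative stand-ins, not the paper's actual interface.

```python
# A hedged sketch of inpainting-based perturbation evaluation, assuming
# NumPy. `inpaint` and `model` are stand-ins for any image-generation
# model and classifier; this is not the paper's implementation.
import numpy as np

def perturbation_score(model, image, relevance, inpaint, top_frac=0.1):
    """Confidence drop on the original class after inpainting the
    top-relevance pixels; a larger drop suggests a more faithful map."""
    flat = relevance.flatten()
    k = max(1, int(top_frac * flat.size))
    mask = np.zeros(flat.size, dtype=bool)
    mask[np.argsort(flat)[-k:]] = True       # the k most relevant pixels
    mask = mask.reshape(relevance.shape)
    perturbed = inpaint(image, mask)         # generative, in-distribution fill
    p_orig = model(image)
    p_new = model(perturbed)
    cls = int(np.argmax(p_orig))
    return float(p_orig[cls] - p_new[cls])
```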
arXiv Detail & Related papers (2025-04-09T11:46:41Z) - Diffusion-Based Imitation Learning for Social Pose Generation [0.0]
Intelligent agents, such as robots and virtual agents, must understand the dynamics of complex social interactions to interact with humans.
We explore how a single modality, the pose behavior of multiple individuals in a social interaction, can be used to generate nonverbal social cues for the facilitator of that interaction.
arXiv Detail & Related papers (2025-01-18T20:31:55Z) - Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction [60.964512894143475]
We present Generative Spatial Transformer (GST), a novel auto-regressive framework that jointly addresses spatial localization and view prediction.
Our model simultaneously estimates the camera pose from a single image and predicts the view from a new camera pose, effectively bridging the gap between spatial awareness and visual prediction.
arXiv Detail & Related papers (2024-10-24T17:58:05Z) - Enhancing Population-based Search with Active Inference [0.0]
This paper proposes the integration of Active Inference into population-based metaheuristics to enhance performance.
Experimental results indicate that Active Inference can yield some improved solutions with only a marginal increase in computational cost.
arXiv Detail & Related papers (2024-08-18T17:21:21Z) - What Matters When Repurposing Diffusion Models for General Dense Perception Tasks? [49.84679952948808]
Recent works show promising results by simply fine-tuning T2I diffusion models for dense perception tasks.
We conduct a thorough investigation into critical factors that affect transfer efficiency and performance when using diffusion priors.
Our work culminates in the development of GenPercept, an effective deterministic one-step fine-tuning paradigm tailored for dense visual perception tasks.
arXiv Detail & Related papers (2024-03-10T04:23:24Z) - Disentangled Neural Relational Inference for Interpretable Motion Prediction [38.40799770648501]
We develop a variational auto-encoder framework that integrates graph-based representations and time-sequence models.
Our model infers dynamic interaction graphs augmented with interpretable edge features that characterize the interactions.
We validate our approach through extensive experiments on both simulated and real-world datasets.
arXiv Detail & Related papers (2024-01-07T22:49:24Z) - Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z) - What Do Deep Saliency Models Learn about Visual Attention? [28.023464783469738]
We present a novel analytic framework that sheds light on the implicit features learned by saliency models.
Our approach decomposes these implicit features into interpretable bases that are explicitly aligned with semantic attributes.
arXiv Detail & Related papers (2023-10-14T23:15:57Z) - An Ensemble Approach for Facial Expression Analysis in Video [5.363490780925308]
This paper introduces the Affective Behavior Analysis in-the-wild (ABAW3) 2022 challenge.
The paper focuses on solving the problems of valence-arousal estimation and action unit detection.
arXiv Detail & Related papers (2022-03-24T07:25:23Z) - Dyadic Human Motion Prediction [119.3376964777803]
We introduce a motion prediction framework that explicitly reasons about the interactions of two observed subjects.
Specifically, we achieve this by introducing a pairwise attention mechanism that models the mutual dependencies in the motion history of the two subjects.
This allows us to preserve the long-term motion dynamics in a more realistic way and more robustly predict unusual and fast-paced movements.
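The pairwise attention idea above can be sketched as two cross-attention blocks in which each subject's motion history queries the other's. The use of `nn.MultiheadAttention` and all dimensions are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of pairwise cross-attention between two subjects'
# motion histories, assuming PyTorch. Names and dimensions are
# illustrative; this is not the paper's implementation.
import torch
import torch.nn as nn

class DyadicAttention(nn.Module):
    """Each subject's motion features attend to the other's history."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.cross_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_b = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, hist_a, hist_b):  # each (B, T, dim)
        # Subject A queries subject B's history, and vice versa.
        attn_a, _ = self.cross_a(hist_a, hist_b, hist_b)
        attn_b, _ = self.cross_b(hist_b, hist_a, hist_a)
        return hist_a + attn_a, hist_b + attn_b

# Usage: 30 past frames of 64-d pose embeddings per subject.
dy = DyadicAttention(dim=64)
a, b = torch.randn(1, 30, 64), torch.randn(1, 30, 64)
out_a, out_b = dy(a, b)
```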
arXiv Detail & Related papers (2021-12-01T10:30:40Z) - Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables.
We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph.
Experiment results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
arXiv Detail & Related papers (2021-11-29T18:59:09Z) - Variational Structured Attention Networks for Deep Visual Representation Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z) - EvolveGraph: Multi-Agent Trajectory Prediction with Dynamic Relational Reasoning [41.42230144157259]
We propose a generic trajectory forecasting framework with explicit relational structure recognition and prediction via latent interaction graphs.
Considering the uncertainty of future behaviors, the model is designed to provide multi-modal prediction hypotheses.
We introduce a double-stage training pipeline which not only improves training efficiency and accelerates convergence, but also enhances model performance.
arXiv Detail & Related papers (2020-03-31T02:49:23Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.