GASP: Gated Attention For Saliency Prediction
- URL: http://arxiv.org/abs/2206.04590v1
- Date: Thu, 9 Jun 2022 16:14:09 GMT
- Title: GASP: Gated Attention For Saliency Prediction
- Authors: Fares Abawi, Tom Weber and Stefan Wermter
- Abstract summary: We present a neural model for integrating social cues and weighting their influences.
We show that gaze direction and affective representations contribute a prediction-to-ground-truth correspondence improvement of at least 5% compared to dynamic saliency models.
- Score: 18.963277212703005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Saliency prediction refers to the computational task of modeling overt
attention. Social cues greatly influence our attention, consequently altering
our eye movements and behavior. To emphasize the efficacy of such features, we
present a neural model for integrating social cues and weighting their
influences. Our model consists of two stages. During the first stage, we detect
two social cues by following gaze, estimating gaze direction, and recognizing
affect. These features are then transformed into spatiotemporal maps through
image processing operations. The transformed representations are propagated to
the second stage (GASP) where we explore various techniques of late fusion for
integrating social cues and introduce two sub-networks for directing attention
to relevant stimuli. Our experiments indicate that fusion approaches achieve
better results for static integration methods, whereas non-fusion approaches,
for which the influence of each modality is unknown, result in better outcomes
when coupled with recurrent models for dynamic saliency prediction. We show
that gaze direction and affective representations contribute a
prediction-to-ground-truth correspondence improvement of at least 5% compared to dynamic
saliency models without social cues. Furthermore, affective representations
improve GASP, supporting the necessity of considering affect-biased attention
in predicting saliency.
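To make the gating idea concrete, below is a minimal PyTorch sketch of late fusion in which each social-cue map (e.g., gaze following, gaze direction, affect) is weighted by a learned spatial gate before being combined. The module name, the 1x1-convolution gates, and all dimensions are illustrative assumptions, not the authors' GASP implementation.

```python
# A minimal sketch of gated late fusion over social-cue maps, assuming
# PyTorch. GatedCueFusion, the 1x1-convolution gates, and all dimensions
# are illustrative; this is not the authors' implementation.
import torch
import torch.nn as nn

class GatedCueFusion(nn.Module):
    """Weights each spatiotemporal cue map by a learned gate before fusion."""
    def __init__(self, num_cues: int, channels: int):
        super().__init__()
        # One 1x1 convolution per cue produces a spatial gate in [0, 1].
        self.gates = nn.ModuleList(
            [nn.Conv2d(channels, 1, kernel_size=1) for _ in range(num_cues)]
        )
        self.fuse = nn.Conv2d(num_cues * channels, channels, kernel_size=1)

    def forward(self, cue_maps):  # list of (B, C, H, W) tensors
        gated = [torch.sigmoid(g(m)) * m for g, m in zip(self.gates, cue_maps)]
        return self.fuse(torch.cat(gated, dim=1))

# Usage: fuse gaze-following, gaze-direction, and affect maps.
fusion = GatedCueFusion(num_cues=3, channels=16)
maps = [torch.randn(1, 16, 60, 80) for _ in range(3)]
features = fusion(maps)  # (1, 16, 60, 80)
```

Because each gate is computed from its own cue map, the network can suppress a modality wherever it is uninformative, which is one simple way to realize the weighted-influence integration the abstract describes.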
Related papers
- Human Scanpath Prediction in Target-Present Visual Search with Semantic-Foveal Bayesian Attention [49.99728312519117]
SemBA-FAST is a top-down framework designed for predicting human visual attention in target-present visual search.
We evaluate SemBA-FAST on the COCO-Search18 benchmark dataset, comparing its performance against other scanpath prediction models.
These findings provide valuable insights into the capabilities of semantic-foveal probabilistic frameworks for human-like attention modelling.
arXiv Detail & Related papers (2025-07-24T15:19:23Z) - Dynamic Programming Techniques for Enhancing Cognitive Representation in Knowledge Tracing [125.75923987618977]
We propose the Cognitive Representation Dynamic Programming based Knowledge Tracing (CRDP-KT) model.
It uses a dynamic programming algorithm to optimize cognitive representations based on the difficulty of the questions and the performance intervals between them.
This provides more accurate and systematic input features for subsequent model training, thereby minimizing distortion in the simulation of cognitive states.
arXiv Detail & Related papers (2025-06-03T14:44:48Z) - An Active Inference Model of Covert and Overt Visual Attention [0.0]
This paper introduces a model of covert and overt visual attention through the framework of active inference.
The model determines visual sensory precisions based on both current environmental beliefs and sensory input.
arXiv Detail & Related papers (2025-05-06T09:26:00Z) - A Meaningful Perturbation Metric for Evaluating Explainability Methods [55.09730499143998]
We introduce a novel approach that harnesses image generation models to perform targeted perturbations.
Specifically, we focus on inpainting only the high-relevance pixels of an input image to modify the model's predictions while preserving image fidelity.
This is in contrast to existing approaches, which often produce out-of-distribution modifications, leading to unreliable results.
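The procedure summarized above lends itself to a short sketch: rank pixels by relevance, inpaint the most relevant fraction with a generative model, and measure the drop in the classifier's confidence. The `inpaint` callable and the assumption that `model` returns a 1-D probability vector are illustrative stand-ins, not the paper's actual interface.

```python
# A hedged sketch of inpainting-based perturbation evaluation, assuming
# NumPy. `inpaint` and `model` are stand-ins for any image-generation
# model and classifier; this is not the paper's implementation.
import numpy as np

def perturbation_score(model, image, relevance, inpaint, top_frac=0.1):
    """Confidence drop on the original class after inpainting the
    top-relevance pixels; a larger drop suggests a more faithful map."""
    flat = relevance.flatten()
    k = max(1, int(top_frac * flat.size))
    mask = np.zeros(flat.size, dtype=bool)
    mask[np.argsort(flat)[-k:]] = True       # the k most relevant pixels
    mask = mask.reshape(relevance.shape)
    perturbed = inpaint(image, mask)         # generative, in-distribution fill
    p_orig = model(image)
    p_new = model(perturbed)
    cls = int(np.argmax(p_orig))
    return float(p_orig[cls] - p_new[cls])
```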
arXiv Detail & Related papers (2025-04-09T11:46:41Z) - Diffusion-Based Imitation Learning for Social Pose Generation [0.0]
Intelligent agents, such as robots and virtual agents, must understand the dynamics of complex social interactions to interact with humans.
We explore how a single modality, the pose behavior of multiple individuals in a social interaction, can be used to generate nonverbal social cues for the facilitator of that interaction.
arXiv Detail & Related papers (2025-01-18T20:31:55Z) - Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction [60.964512894143475]
We present Generative Spatial Transformer (GST), a novel auto-regressive framework that jointly addresses spatial localization and view prediction.
Our model simultaneously estimates the camera pose from a single image and predicts the view from a new camera pose, effectively bridging the gap between spatial awareness and visual prediction.
arXiv Detail & Related papers (2024-10-24T17:58:05Z) - Enhancing Population-based Search with Active Inference [0.0]
This paper proposes the integration of Active Inference into population-based metaheuristics to enhance performance.
Experimental results indicate that Active Inference can yield some improved solutions with only a marginal increase in computational cost.
arXiv Detail & Related papers (2024-08-18T17:21:21Z) - What Matters When Repurposing Diffusion Models for General Dense Perception Tasks? [49.84679952948808]
Recent works show promising results by simply fine-tuning T2I diffusion models for dense perception tasks.
We conduct a thorough investigation into critical factors that affect transfer efficiency and performance when using diffusion priors.
Our work culminates in the development of GenPercept, an effective deterministic one-step fine-tuning paradigm tailored for dense visual perception tasks.
arXiv Detail & Related papers (2024-03-10T04:23:24Z) - Disentangled Neural Relational Inference for Interpretable Motion Prediction [38.40799770648501]
We develop a variational auto-encoder framework that integrates graph-based representations and time-sequence models.
Our model infers dynamic interaction graphs augmented with interpretable edge features that characterize the interactions.
We validate our approach through extensive experiments on both simulated and real-world datasets.
arXiv Detail & Related papers (2024-01-07T22:49:24Z) - Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z) - What Do Deep Saliency Models Learn about Visual Attention? [28.023464783469738]
We present a novel analytic framework that sheds light on the implicit features learned by saliency models.
Our approach decomposes these implicit features into interpretable bases that are explicitly aligned with semantic attributes.
arXiv Detail & Related papers (2023-10-14T23:15:57Z) - An Ensemble Approach for Facial Expression Analysis in Video [5.363490780925308]
This paper introduces the Affective Behavior Analysis in-the-wild (ABAW3) 2022 challenge.
The paper focuses on solving the problems of valence-arousal estimation and action unit detection.
arXiv Detail & Related papers (2022-03-24T07:25:23Z) - Dyadic Human Motion Prediction [119.3376964777803]
We introduce a motion prediction framework that explicitly reasons about the interactions of two observed subjects.
Specifically, we achieve this by introducing a pairwise attention mechanism that models the mutual dependencies in the motion history of the two subjects.
This allows us to preserve the long-term motion dynamics in a more realistic way and more robustly predict unusual and fast-paced movements.
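The pairwise attention idea above can be sketched as two cross-attention blocks in which each subject's motion history queries the other's. The use of `nn.MultiheadAttention` and all dimensions are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of pairwise cross-attention between two subjects'
# motion histories, assuming PyTorch. Names and dimensions are
# illustrative; this is not the paper's implementation.
import torch
import torch.nn as nn

class DyadicAttention(nn.Module):
    """Each subject's motion features attend to the other's history."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.cross_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_b = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, hist_a, hist_b):  # each (B, T, dim)
        # Subject A queries subject B's history, and vice versa.
        attn_a, _ = self.cross_a(hist_a, hist_b, hist_b)
        attn_b, _ = self.cross_b(hist_b, hist_a, hist_a)
        return hist_a + attn_a, hist_b + attn_b

# Usage: 30 past frames of 64-d pose embeddings per subject.
dy = DyadicAttention(dim=64)
a, b = torch.randn(1, 30, 64), torch.randn(1, 30, 64)
out_a, out_b = dy(a, b)
```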
arXiv Detail & Related papers (2021-12-01T10:30:40Z) - Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables.
We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph.
Experiment results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
arXiv Detail & Related papers (2021-11-29T18:59:09Z) - Variational Structured Attention Networks for Deep Visual Representation Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z) - EvolveGraph: Multi-Agent Trajectory Prediction with Dynamic Relational Reasoning [41.42230144157259]
We propose a generic trajectory forecasting framework with explicit relational structure recognition and prediction via latent interaction graphs.
Considering the uncertainty of future behaviors, the model is designed to provide multi-modal prediction hypotheses.
We introduce a double-stage training pipeline which not only improves training efficiency and accelerates convergence, but also enhances model performance.
arXiv Detail & Related papers (2020-03-31T02:49:23Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.