An Active Inference Model of Covert and Overt Visual Attention
- URL: http://arxiv.org/abs/2505.03856v1
- Date: Tue, 06 May 2025 09:26:00 GMT
- Title: An Active Inference Model of Covert and Overt Visual Attention
- Authors: Tin Mišić, Karlo Koledić, Fabio Bonsignorio, Ivan Petrović, Ivan Marković
- Abstract summary: This paper introduces a model of covert and overt visual attention through the framework of active inference. The model determines visual sensory precisions based on both current environmental beliefs and sensory input.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to selectively attend to relevant stimuli while filtering out distractions is essential for agents that process complex, high-dimensional sensory input. This paper introduces a model of covert and overt visual attention through the framework of active inference, utilizing dynamic optimization of sensory precisions to minimize free energy. The model determines visual sensory precisions based on both current environmental beliefs and sensory input, influencing attentional allocation in both covert and overt modalities. To test the effectiveness of the model, we analyze its behavior in the Posner cueing task and a simple target focus task using two-dimensional (2D) visual data. Reaction times are measured to investigate the interplay between exogenous and endogenous attention, as well as valid and invalid cueing. The results show that exogenous and valid cues generally lead to faster reaction times compared to endogenous and invalid cues. Furthermore, the model exhibits behavior similar to inhibition of return, where previously attended locations become suppressed after a specific cue-target onset asynchrony interval. Lastly, we investigate different aspects of overt attention and show that involuntary, reflexive saccades occur faster than intentional ones, but at the expense of adaptability.
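The abstract's central mechanism, re-optimizing sensory precision at each step so that belief updating minimizes free energy, can be sketched with a minimal one-dimensional Gaussian example. This is an illustrative assumption, not the authors' implementation: the generative model, the names `mu`, `pi_s`, `free_energy`, and all numeric values are hypothetical.

```python
import numpy as np

# Minimal sketch: a 1-D Gaussian generative model in which the agent updates
# its belief `mu` by gradient descent on free energy, while the sensory
# precision `pi_s` is re-optimized at every step (clipped to a plausible
# range). Illustrative toy model only, not the paper's actual model.

y, mu_prior, pi_p = 2.0, 0.0, 0.5   # observation, prior mean, prior precision

def free_energy(mu, pi_s):
    """Variational free energy (up to constants) for the 1-D model."""
    sensory = 0.5 * (pi_s * (y - mu) ** 2 - np.log(pi_s))
    prior = 0.5 * pi_p * (mu - mu_prior) ** 2
    return sensory + prior

mu, pi_s, lr = 0.0, 1.0, 0.1
f_start = free_energy(mu, pi_s)
for _ in range(300):
    err = y - mu
    # The closed-form optimum of F w.r.t. pi_s is 1 / err^2; clipping stands
    # in for a prior that keeps precision (attention) in a sensible range.
    pi_s = float(np.clip(1.0 / (err ** 2 + 1e-2), 0.1, 5.0))
    # Gradient step on mu: a balance of precision-weighted prediction errors.
    mu += lr * (pi_s * err - pi_p * (mu - mu_prior))
f_end = free_energy(mu, pi_s)
```

In this toy reading, small prediction errors drive precision, and hence the attentional weight on the input, up, while large errors drive it down: one way to interpret "determines visual sensory precisions based on both current environmental beliefs and sensory input."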
Related papers
- End-to-End Facial Expression Detection in Long Videos [0.2796197251957245]
We propose an end-to-end Facial Expression Detection Network (FEDN) to jointly optimize spotting and recognition. By unifying two tasks within a single network, we greatly reduce error propagation and enhance overall performance.
arXiv Detail & Related papers (2025-04-10T11:18:46Z) - Modeling Attention during Dimensional Shifts with Counterfactual and Delayed Feedback [0.4915744683251151]
We compare two methods for modeling how humans attend to specific features of decision making tasks. We find that calculating an information theoretic metric over a history of experiences is best able to account for human-like behavior.
arXiv Detail & Related papers (2025-01-19T20:26:34Z) - Uncertainty-Guided Appearance-Motion Association Network for Out-of-Distribution Action Detection [4.938957922033169]
Out-of-distribution (OOD) detection aims to identify and reject test samples with semantic shifts. We propose a novel Uncertainty-Guided Appearance-Motion Association Network (UAAN). We show that UAAN outperforms state-of-the-art methods by a significant margin, illustrating its effectiveness.
arXiv Detail & Related papers (2024-09-16T02:53:49Z) - What Matters When Repurposing Diffusion Models for General Dense Perception Tasks? [49.84679952948808]
Recent works show promising results by simply fine-tuning T2I diffusion models for dense perception tasks. We conduct a thorough investigation into critical factors that affect transfer efficiency and performance when using diffusion priors. Our work culminates in the development of GenPercept, an effective deterministic one-step fine-tuning paradigm tailored for dense visual perception tasks.
arXiv Detail & Related papers (2024-03-10T04:23:24Z) - Disentangled Interaction Representation for One-Stage Human-Object
Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z) - GASP: Gated Attention For Saliency Prediction [18.963277212703005]
We present a neural model for integrating social cues and weighting their influences.
We show that gaze direction and affective representations contribute an improvement in prediction accuracy with respect to ground truth of at least 5% compared to dynamic saliency models.
arXiv Detail & Related papers (2022-06-09T16:14:09Z) - Hybrid Predictive Coding: Inferring, Fast and Slow [62.997667081978825]
We propose a hybrid predictive coding network that combines both iterative and amortized inference in a principled manner.
We demonstrate that our model is inherently sensitive to its uncertainty and adaptively balances iterative and amortized inference to obtain accurate beliefs at minimum computational expense.
arXiv Detail & Related papers (2022-04-05T12:52:45Z) - Learning Self-Modulating Attention in Continuous Time Space with
Applications to Sequential Recommendation [102.24108167002252]
We propose a novel attention network, named self-modulating attention, that models the complex and non-linearly evolving dynamic user preferences.
We empirically demonstrate the effectiveness of our method on top-N sequential recommendation tasks, and the results on three large-scale real-world datasets show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-03-30T03:54:11Z) - Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to successively regulate the intermediate representations to produce a representation that emphasizes the novel information in the frame at the current time-stamp.
SRL sharply outperforms existing state-of-the-art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z) - Wave Propagation of Visual Stimuli in Focus of Attention [77.4747032928547]
Fast reactions to changes in the surrounding visual environment require efficient attention mechanisms to reallocate computational resources to most relevant locations in the visual field.
We present a biologically-plausible model of focus of attention that exhibits the effectiveness and efficiency of foveated animals.
arXiv Detail & Related papers (2020-06-19T09:33:21Z) - Human Activity Recognition from Wearable Sensor Data Using
Self-Attention [2.9023633922848586]
We present a self-attention based neural network model for activity recognition from body-worn sensor data.
We performed experiments on four popular publicly available HAR datasets: PAMAP2, Opportunity, Skoda and USC-HAD.
Our model achieves significant performance improvements over recent state-of-the-art models in both benchmark test-subject and leave-one-subject-out evaluations.
arXiv Detail & Related papers (2020-03-17T14:16:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.