Understanding top-down attention using task-oriented ablation design
- URL: http://arxiv.org/abs/2106.11339v1
- Date: Tue, 8 Jun 2021 21:01:47 GMT
- Title: Understanding top-down attention using task-oriented ablation design
- Authors: Freddie Bickford Smith, Brett D Roads, Xiaoliang Luo, Bradley C Love
- Abstract summary: Top-down attention allows neural networks, both artificial and biological, to focus on the information most relevant for a given task.
We aim to understand how attention brings about its perceptual boost, using a computational experiment based on a general framework called task-oriented ablation design.
We compare the performance of two neural networks, one with top-down attention and one without.
- Score: 0.22940141855172028
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Top-down attention allows neural networks, both artificial and biological, to
focus on the information most relevant for a given task. This is known to
enhance performance in visual perception. But it remains unclear how attention
brings about its perceptual boost, especially when it comes to naturalistic
settings like recognising an object in an everyday scene. What aspects of a
visual task does attention help to deal with? We aim to answer this with a
computational experiment based on a general framework called task-oriented
ablation design. First we define a broad range of visual tasks and identify six
factors that underlie task variability. Then on each task we compare the
performance of two neural networks, one with top-down attention and one
without. These comparisons reveal the task-dependence of attention's perceptual
boost, giving a clearer idea of the role attention plays. Whereas many existing
cognitive accounts link attention to stimulus-level variables, such as visual
clutter and object scale, we find greater explanatory power in system-level
variables that capture the interaction between the model, the distribution of
training data and the task format. This finding suggests a shift in how
attention is studied could be fruitful. We make publicly available our code and
results, along with statistics relevant to ImageNet-based experiments beyond
this one. Our contribution serves to support the development of more human-like
vision models and the design of more informative machine-learning experiments.
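As an illustration of the comparison the abstract describes, below is a minimal, self-contained sketch of evaluating a baseline classifier and an attention-augmented variant on one task and recording the accuracy difference, i.e. the perceptual boost. This is not the authors' released code: the model classes, the multiplicative-gain form of attention, and the synthetic data are all placeholder assumptions, and the per-task training used in the actual ImageNet experiments is omitted for brevity.

```python
# Hypothetical sketch of a task-oriented ablation comparison (not the authors' code).
import torch
import torch.nn as nn


class Baseline(nn.Module):
    """Plain classifier standing in for the CNN backbone used in the paper."""
    def __init__(self, dim=128, n_hidden=256, n_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(dim, n_hidden), nn.ReLU())
        self.head = nn.Linear(n_hidden, n_classes)

    def forward(self, x):
        return self.head(self.backbone(x))


class Attention(Baseline):
    """Same network plus a learned multiplicative gain on the features,
    a simple stand-in for task-conditioned top-down attention."""
    def __init__(self, dim=128, n_hidden=256, n_classes=10):
        super().__init__(dim, n_hidden, n_classes)
        self.gain = nn.Parameter(torch.ones(n_hidden))

    def forward(self, x):
        return self.head(self.backbone(x) * self.gain)


@torch.no_grad()
def accuracy(model, x, y):
    return (model(x).argmax(dim=1) == y).float().mean().item()


# One synthetic "task". In the paper each task is an ImageNet classification
# problem defined by a chosen category set, and both networks are trained on
# it before being compared; training is omitted here for brevity.
x = torch.randn(512, 128)
y = torch.randint(0, 10, (512,))

boost = accuracy(Attention(), x, y) - accuracy(Baseline(), x, y)
print(f"perceptual boost on this task: {boost:+.3f}")
```

Keeping the two networks identical up to the attention mechanism is what lets the accuracy difference on each task be attributed to top-down attention alone.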
Related papers
- Shifting Focus with HCEye: Exploring the Dynamics of Visual Highlighting and Cognitive Load on User Attention and Saliency Prediction [3.2873782624127834]
This paper examines the joint impact of visual highlighting (permanent and dynamic) and dual-task-induced cognitive load on gaze behaviour.
We show that state-of-the-art saliency models increase their performance when accounting for different cognitive loads.
arXiv Detail & Related papers (2024-04-22T14:45:30Z) - What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z) - PhD Thesis: Exploring the role of (self-)attention in cognitive and computer vision architecture [0.0]
We analyze Transformer-based self-attention as a model and extend it with memory.
We propose GAMR, a cognitive architecture combining attention and memory, inspired by active vision theory.
arXiv Detail & Related papers (2023-06-26T12:40:12Z) - BI AVAN: Brain inspired Adversarial Visual Attention Network [67.05560966998559]
We propose a brain-inspired adversarial visual attention network (BI-AVAN) to characterize human visual attention directly from functional brain activity.
Our model imitates the biased competition between attended and neglected objects to identify and locate, in an unsupervised manner, the visual objects in a movie frame that the human brain focuses on.
arXiv Detail & Related papers (2022-10-27T22:20:36Z) - Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering [58.82325933356066]
Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge.
We present a detailed study of how different settings affect performance for Visual Question Answering.
arXiv Detail & Related papers (2022-09-30T19:12:58Z) - Attention in Reasoning: Dataset, Analysis, and Modeling [31.3104693230952]
We propose an Attention with Reasoning capability (AiR) framework that uses attention to understand and improve the process leading to task outcomes.
We first define an evaluation metric based on a sequence of atomic reasoning operations, enabling a quantitative measurement of attention.
We then collect human eye-tracking and answer correctness data, and analyze various machine and human attention mechanisms on their reasoning capability.
arXiv Detail & Related papers (2022-04-20T20:32:31Z) - Understanding the computational demands underlying visual reasoning [10.308647202215708]
We systematically assess the ability of modern deep convolutional neural networks to learn to solve visual reasoning problems.
Our analysis leads to a novel taxonomy of visual reasoning tasks, which can be primarily explained by the type of relations and the number of relations used to compose the underlying rules.
arXiv Detail & Related papers (2021-08-08T10:46:53Z) - Affect Analysis in-the-wild: Valence-Arousal, Expressions, Action Units and a Unified Framework [83.21732533130846]
The paper focuses on large in-the-wild databases, i.e., Aff-Wild and Aff-Wild2.
It presents the design of two classes of deep neural networks trained with these databases.
A novel multi-task, holistic framework is presented that jointly learns, effectively generalizes and performs affect recognition.
arXiv Detail & Related papers (2021-03-29T17:36:20Z) - What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms a visual-only state-of-the-art method MoCo.
arXiv Detail & Related papers (2020-10-16T17:46:53Z) - Gravitational Models Explain Shifts on Human Visual Attention [80.76475913429357]
Visual attention refers to the human brain's ability to select relevant sensory information for preferential processing.
Various methods to estimate saliency have been proposed in the last three decades.
We propose a gravitational model (GRAV) to describe the attentional shifts.
arXiv Detail & Related papers (2020-09-15T10:12:41Z) - The perceptual boost of visual attention is task-dependent in naturalistic settings [5.735035463793008]
We design a collection of visual tasks, each consisting of classifying images from a chosen task set.
The nature of a task is determined by which categories are included in the task set.
On each task we train an attention-augmented neural network and then compare its accuracy to that of a baseline network.
We show that the perceptual boost of attention is stronger with increasing task-set difficulty, weaker with increasing task-set size and weaker with increasing perceptual similarity within a task set (see the sketch after this list).
arXiv Detail & Related papers (2020-02-22T09:10:24Z)
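To make the task-set manipulation in the last entry concrete, here is a hedged sketch, not taken from the paper's code, of sampling task sets of different sizes and scoring their within-set similarity as the mean pairwise cosine similarity of category embeddings. In the paper the categories come from ImageNet and perceptual similarity would be derived from image representations or human judgements; the random vectors below are placeholders only.

```python
# Assumed setup for constructing task sets of varying size and similarity.
import numpy as np

rng = np.random.default_rng(0)
# Placeholder embeddings standing in for ImageNet category representations.
category_embeddings = rng.normal(size=(1000, 64))


def mean_pairwise_similarity(emb):
    """Mean cosine similarity over all distinct pairs of rows in emb."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T
    n = len(emb)
    return (sim.sum() - n) / (n * (n - 1))  # drop the n self-similarities


def sample_task_set(size):
    """A task set is a random subset of categories of the requested size."""
    idx = rng.choice(len(category_embeddings), size=size, replace=False)
    return idx, mean_pairwise_similarity(category_embeddings[idx])


for size in (5, 20, 80):
    _, sim = sample_task_set(size)
    print(f"task-set size {size:3d}: within-set similarity {sim:.3f}")
```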