Saliency Suppressed, Semantics Surfaced: Visual Transformations in Neural Networks and the Brain
- URL: http://arxiv.org/abs/2404.18772v1
- Date: Mon, 29 Apr 2024 15:05:42 GMT
- Title: Saliency Suppressed, Semantics Surfaced: Visual Transformations in Neural Networks and the Brain
- Authors: Gustaw Opiełka, Jessica Loke, Steven Scholte
- Abstract summary: We take inspiration from neuroscience to shed light on how neural networks encode information at low (visual saliency) and high (semantic similarity) levels of abstraction.
We find that ResNets are more sensitive to saliency information than ViTs when trained with object classification objectives.
We show that semantic encoding is a key factor in aligning AI with human visual perception, while saliency suppression is a non-brain-like strategy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning algorithms lack human-interpretable accounts of how they transform raw visual input into a robust semantic understanding, which impedes comparisons between different architectures, training objectives, and the human brain. In this work, we take inspiration from neuroscience and employ representational approaches to shed light on how neural networks encode information at low (visual saliency) and high (semantic similarity) levels of abstraction. Moreover, we introduce a custom image dataset where we systematically manipulate salient and semantic information. We find that ResNets are more sensitive to saliency information than ViTs when trained with object classification objectives. We uncover that networks suppress saliency in early layers, a process enhanced by natural language supervision (CLIP) in ResNets. CLIP also enhances semantic encoding in both architectures. Finally, we show that semantic encoding is a key factor in aligning AI with human visual perception, while saliency suppression is a non-brain-like strategy.
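The abstract does not come with code, but the representational approach it describes can be illustrated with a minimal representational-similarity sketch: compare each layer's representational dissimilarity matrix (RDM) against a low-level reference built from saliency maps and a high-level reference built from semantic embeddings. All data below are random stand-ins, and the layer names and dimensions are hypothetical assumptions rather than the paper's actual setup.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(features: np.ndarray) -> np.ndarray:
    """Condensed representational dissimilarity matrix: pairwise
    correlation distance between per-image feature vectors."""
    return pdist(features, metric="correlation")

def rsa_alignment(layer_acts: np.ndarray, reference: np.ndarray) -> float:
    """Spearman correlation between a layer's RDM and a reference RDM
    (e.g., built from saliency maps or semantic embeddings)."""
    rho, _ = spearmanr(rdm(layer_acts), rdm(reference))
    return rho

# Stand-in data: activations for 50 images at four layers, plus flattened
# saliency maps (low-level reference) and semantic embeddings (high-level).
rng = np.random.default_rng(0)
n_images = 50
layer_acts = {f"layer_{i}": rng.standard_normal((n_images, 256)) for i in range(1, 5)}
saliency_maps = rng.random((n_images, 64 * 64))
semantic_embeddings = rng.standard_normal((n_images, 512))

for name, acts in layer_acts.items():
    print(f"{name}: saliency {rsa_alignment(acts, saliency_maps):+.3f}, "
          f"semantics {rsa_alignment(acts, semantic_embeddings):+.3f}")
```

On real activations, the pattern the paper reports would show up as saliency alignment dropping after the early layers while semantic alignment grows with depth.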
Related papers
- Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models [10.615012396285337]
We develop algorithms to enhance our understanding of visual processes by incorporating whole-brain activation maps.
We first compare our method with state-of-the-art approaches to decoding visual processing and show a 43% improvement in predictive semantic accuracy.
arXiv Detail & Related papers (2024-11-11T16:51:17Z)
- Learning Object-Centric Representation via Reverse Hierarchy Guidance [73.05170419085796]
Object-Centric Learning (OCL) seeks to enable Neural Networks to identify individual objects in visual scenes.
RHGNet introduces a top-down pathway that operates differently during training and inference.
Our model achieves SOTA performance on several commonly used datasets.
arXiv Detail & Related papers (2024-05-17T07:48:27Z)
- Visualizing Neural Network Imagination [2.1749194587826026]
In certain situations, neural networks will represent environment states in their hidden activations.
Our goal is to visualize what environment states the networks are representing.
We define a quantitative interpretability metric and use it to demonstrate that hidden states can be highly interpretable.
arXiv Detail & Related papers (2024-05-10T11:43:35Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
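The entry above gives no implementation details; as one plausible reading of "computational graphs of parameters," the sketch below turns a plain MLP's weight matrices into a graph with one node per neuron and one directed, weight-valued edge per parameter. The function name and graph layout are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

def mlp_to_graph(weight_matrices):
    """One plausible parameter-graph encoding: one node per neuron,
    one directed edge per weight, with the weight value as the edge
    feature. A simplified stand-in, not the paper's exact scheme."""
    edges, edge_feats = [], []
    # Node index offsets: input layer first, then each layer's outputs.
    sizes = [weight_matrices[0].shape[1]] + [w.shape[0] for w in weight_matrices]
    offsets = [0]
    for s in sizes:
        offsets.append(offsets[-1] + s)
    for layer, w in enumerate(weight_matrices):  # w: (out_dim, in_dim)
        for out_idx in range(w.shape[0]):
            for in_idx in range(w.shape[1]):
                edges.append((offsets[layer] + in_idx, offsets[layer + 1] + out_idx))
                edge_feats.append(w[out_idx, in_idx])
    return np.array(edges), np.array(edge_feats), offsets[-1]

# Example: an MLP with layer widths 4 -> 8 -> 2.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 4)), rng.standard_normal((2, 8))]
edges, feats, n_nodes = mlp_to_graph(weights)
print(n_nodes, edges.shape, feats.shape)  # 14 nodes, (48, 2) edges, (48,) features
```

A GNN operating on such a graph can then process networks of varying widths and depths with a single model, which is the property the entry highlights.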
- Brain Decodes Deep Nets [9.302098067235507]
We developed a tool for visualizing and analyzing large pre-trained vision models by mapping them onto the brain.
Our innovation arises from a surprising usage of brain encoding: predicting brain fMRI measurements in response to images.
arXiv Detail & Related papers (2023-12-03T04:36:04Z)
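Brain encoding in the sense used above is typically implemented as a regularized linear regression from model features to voxel responses. A minimal sketch with synthetic data follows; the feature dimension, voxel count, and ridge penalty are arbitrary stand-ins rather than the paper's setup.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Stand-in data: features for 200 images from some vision-model layer,
# and fMRI responses of 500 voxels to the same images.
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 128))
voxels = features @ rng.standard_normal((128, 500)) + 0.5 * rng.standard_normal((200, 500))

X_tr, X_te, y_tr, y_te = train_test_split(features, voxels, test_size=0.25, random_state=0)
encoder = Ridge(alpha=10.0).fit(X_tr, y_tr)  # linear map: features -> voxels

# Per-voxel accuracy: correlation between predicted and held-out responses.
pred = encoder.predict(X_te)
r = [np.corrcoef(pred[:, v], y_te[:, v])[0, 1] for v in range(voxels.shape[1])]
print(f"median voxel correlation: {np.median(r):.3f}")
```

Fitting one such encoder per model layer and asking which layers best predict which brain regions is the basic move behind mapping deep nets onto the brain.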
- Seeing in Words: Learning to Classify through Language Bottlenecks [59.97827889540685]
Humans can explain their predictions using succinct and intuitive descriptions.
We show that a vision model whose feature representations are text can effectively classify ImageNet images.
arXiv Detail & Related papers (2023-06-29T00:24:42Z)
- Connecting metrics for shape-texture knowledge in computer vision [1.7785095623975342]
Deep neural networks remain brittle and susceptible to many image changes that do not cause humans to misclassify.
Part of this different behavior may be explained by the type of features humans and deep neural networks use in vision tasks.
arXiv Detail & Related papers (2023-01-25T14:37:42Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end to end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
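The entry above describes end-to-end-learned routing of inputs through a pool of functions. The sketch below illustrates the general idea with generic soft (mixture-of-modules) routing in PyTorch; it is not the Neural Interpreters architecture itself, whose routing mechanism is more elaborate, and every dimension here is an invented example.

```python
import torch
import torch.nn as nn

class SoftRouter(nn.Module):
    """Routes each input through a weighted mix of small function modules.
    The routing weights come from a learned scorer, so the whole path is
    differentiable and trained end to end. A generic mixture-of-modules
    illustration, not the paper's exact architecture."""
    def __init__(self, dim: int = 32, n_modules: int = 4):
        super().__init__()
        self.modules_pool = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(n_modules)
        )
        self.scorer = nn.Linear(dim, n_modules)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = self.scorer(x).softmax(dim=-1)                          # (batch, n_modules)
        outputs = torch.stack([m(x) for m in self.modules_pool], dim=-1)  # (batch, dim, n_modules)
        return (outputs * weights.unsqueeze(1)).sum(dim=-1)               # weighted mix

x = torch.randn(8, 32)
print(SoftRouter()(x).shape)  # torch.Size([8, 32])
```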
- Controlled Caption Generation for Images Through Adversarial Attacks [85.66266989600572]
We study adversarial examples for vision-and-language models, which typically adopt a convolutional neural network (CNN) for image feature extraction and a recurrent neural network (RNN) for caption generation.
In particular, we investigate attacks on the visual encoder's hidden layer that is fed to the subsequent recurrent network.
We propose a GAN-based algorithm for crafting adversarial examples for neural image captioning that mimics the internal representation of the CNN.
arXiv Detail & Related papers (2021-07-07T07:22:41Z)
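The paper's method is GAN-based; as a simpler stand-in for its core idea of mimicking the CNN's internal representation, the sketch below perturbs an image by gradient descent until a toy encoder's features match those of a target image. The encoder, perturbation budget, and step count are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy CNN encoder standing in for a captioner's visual backbone.
encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
encoder.requires_grad_(False)  # attack the input, not the weights

source = torch.rand(1, 3, 64, 64)                  # image we perturb
target_feats = encoder(torch.rand(1, 3, 64, 64))   # representation to mimic

delta = torch.zeros_like(source, requires_grad=True)
opt = torch.optim.Adam([delta], lr=1e-2)
eps = 8 / 255                                      # perturbation budget

for step in range(200):
    adv = (source + delta).clamp(0, 1)
    loss = nn.functional.mse_loss(encoder(adv), target_feats)
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        delta.clamp_(-eps, eps)                    # keep the change small
print(f"final feature distance: {loss.item():.4f}")
```

If the downstream RNN only sees the encoder's features, matching those features is enough to steer the generated caption toward the target image's caption.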
- Understanding the Role of Individual Units in a Deep Neural Network [85.23117441162772]
We present an analytic framework to systematically identify hidden units within image classification and image generation networks.
First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts.
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
arXiv Detail & Related papers (2020-09-10T17:59:10Z)
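One common way to make "units that match object concepts" concrete is to threshold a unit's activation map and score its overlap (IoU) with a concept segmentation mask. The sketch below does this on synthetic data; the quantile threshold and the mask are illustrative assumptions, not necessarily the paper's exact protocol.

```python
import numpy as np

def unit_concept_iou(activation_map, concept_mask, quantile=0.99):
    """Threshold a unit's activation map at a high quantile and compute
    IoU with a binary concept segmentation mask (dissection-style)."""
    thresh = np.quantile(activation_map, quantile)
    unit_mask = activation_map > thresh
    inter = np.logical_and(unit_mask, concept_mask).sum()
    union = np.logical_or(unit_mask, concept_mask).sum()
    return inter / union if union else 0.0

# Stand-in data: one unit's activation over an image and a "tree" mask.
rng = np.random.default_rng(0)
act = rng.random((56, 56))
mask = np.zeros((56, 56), dtype=bool)
mask[10:20, 10:20] = True
act[10:20, 10:20] += 1.0   # make the unit fire on the concept region
print(f"IoU: {unit_concept_iou(act, mask):.3f}")
```

Scoring every unit against a bank of concept masks and keeping the best match yields the kind of unit-to-concept catalogue the entry describes, for classifiers and GANs alike.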
- Deep learning approaches for neural decoding: from CNNs to LSTMs and spikes to fMRI [2.0178765779788495]
Decoding behavior, perception, or cognitive state directly from neural signals has applications in brain-computer interface research.
In the last decade, deep learning has become the state-of-the-art method in many machine learning tasks.
Deep learning has been shown to be a useful tool for improving the accuracy and flexibility of neural decoding across a wide range of tasks.
arXiv Detail & Related papers (2020-05-19T18:10:35Z)
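The review above surveys decoders from CNNs and LSTMs down to simple linear models; the usual baseline is a cross-validated linear classifier on neural features. A minimal sketch on synthetic spike counts follows, with all shapes and firing rates invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in data: spike counts from 100 neurons over 400 trials,
# decoding a binary behavioral choice.
rng = np.random.default_rng(0)
choice = rng.integers(0, 2, size=400)
rates = rng.poisson(5.0, size=(400, 100)).astype(float)
rates[:, :10] += 2.0 * choice[:, None]   # ten neurons carry the choice signal

decoder = LogisticRegression(max_iter=1000)
scores = cross_val_score(decoder, rates, choice, cv=5)
print(f"decoding accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The deep-learning decoders the review covers replace the linear map with a CNN or LSTM over time-resolved signals, but the train/validate structure stays the same.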