Transformer brain encoders explain human high-level visual responses
- URL: http://arxiv.org/abs/2505.17329v1
- Date: Thu, 22 May 2025 22:48:15 GMT
- Title: Transformer brain encoders explain human high-level visual responses
- Authors: Hossein Adeli, Minni Sun, Nikolaus Kriegeskorte
- Abstract summary: We study how retinotopic visual features can be dynamically routed to category-selective areas in high-level visual processing. We show that this computational motif is significantly more powerful than alternative methods in predicting brain activity during natural scene viewing. Our approach proposes a mechanistic model of how visual information from retinotopic maps can be routed based on the relevance of the input content to different category-selective regions.
- Score: 0.5917100081691198
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A major goal of neuroscience is to understand brain computations during visual processing in naturalistic settings. A dominant approach is to use image-computable deep neural networks trained with different task objectives as a basis for linear encoding models. However, in addition to requiring a large number of parameters to be tuned, the linear encoding approach ignores the structure of the feature maps both in the brain and the models. Recently proposed alternatives decompose the linear mapping into spatial and feature components, but focus on finding static receptive fields for units, which are applicable only in early visual areas. In this work, we employ the attention mechanism used in the transformer architecture to study how retinotopic visual features can be dynamically routed to category-selective areas in high-level visual processing. We show that this computational motif is significantly more powerful than alternative methods in predicting brain activity during natural scene viewing, across different feature basis models and modalities. We also show that this approach is inherently more interpretable, without the need to create importance maps, by interpreting the attention routing signal for different high-level categorical areas. Our approach proposes a mechanistic model of how visual information from retinotopic maps can be routed based on the relevance of the input content to different category-selective regions.
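The routing motif described in the abstract can be illustrated with a minimal cross-attention readout: one learned query per brain region attends over the positions of a retinotopic feature map, so the routing weights depend on the image content. This is a hedged sketch, not the authors' implementation; all names, dimensions, and weights below are hypothetical placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_readout(feature_map, queries, W_k, W_v):
    """Route retinotopic features to region-level responses via cross-attention.

    feature_map: (P, D)  P spatial positions, D feature channels
    queries:     (R, Dk) one learned query per brain region (hypothetical)
    W_k: (D, Dk), W_v: (D, Dv) key/value projections
    Returns (R, Dv) region responses and the (R, P) routing (attention) map.
    """
    K = feature_map @ W_k                         # (P, Dk)
    V = feature_map @ W_v                         # (P, Dv)
    scores = queries @ K.T / np.sqrt(K.shape[1])  # (R, P) content-dependent
    A = softmax(scores, axis=-1)                  # routing weights over positions
    return A @ V, A

rng = np.random.default_rng(0)
P, D, Dk, Dv, R = 49, 16, 8, 4, 3  # e.g. a 7x7 map, 3 hypothetical regions
feats = rng.standard_normal((P, D))
q = rng.standard_normal((R, Dk))
Wk = rng.standard_normal((D, Dk))
Wv = rng.standard_normal((D, Dv))
resp, routing = attention_readout(feats, q, Wk, Wv)
```

Because the attention map `A` is computed explicitly, it can be inspected directly as the routing signal for each region, which is the interpretability property the abstract points to.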
Related papers
- Visualizing and Controlling Cortical Responses Using Voxel-Weighted Activation Maximization [0.0]
Deep neural networks (DNNs) learn visual representations that resemble those in the human visual system. We show that activation maximization can be applied to DNN-based encoding models. We generate images optimized for predicted responses in individual voxels.
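The idea of activation maximization over an encoding model can be sketched as gradient ascent on the input to maximize one voxel's predicted response. The toy linear-nonlinear model below is a hypothetical stand-in, not the paper's DNN-based encoder.

```python
import numpy as np

rng = np.random.default_rng(1)
D_pix, D_feat = 64, 16                           # hypothetical sizes
F = rng.standard_normal((D_feat, D_pix)) * 0.1   # fixed toy "feature" layer
w = rng.standard_normal(D_feat)                  # one voxel's encoding weights

def voxel_response(x):
    """Predicted response of one voxel: w . tanh(F x)."""
    return w @ np.tanh(F @ x)

def maximize(x, lr=0.1, steps=200):
    """Gradient ascent on the image to maximize the predicted response."""
    for _ in range(steps):
        h = np.tanh(F @ x)
        grad = F.T @ (w * (1.0 - h ** 2))  # analytic d/dx of w . tanh(F x)
        x = x + lr * grad
        x = np.clip(x, -1.0, 1.0)          # keep pixels in a valid range
    return x

x0 = rng.standard_normal(D_pix) * 0.01
x_opt = maximize(x0)
```

With a real DNN encoder the gradient would come from automatic differentiation rather than the closed form used here, but the loop is the same.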
arXiv Detail & Related papers (2025-06-04T18:48:08Z) - Convolution goes higher-order: a biologically inspired mechanism empowers image classification [0.8999666725996975]
We propose a novel approach to image classification inspired by complex nonlinear biological visual processing. Our model incorporates a Volterra-like expansion of the convolution operator, capturing multiplicative interactions. Our work bridges neuroscience and deep learning, offering a path towards more effective, biologically inspired computer vision models.
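The Volterra-like expansion mentioned here can be sketched in one dimension: alongside the usual linear kernel, a quadratic kernel weights products of input pairs inside each window. A minimal sketch under that assumption, not the paper's implementation:

```python
import numpy as np

def volterra_conv1d(x, w1, w2):
    """First- plus second-order (Volterra-like) 1-D convolution.

    w1: (k,)   linear kernel, as in ordinary convolution/correlation
    w2: (k, k) quadratic kernel capturing multiplicative interactions
               between inputs inside each window
    """
    k = len(w1)
    out = np.empty(len(x) - k + 1)
    for i in range(len(out)):
        p = x[i:i + k]
        out[i] = w1 @ p + p @ w2 @ p  # linear term + pairwise-product term
    return out

x = np.array([1.0, 2.0, 0.5, -1.0, 3.0])
w1 = np.array([0.5, -0.25, 0.1])
out = volterra_conv1d(x, w1, np.eye(3))  # quadratic term adds sum of squares
```

Setting `w2` to zero recovers the plain sliding-window correlation, which makes the second-order term a strict extension of ordinary convolution.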
arXiv Detail & Related papers (2024-12-09T18:33:09Z) - Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models [10.615012396285337]
We develop algorithms to enhance our understanding of visual processes by incorporating whole-brain activation maps.
We first compare our method with state-of-the-art approaches to decoding visual processing and show a 43% improvement in predictive semantic accuracy.
arXiv Detail & Related papers (2024-11-11T16:51:17Z) - Brain Mapping with Dense Features: Grounding Cortical Semantic Selectivity in Natural Images With Vision Transformers [5.265058307999745]
We introduce BrainSAIL, a method for isolating neurally-activating visual concepts in images.
BrainSAIL exploits semantically consistent, dense spatial features from pre-trained vision models.
We validate BrainSAIL on cortical regions with known category selectivity.
arXiv Detail & Related papers (2024-10-07T17:59:45Z) - Foveated Retinotopy Improves Classification and Localization in CNNs [0.0]
We show how incorporating foveated retinotopy may benefit deep convolutional neural networks (CNNs) in image classification tasks. Our findings suggest that foveated retinotopic mapping encodes implicit knowledge about visual object geometry.
arXiv Detail & Related papers (2024-02-23T18:15:37Z) - A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, a process that influences several cognitive functions in humans.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z) - Peripheral Vision Transformer [52.55309200601883]
We take a biologically inspired approach and explore modeling peripheral vision in deep neural networks for visual recognition.
We propose to incorporate peripheral position encoding into the multi-head self-attention layers so that the network learns to partition the visual field into diverse peripheral regions from training data.
We evaluate the proposed network, dubbed PerViT, on the large-scale ImageNet dataset and systematically investigate the inner workings of the model for machine perception.
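One simple way to realize a position-dependent attention pattern of this kind is to add a distance-based bias to the self-attention scores, so that tokens can attend differently to near versus peripheral positions. The sketch below assumes a single scalar `slope` parameter as a hypothetical stand-in for PerViT's learned peripheral position encoding; it is not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def peripheral_attention(q, k, v, coords, slope):
    """Self-attention with a distance-based positional bias (hypothetical).

    coords: (P, 2) spatial position of each token on the feature grid
    slope:  scalar controlling how attention decays with distance; in a
            trained model this would be a learned, per-head parameter
    """
    # pairwise distances between token positions, shape (P, P)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    scores = q @ k.T / np.sqrt(k.shape[1]) - slope * d
    return softmax(scores, axis=-1) @ v

P, Dh = 16, 8  # 4x4 grid of tokens, one head of width 8
rng = np.random.default_rng(2)
grid = np.stack(np.meshgrid(np.arange(4), np.arange(4)), -1)
coords = grid.reshape(-1, 2).astype(float)
q = rng.standard_normal((P, Dh))
k = rng.standard_normal((P, Dh))
v = rng.standard_normal((P, Dh))
out = peripheral_attention(q, k, v, coords, slope=1.0)
```

A large positive `slope` makes each token attend mostly to its neighborhood (fovea-like), while a slope near zero recovers unconstrained global attention.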
arXiv Detail & Related papers (2022-06-14T12:47:47Z) - Brain Cortical Functional Gradients Predict Cortical Folding Patterns via Attention Mesh Convolution [51.333918985340425]
We develop a novel attention mesh convolution model to predict cortical gyro-sulcal segmentation maps on individual brains.
Experiments show that the prediction performance via our model outperforms other state-of-the-art models.
arXiv Detail & Related papers (2022-05-21T14:08:53Z) - An explainability framework for cortical surface-based deep learning [110.83289076967895]
We develop a framework for cortical surface-based deep learning.
First, we adapted a perturbation-based approach for use with surface data.
We show that our explainability framework is not only able to identify important features and their spatial location but that it is also reliable and valid.
arXiv Detail & Related papers (2022-03-15T23:16:49Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Self-Supervised Graph Representation Learning for Neuronal Morphologies [75.38832711445421]
We present GraphDINO, a data-driven approach to learn low-dimensional representations of 3D neuronal morphologies from unlabeled datasets.
We show, in two different species and across multiple brain areas, that this method yields morphological cell type clusterings on par with manual feature-based classification by experts.
Our method could potentially enable data-driven discovery of novel morphological features and cell types in large-scale datasets.
arXiv Detail & Related papers (2021-12-23T12:17:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.