Probing Human Visual Robustness with Neurally-Guided Deep Neural Networks
- URL: http://arxiv.org/abs/2405.02564v2
- Date: Sun, 18 May 2025 04:19:59 GMT
- Title: Probing Human Visual Robustness with Neurally-Guided Deep Neural Networks
- Authors: Zhenan Shao, Linjian Ma, Yiqing Zhou, Yibo Jacky Zhang, Sanmi Koyejo, Bo Li, Diane M. Beck,
- Abstract summary: Humans effortlessly navigate the dynamic visual world, yet deep neural networks (DNNs) are surprisingly vulnerable to minor image perturbations. Past theories suggest that human visual robustness arises from a representational space that evolves along the ventral visual stream (VVS) of the brain to increasingly tolerate object transformations. We demonstrate a hierarchical improvement in DNN robustness: alignment to higher-order VVS regions leads to greater improvement.
- Score: 18.994287352758697
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans effortlessly navigate the dynamic visual world, yet deep neural networks (DNNs), despite excelling at many visual tasks, are surprisingly vulnerable to minor image perturbations. Past theories suggest that human visual robustness arises from a representational space that evolves along the ventral visual stream (VVS) of the brain to increasingly tolerate object transformations. To test whether robustness is supported by such progression as opposed to being confined exclusively to specialized higher-order regions, we trained DNNs to align their representations with human neural responses from consecutive VVS regions while performing visual tasks. We demonstrate a hierarchical improvement in DNN robustness: alignment to higher-order VVS regions leads to greater improvement. To investigate the mechanism behind such robustness gains, we test a prominent hypothesis that attributes human robustness to the unique geometry of neural category manifolds in the VVS. We first reveal that more desirable manifold properties, specifically, smaller extent and better linear separability, indeed emerge across the human VVS. These properties can be inherited by neurally aligned DNNs and predict their subsequent robustness gains. Furthermore, we show that supervision from neural manifolds alone, via manifold guidance, is sufficient to qualitatively reproduce the hierarchical robustness improvements. Together, these results highlight the critical role of the evolving representational space across VVS in achieving robust visual inference, in part through the formation of more linearly separable category manifolds, which may in turn be leveraged to develop more robust AI systems.
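The abstract does not state the exact alignment objective, so the following is only a minimal sketch of what "aligning representations with neural responses while performing a visual task" could look like. It assumes a representational-similarity penalty added to a classification loss; the function names, the use of Pearson-based dissimilarity matrices, and the `weight` parameter are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def rdm(features):
    """Dissimilarity matrix: 1 - Pearson correlation between stimulus rows."""
    return 1.0 - np.corrcoef(features)

def alignment_penalty(model_feats, neural_resps):
    """1 - correlation between the upper triangles of the two dissimilarity
    matrices. Rows are stimuli; the column counts (units vs. voxels) may differ.
    A value near 0 means the two spaces share the same pairwise geometry."""
    iu = np.triu_indices(model_feats.shape[0], k=1)
    a = rdm(model_feats)[iu]
    b = rdm(neural_resps)[iu]
    return 1.0 - np.corrcoef(a, b)[0, 1]

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed in a numerically stable way."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def joint_loss(logits, labels, model_feats, neural_resps, weight=1.0):
    """Hypothetical combined objective: task loss plus weighted alignment term."""
    return cross_entropy(logits, labels) + weight * alignment_penalty(
        model_feats, neural_resps
    )
```

Under this sketch, swapping in responses from a different VVS region for `neural_resps` would change only the alignment target, which is one simple way to picture the region-by-region comparison the abstract describes.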
Related papers
- Deep Models, Shallow Alignment: Uncovering the Granularity Mismatch in Neural Decoding [8.822848795081693]
We propose a novel contrastive learning strategy that aligns neural signals with intermediate representations of visual encoders rather than their final outputs. Our approach effectively unlocks the scaling law in neural visual decoding, enabling decoding performance to scale predictably with the capacity of pre-trained vision backbones.
arXiv Detail & Related papers (2026-01-29T16:30:32Z) - Emulating Human-like Adaptive Vision for Efficient and Flexible Machine Visual Perception [93.20637973889434]
We introduce AdaptiveNN, a general framework aiming to drive a paradigm shift from 'passive' to 'active' vision models. AdaptiveNN formulates visual perception as a coarse-to-fine sequential decision-making process. We assess AdaptiveNN on 17 benchmarks spanning 9 tasks, including large-scale visual recognition, fine-grained discrimination, visual search, and processing images from real driving and medical scenarios.
arXiv Detail & Related papers (2025-09-18T18:25:43Z) - Sparse Autoencoder Neural Operators: Model Recovery in Function Spaces [75.45093712182624]
We introduce a framework that extends sparse autoencoders (SAEs) to lifted spaces and infinite-dimensional function spaces, enabling mechanistic interpretability of large neural operators (NO). We compare the inference and training dynamics of SAEs, lifted-SAE, and SAE neural operators. We highlight how lifting and operator modules introduce beneficial inductive biases, enabling faster recovery, improved recovery of smooth concepts, and robust inference across varying resolutions, a property unique to neural operators.
arXiv Detail & Related papers (2025-09-03T21:57:03Z) - Representation Understanding via Activation Maximization [13.88866465448849]
We propose a unified feature visualization framework applicable to both Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). Our experiments demonstrate the effectiveness of our approach in both traditional CNNs and modern ViTs, highlighting its generalizability and value.
arXiv Detail & Related papers (2025-08-10T10:36:30Z) - TDSNNs: Competitive Topographic Deep Spiking Neural Networks for Visual Cortex Modeling [1.732019193517103]
We propose a novel Spatio-Temporal Constraints (STC) loss function for topographic deep spiking neural networks (SNNs). Our results show that STC effectively generates representative topographic features across simulated visual cortical areas. We also reveal that topographic organization facilitates efficient and stable temporal information processing via the spike mechanism in TDSNNs.
arXiv Detail & Related papers (2025-08-06T09:53:39Z) - Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis [14.275283048655268]
We compare neural networks evolved through an open-ended search process to networks trained via conventional gradient descent. While both networks produce the same output behavior, their internal representations differ dramatically. In large models, FER may be degrading core model capacities like generalization, creativity, and (continual) learning.
arXiv Detail & Related papers (2025-05-16T16:28:34Z) - Spiking Meets Attention: Efficient Remote Sensing Image Super-Resolution with Attention Spiking Neural Networks [57.17129753411926]
Spiking neural networks (SNNs) are emerging as a promising alternative to traditional artificial neural networks (ANNs). We propose SpikeSR, which achieves state-of-the-art performance across various remote sensing benchmarks such as AID, DOTA, and DIOR.
arXiv Detail & Related papers (2025-03-06T09:06:06Z) - Aligning Machine and Human Visual Representations across Abstraction Levels [42.86478924838503]
Deep neural networks have achieved success across a wide range of applications, including as models of human behavior in vision tasks.
However, neural network training and human learning differ in fundamental ways, and neural networks often fail to generalize as robustly as humans do.
We highlight a key misalignment between vision models and humans: whereas human conceptual knowledge is hierarchically organized from fine- to coarse-scale distinctions, model representations do not accurately capture all these levels of abstraction.
To address this misalignment, we first train a teacher model to imitate human judgments, then transfer human-like structure from its representations into pretrained state-of-the
arXiv Detail & Related papers (2024-09-10T13:41:08Z) - Super Consistency of Neural Network Landscapes and Learning Rate Transfer [72.54450821671624]
We study the landscape through the lens of the loss Hessian.
We find that certain spectral properties under $\mu$P are largely independent of the size of the network.
We show that in the Neural Tangent Kernel (NTK) and other scaling regimes, the sharpness exhibits very different dynamics at different scales.
arXiv Detail & Related papers (2024-02-27T12:28:01Z) - Achieving More Human Brain-Like Vision via Human EEG Representational Alignment [1.811217832697894]
We present 'Re(presentational)Al(ignment)net', a vision model aligned with human brain activity based on non-invasive EEG.
Our innovative image-to-brain multi-layer encoding framework advances human neural alignment by optimizing multiple model layers.
Our findings suggest that ReAlnet represents a breakthrough in bridging the gap between artificial and human vision and paves the way for more brain-like artificial intelligence systems.
arXiv Detail & Related papers (2024-01-30T18:18:41Z) - Unveiling the Unseen: Identifiable Clusters in Trained Depthwise Convolutional Kernels [56.69755544814834]
Recent advances in depthwise-separable convolutional neural networks (DS-CNNs) have led to novel architectures.
This paper reveals another striking property of DS-CNN architectures: discernible and explainable patterns emerge in their trained depthwise convolutional kernels in all layers.
arXiv Detail & Related papers (2024-01-25T19:05:53Z) - A Survey on Transferability of Adversarial Examples across Deep Neural Networks [53.04734042366312]
Adversarial examples can manipulate machine learning models into making erroneous predictions.
The transferability of adversarial examples enables black-box attacks which circumvent the need for detailed knowledge of the target model.
This survey explores the landscape of the transferability of adversarial examples.
arXiv Detail & Related papers (2023-10-26T17:45:26Z) - End-to-end topographic networks as models of cortical map formation and human visual behaviour: moving beyond convolutions [0.29687381456164]
We develop All-Topographic Neural Networks (All-TNNs) to model the organisation of the primate visual system.
We show that All-TNNs significantly better align with human behaviour than previous state-of-the-art convolutional models due to their topographic nature.
All-TNNs thereby mark an important step forward in understanding the spatial organisation of the visual brain and how it mediates visual behaviour.
arXiv Detail & Related papers (2023-08-18T10:03:51Z) - Training on Foveated Images Improves Robustness to Adversarial Attacks [26.472800216546233]
Deep neural networks (DNNs) have been shown to be vulnerable to adversarial attacks.
RBlur is an image transform that simulates the loss in fidelity of peripheral vision by blurring the image and reducing its color saturation.
DNNs trained on images transformed by RBlur are substantially more robust to adversarial attacks, as well as other, non-adversarial, corruptions, achieving up to 25% higher accuracy on perturbed data.
arXiv Detail & Related papers (2023-08-01T21:40:30Z) - Transferability of coVariance Neural Networks and Application to Interpretable Brain Age Prediction using Anatomical Features [119.45320143101381]
Graph convolutional networks (GCN) leverage topology-driven graph convolutional operations to combine information across the graph for inference tasks.
We have studied GCNs with covariance matrices as graphs in the form of coVariance neural networks (VNNs).
VNNs inherit the scale-free data processing architecture from GCNs and here, we show that VNNs exhibit transferability of performance over datasets whose covariance matrices converge to a limit object.
arXiv Detail & Related papers (2023-05-02T22:15:54Z) - Training Robust Spiking Neural Networks with ViewPoint Transform and SpatioTemporal Stretching [4.736525128377909]
We propose a novel data augmentation method, ViewPoint Transform and SpatioTemporal Stretching (VPT-STS).
It improves the robustness of spiking neural networks by transforming the rotation centers and angles in the spatiotemporal domain to generate samples from different viewpoints.
Experiments on prevailing neuromorphic datasets demonstrate that VPT-STS is broadly effective on multi-event representations and significantly outperforms pure spatial geometric transformations.
arXiv Detail & Related papers (2023-03-14T03:09:56Z) - Guiding Visual Attention in Deep Convolutional Neural Networks Based on Human Eye Movements [0.0]
Deep Convolutional Neural Networks (DCNNs) were originally inspired by principles of biological vision.
Recent advances in deep learning seem to decrease this similarity.
We investigate a purely data-driven approach to obtain useful models.
arXiv Detail & Related papers (2022-06-21T17:59:23Z) - Adversarially trained neural representations may already be as robust as corresponding biological neural representations [66.73634912993006]
We develop a method for performing adversarial visual attacks directly on primate brain activity.
We report that the biological neurons that make up visual systems of primates exhibit susceptibility to adversarial perturbations that is comparable in magnitude to existing (robustly trained) artificial neural networks.
arXiv Detail & Related papers (2022-06-19T04:15:29Z) - Searching for the Essence of Adversarial Perturbations [73.96215665913797]
We show that adversarial perturbations contain human-recognizable information, which is the key conspirator responsible for a neural network's erroneous prediction.
This concept of human-recognizable information allows us to explain key features related to adversarial perturbations.
arXiv Detail & Related papers (2022-05-30T18:04:57Z) - Behind the Machine's Gaze: Biologically Constrained Neural Networks Exhibit Human-like Visual Attention [40.878963450471026]
We propose the Neural Visual Attention (NeVA) algorithm to generate visual scanpaths in a top-down manner.
We show that the proposed method outperforms state-of-the-art unsupervised human attention models in terms of similarity to human scanpaths.
arXiv Detail & Related papers (2022-04-19T18:57:47Z) - Improving Neural Predictivity in the Visual Cortex with Gated Recurrent Connections [0.0]
We aim to shift the focus to architectures that take into account lateral recurrent connections, a ubiquitous feature of the ventral visual stream, to devise adaptive receptive fields.
In order to increase the robustness of our approach and the biological fidelity of the activations, we employ specific data augmentation techniques.
arXiv Detail & Related papers (2022-03-22T17:27:22Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Overcoming the Domain Gap in Neural Action Representations [60.47807856873544]
3D pose data can now be reliably extracted from multi-view video sequences without manual intervention.
We propose to use it to guide the encoding of neural action representations together with a set of neural and behavioral augmentations.
To reduce the domain gap, during training, we swap neural and behavioral data across animals that seem to be performing similar actions.
arXiv Detail & Related papers (2021-12-02T12:45:46Z) - MetaAvatar: Learning Animatable Clothed Human Models from Few Depth Images [60.56518548286836]
To generate realistic cloth deformations from novel input poses, watertight meshes or dense full-body scans are usually needed as inputs.
We propose an approach that can quickly generate realistic clothed human avatars, represented as controllable neural SDFs, given only monocular depth images.
arXiv Detail & Related papers (2021-06-22T17:30:12Z) - Fooling the primate brain with minimal, targeted image manipulation [67.78919304747498]
We propose an array of methods for creating minimal, targeted image perturbations that lead to changes in both neuronal activity and perception as reflected in behavior.
Our work shares the same goal with adversarial attack, namely the manipulation of images with minimal, targeted noise that leads ANN models to misclassify the images.
arXiv Detail & Related papers (2020-11-11T08:30:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.