Human Eyes Inspired Recurrent Neural Networks are More Robust Against Adversarial Noises
- URL: http://arxiv.org/abs/2206.07282v1
- Date: Wed, 15 Jun 2022 03:44:42 GMT
- Title: Human Eyes Inspired Recurrent Neural Networks are More Robust Against Adversarial Noises
- Authors: Minkyu Choi, Yizhen Zhang, Kuan Han, Xiaokai Wang, and Zhongming Liu
- Abstract summary: Compared to human vision, computer vision based on convolutional neural networks (CNNs) is more vulnerable to adversarial noise.
This difference is likely attributable to how the eyes sample visual input and how the brain processes retinal samples through its dorsal and ventral visual pathways.
We design recurrent neural networks, including an input sampler that mimics the human retina, a dorsal network that guides where to look next, and a ventral network that represents the retinal samples.
Taking these modules together, the models learn to take multiple glances at an image, attend to a salient part at each glance, and accumulate the representation over time to recognize the image.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compared to human vision, computer vision based on convolutional neural
networks (CNNs) is more vulnerable to adversarial noise. This difference is
likely attributable to how the eyes sample visual input and how the brain
processes retinal samples through its dorsal and ventral visual pathways, which
are under-explored for computer vision. Inspired by the brain, we design
recurrent neural networks, including an input sampler that mimics the human
retina, a dorsal network that guides where to look next, and a ventral network
that represents the retinal samples. Taking these modules together, the models
learn to take multiple glances at an image, attend to a salient part at each
glance, and accumulate the representation over time to recognize the image. We
test such models for their robustness against varying levels of adversarial
noise, with a special focus on the effect of different input sampling
strategies. Our findings suggest that retinal foveation and sampling render a
model more robust against adversarial noise, and that the model may recover
from an attack when it is given more time to take more glances at an image.
In conclusion, robust visual recognition can benefit from the combined use of
three brain-inspired mechanisms: retinal transformation, attention-guided eye
movement, and recurrent processing, as opposed to feedforward-only CNNs.
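The abstract names three mechanisms: a retina-like input sampler, a "dorsal" module that picks the next fixation, and a "ventral" recurrent module that accumulates evidence over glances. The following is a minimal NumPy sketch of how such a glance loop could fit together. It is not the authors' implementation, and several pieces are assumptions:

- The log-polar sampler is a generic stand-in for their retinal transformation.
- The "dorsal" step simply fixates the brightest sample instead of using a trained network.
- The "ventral" step is an untrained Elman-style update with random weights.
- `retinal_sample` and `multi_glance` are hypothetical names.

```python
import numpy as np


def retinal_sample(image, fy, fx, n_rings=8, n_wedges=16, r_min=1.0, r_max=None):
    """Hypothetical log-polar 'retina': ring radii grow geometrically with
    eccentricity, so samples are dense near the fixation point (fovea) and
    sparse in the periphery. Returns sampled values and their (y, x) coords."""
    h, w = image.shape
    if r_max is None:
        r_max = max(h, w) / 2
    radii = np.geomspace(r_min, r_max, n_rings)               # geometric spacing
    angles = np.linspace(0.0, 2 * np.pi, n_wedges, endpoint=False)
    ys = fy + radii[:, None] * np.sin(angles)[None, :]        # (n_rings, n_wedges)
    xs = fx + radii[:, None] * np.cos(angles)[None, :]
    # Nearest-neighbour lookup, clamped to the image border.
    yi = np.clip(np.rint(ys).astype(int), 0, h - 1)
    xi = np.clip(np.rint(xs).astype(int), 0, w - 1)
    return image[yi, xi], ys, xs


def multi_glance(image, n_glances=4, hidden=32, seed=0):
    """Hypothetical glance loop: sample, update a recurrent state, move the eye."""
    h, w = image.shape
    rng = np.random.default_rng(seed)
    n_in = 8 * 16                                             # matches retinal_sample defaults
    # Untrained random weights: placeholders for a learned "ventral" RNN.
    W_x = rng.normal(0.0, 1.0 / np.sqrt(n_in), (hidden, n_in))
    W_h = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, hidden))
    state = np.zeros(hidden)
    fy, fx = h / 2, w / 2                                     # assumption: first glance at the centre
    fixations = []
    for _ in range(n_glances):
        fixations.append((fy, fx))
        samples, ys, xs = retinal_sample(image, fy, fx)
        # Recurrent accumulation across glances: h_t = tanh(W_h h_{t-1} + W_x x_t).
        state = np.tanh(W_h @ state + W_x @ samples.ravel())
        # "Dorsal" stand-in: look next at the most salient (here: brightest) sample.
        r, c = np.unravel_index(np.argmax(samples), samples.shape)
        fy = float(np.clip(ys[r, c], 0, h - 1))
        fx = float(np.clip(xs[r, c], 0, w - 1))
    return state, fixations
```

Consider a 64x64 image that is dark except for a bright lower-right quadrant (`img[40:, 40:] = 1.0`). The first glance lands at the centre, and the second fixation jumps into the bright region. Testing robustness against adversarial noise would additionally require a trained classifier and its gradients, which this untrained sketch does not provide.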
Related papers
- A Neural Network Model of Spatial and Feature-Based Attention
We designed a neural network model inspired by aspects of human visual attention.
The model's emergent attention patterns corresponded to spatial and feature-based attention.
This similarity between human visual attention and attention in computer vision suggests a promising direction for studying human cognition using neural network models.
arXiv, 2025-06-05
- Align and Surpass Human Camouflaged Perception: Visual Refocus Reinforcement Fine-Tuning
Current multi-modal models exhibit a notable misalignment with the human visual system when identifying objects that are visually assimilated into the background.
We build a visual system that mimics human camouflaged perception to progressively and iteratively 'refocus' on visually concealed content.
arXiv, 2025-05-26
- When Does Perceptual Alignment Benefit Vision Representations?
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv, 2024-10-14
- Dual Thinking and Perceptual Analysis of Deep Learning Models using Human Adversarial Examples
The perception of dual thinking in vision requires images where inferences from intuitive and logical processing differ.
We introduce an adversarial dataset to provide evidence for the dual thinking framework in human vision.
Our study also addresses a major criticism of using classification models as computational models of human vision.
arXiv, 2024-06-11
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv, 2024-04-17
- A Dual-Stream Neural Network Explains the Functional Segregation of Dorsal and Ventral Visual Pathways in Human Brains
We develop a dual-stream vision model inspired by the human eyes and brain.
At the input level, the model samples two complementary visual patterns.
At the backend, the model processes the separate input patterns through two branches of convolutional neural networks.
arXiv, 2023-10-20
- Simulating Human Gaze with Neural Visual Attention
We propose the Neural Visual Attention (NeVA) algorithm to integrate guidance of any downstream visual task into attention modeling.
We observe that biologically constrained neural networks generate human-like scanpaths without being trained for this objective.
arXiv, 2022-11-22
- BI AVAN: Brain inspired Adversarial Visual Attention Network
We propose a brain-inspired adversarial visual attention network (BI-AVAN) to characterize human visual attention directly from functional brain activity.
Our model imitates the biased competition process between attention-related and attention-neglected objects to identify and locate, in an unsupervised manner, the visual objects in a movie frame that the human brain focuses on.
arXiv, 2022-10-27
- A domain adaptive deep learning solution for scanpath prediction of paintings
This paper focuses on the eye-movement analysis of viewers during the visual experience of a set of paintings.
We introduce a new approach to predicting human visual attention, which impacts several cognitive functions for humans.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv, 2022-09-22
- Guiding Visual Attention in Deep Convolutional Neural Networks Based on Human Eye Movements
Deep Convolutional Neural Networks (DCNNs) were originally inspired by principles of biological vision.
Recent advances in deep learning seem to decrease this similarity.
We investigate a purely data-driven approach to obtain useful models.
arXiv, 2022-06-21
- Prune and distill: similar reformatting of image information along rat visual cortex and deep neural networks
Deep convolutional neural networks (CNNs) have been shown to provide excellent models for their functional analogue in the brain, the ventral stream of the visual cortex.
Here we consider some prominent statistical patterns that are known to exist in the internal representations of either CNNs or the visual cortex.
We show that CNNs and visual cortex share a similarly tight relationship between dimensionality expansion/reduction of object representations and reformatting of image information.
arXiv, 2022-05-27
- Behind the Machine's Gaze: Biologically Constrained Neural Networks Exhibit Human-like Visual Attention
We propose the Neural Visual Attention (NeVA) algorithm to generate visual scanpaths in a top-down manner.
We show that the proposed method outperforms state-of-the-art unsupervised human attention models in terms of similarity to human scanpaths.
arXiv, 2022-04-19
- Fooling the primate brain with minimal, targeted image manipulation
We propose an array of methods for creating minimal, targeted image perturbations that lead to changes in both neuronal activity and perception as reflected in behavior.
Our work shares its goal with adversarial attacks, namely the manipulation of images with minimal, targeted noise that leads ANN models to misclassify them.
arXiv, 2020-11-11
- A Psychophysically Oriented Saliency Map Prediction Model
We propose a new psychophysical saliency prediction architecture, WECSF, inspired by the multi-channel model of visual cortex function in humans.
The proposed model is evaluated using several datasets, including the MIT1003, MIT300, Toronto, SID4VAM, and UCF Sports datasets.
Our model achieved highly stable and better performance across different metrics on natural images, psychophysical synthetic images, and dynamic videos.
arXiv, 2020-11-08