An Extended Study of Human-like Behavior under Adversarial Training
- URL: http://arxiv.org/abs/2303.12669v1
- Date: Wed, 22 Mar 2023 15:47:16 GMT
- Title: An Extended Study of Human-like Behavior under Adversarial Training
- Authors: Paul Gavrikov, Janis Keuper, Margret Keuper
- Abstract summary: We show that adversarial training increases the shift toward shape bias in neural networks.
We also provide a possible explanation for this phenomenon from a frequency perspective.
- Score: 11.72025865314187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks have a number of shortcomings. Among the most severe is
their sensitivity to distribution shifts, which allows models to be easily fooled
into wrong predictions by small input perturbations that are often imperceptible
to humans and need not carry semantic meaning. Adversarial training offers a
partial solution to this issue by training models on worst-case perturbations.
Yet, recent work has also pointed out that the reasoning in neural networks
differs from that of humans: humans identify objects by shape, while neural
networks mainly rely on texture cues. For example, a model trained on photographs
will likely fail to generalize to datasets containing sketches. Interestingly, it
has also been shown that adversarial training appears to favorably shift models
toward a shape bias. In this work, we revisit this observation and provide an
extensive analysis of this effect across various architectures, the common
$\ell_2$- and $\ell_\infty$-training, and Transformer-based models. Further, we
offer a possible explanation for this phenomenon from a frequency perspective.
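As background for the $\ell_\infty$-training mentioned in the abstract, the snippet below is a minimal sketch of one common way to implement adversarial training with projected gradient descent (PGD) in PyTorch. It is an illustrative assumption, not the authors' training setup: the classifier, the data loader, and the values of `epsilon`, `alpha`, and `steps` are placeholders.

```python
# Minimal sketch of l_inf adversarial training with PGD.
# Assumptions (not from the paper): a generic PyTorch classifier `model`,
# a `train_loader` yielding images in [0, 1], and illustrative hyperparameters.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8 / 255, alpha=2 / 255, steps=7):
    """Approximate the worst-case perturbation inside an l_inf ball of radius epsilon."""
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        # Ascend the loss, then project back into the epsilon-ball and the valid pixel range.
        delta = (delta + alpha * grad.sign()).clamp(-epsilon, epsilon).detach()
        delta = ((x + delta).clamp(0, 1) - x).requires_grad_(True)
    return delta.detach()

def adversarial_training_epoch(model, train_loader, optimizer):
    device = next(model.parameters()).device
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        delta = pgd_attack(model, x, y)                # inner maximization
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x + delta), y)    # outer minimization
        loss.backward()
        optimizer.step()
```

The $\ell_2$ counterpart replaces the sign step with a gradient normalized to unit $\ell_2$ norm and projects the perturbation back onto an $\ell_2$ ball of radius $\epsilon$.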
Related papers
- Can Biases in ImageNet Models Explain Generalization? [13.802802975822704]
Generalization is one of the major challenges of current deep learning methods.
For image classification, this manifests in the existence of adversarial attacks, performance drops on distorted images, and a lack of generalization to concepts such as sketches.
We perform a large-scale study on 48 ImageNet models obtained via different training methods to understand whether and how these biases interact with generalization.
arXiv Detail & Related papers (2024-04-01T22:25:48Z)
- A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z)
- MENTOR: Human Perception-Guided Pretraining for Increased Generalization [5.596752018167751]
We introduce MENTOR (huMan pErceptioN-guided preTraining fOr increased geneRalization).
We train an autoencoder to learn human saliency maps given an input image, without class labels.
We then remove the decoder, add a classification layer on top of the encoder, and fine-tune this new model conventionally (see the sketch following this entry).
arXiv Detail & Related papers (2023-10-30T13:50:44Z)
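The three-step pipeline summarized in this entry can be pictured with the following minimal PyTorch sketch. It is a hypothetical illustration rather than the authors' implementation: the toy encoder/decoder, the MSE saliency loss, and `num_classes` are assumptions.

```python
# Hypothetical sketch of a MENTOR-style pipeline (assumed details, not the paper's code):
# (1) pretrain an autoencoder to predict human saliency maps from images (no class labels),
# (2) discard the decoder, (3) put a classification head on the encoder and fine-tune.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(                                   # toy convolutional encoder
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
)
decoder = nn.Sequential(                                   # predicts a 1-channel saliency map
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
)

def pretrain_step(x, saliency, optimizer):
    """Step 1: regress human saliency maps; class labels are never used."""
    optimizer.zero_grad()
    loss = F.mse_loss(decoder(encoder(x)), saliency)
    loss.backward()
    optimizer.step()
    return loss.item()

# Steps 2-3: drop the decoder, attach a classifier head, fine-tune conventionally.
num_classes = 10                                           # placeholder
classifier = nn.Sequential(encoder, nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                           nn.Linear(64, num_classes))

def finetune_step(x, y, optimizer):
    optimizer.zero_grad()
    loss = F.cross_entropy(classifier(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Here the pretraining optimizer would cover the encoder and decoder parameters, while the fine-tuning optimizer covers the `classifier`.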
- Connecting metrics for shape-texture knowledge in computer vision [1.7785095623975342]
Deep neural networks remain brittle and susceptible to many changes in an image that do not cause humans to misclassify it.
Part of this different behavior may be explained by the type of features humans and deep neural networks use in vision tasks.
arXiv Detail & Related papers (2023-01-25T14:37:42Z)
- Benign Overfitting in Two-layer Convolutional Neural Networks [90.75603889605043]
We study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN).
We show that when the signal-to-noise ratio satisfies a certain condition, a two-layer CNN trained by gradient descent can achieve arbitrarily small training and test loss.
On the other hand, when this condition does not hold, overfitting becomes harmful and the obtained CNN can only achieve a constant-level test loss.
arXiv Detail & Related papers (2022-02-14T07:45:51Z)
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these activation profiles can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
- Predictive coding feedback results in perceived illusory contours in a recurrent neural network [0.0]
We equip a deep feedforward convolutional network with brain-inspired recurrent dynamics.
We show that the perception of illusory contours could involve feedback connections.
arXiv Detail & Related papers (2021-02-03T09:07:09Z)
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
- Stereopagnosia: Fooling Stereo Networks with Adversarial Perturbations [71.00754846434744]
We show that imperceptible additive perturbations can significantly alter the disparity map.
We show that, when used for adversarial data augmentation, our perturbations result in trained models that are more robust.
arXiv Detail & Related papers (2020-09-21T19:20:09Z)
- Encoding Robustness to Image Style via Adversarial Feature Perturbations [72.81911076841408]
We adapt adversarial training by directly perturbing feature statistics, rather than image pixels, to produce robust models.
Our proposed method, Adversarial Batch Normalization (AdvBN), is a single network layer that generates worst-case feature perturbations during training (a simplified sketch of the idea follows after this entry).
arXiv Detail & Related papers (2020-09-18T17:52:34Z)
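To make the idea of perturbing feature statistics rather than pixels more concrete, here is a simplified sketch in the spirit of the AdvBN entry above. It is not the authors' AdvBN layer: the split of the network into `encoder` and `head`, the multiplicative perturbation of per-channel mean and standard deviation, and the budget `epsilon` are illustrative assumptions.

```python
# Simplified sketch of adversarially perturbing per-channel feature statistics
# (mean/std) of an intermediate representation. The encoder/head split and all
# hyperparameters are assumptions for illustration, not the paper's AdvBN layer.
import torch
import torch.nn.functional as F

def perturb_feature_stats(encoder, head, x, y, epsilon=0.2, alpha=0.05, steps=5):
    feats = encoder(x).detach()                      # assumed shape: N x C x H x W
    n, c = feats.shape[:2]
    flat = feats.reshape(n, c, -1)
    mu = flat.mean(dim=2).reshape(n, c, 1, 1)        # per-sample, per-channel mean
    sigma = flat.std(dim=2).reshape(n, c, 1, 1) + 1e-5
    normalized = (feats - mu) / sigma

    # Multiplicative perturbations of the statistics, tuned to increase the loss.
    d_mu = torch.zeros_like(mu, requires_grad=True)
    d_sigma = torch.zeros_like(sigma, requires_grad=True)
    for _ in range(steps):
        perturbed = normalized * sigma * (1 + d_sigma) + mu * (1 + d_mu)
        loss = F.cross_entropy(head(perturbed), y)
        g_mu, g_sigma = torch.autograd.grad(loss, (d_mu, d_sigma))
        with torch.no_grad():
            d_mu += alpha * g_mu.sign()
            d_sigma += alpha * g_sigma.sign()
            d_mu.clamp_(-epsilon, epsilon)
            d_sigma.clamp_(-epsilon, epsilon)
    return (normalized * sigma * (1 + d_sigma) + mu * (1 + d_mu)).detach()

def train_step(encoder, head, optimizer, x, y):
    # For simplicity only the head is trained on the perturbed features here;
    # a full pipeline would also update the encoder (e.g. on clean inputs).
    perturbed = perturb_feature_stats(encoder, head, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(head(perturbed), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```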