An Impartial Take to the CNN vs Transformer Robustness Contest
- URL: http://arxiv.org/abs/2207.11347v1
- Date: Fri, 22 Jul 2022 21:34:37 GMT
- Title: An Impartial Take to the CNN vs Transformer Robustness Contest
- Authors: Francesco Pinto, Philip H.S. Torr, Puneet K. Dokania
- Abstract summary: Recent state-of-the-art CNNs can be as robust and reliable as, and sometimes even more so than, the current state-of-the-art Transformers.
Although it is tempting to declare the definitive superiority of one family of architectures over the other, both seem to enjoy similarly extraordinary performance on a variety of tasks.
- Score: 89.97450887997925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Following the surge of popularity of Transformers in Computer Vision, several
studies have attempted to determine whether they could be more robust to
distribution shifts and provide better uncertainty estimates than Convolutional
Neural Networks (CNNs). The almost unanimous conclusion is that they are, and
it is often conjectured, more or less explicitly, that this supposed superiority
is attributable to the self-attention mechanism. In
this paper we perform extensive empirical analyses showing that recent
state-of-the-art CNNs (particularly, ConvNeXt) can be as robust and reliable as,
or sometimes even more so than, the current state-of-the-art Transformers. However,
there is no clear winner. Therefore, although it is tempting to state the
definitive superiority of one family of architectures over another, they seem
to enjoy similar extraordinary performances on a variety of tasks while also
suffering from similar vulnerabilities such as texture, background, and
simplicity biases.
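The abstract's two axes of comparison, robustness under distribution shift and reliability of uncertainty estimates, can be made concrete with a small evaluation loop. The sketch below is not the authors' code; it assumes the `timm` model zoo and a hypothetical `corrupted_loader` standing in for a shifted test set such as ImageNet-C, and reports top-1 accuracy together with Expected Calibration Error (ECE).

```python
# Minimal sketch (not the paper's evaluation code): comparing a modern CNN and a
# Vision Transformer on out-of-distribution accuracy and calibration.
# Model names come from the `timm` library; `corrupted_loader` is a hypothetical
# placeholder for a distribution-shifted test set such as ImageNet-C.
import torch
import torch.nn.functional as F
import timm


@torch.no_grad()
def accuracy_and_ece(model, loader, device="cuda", n_bins=15):
    """Top-1 accuracy and Expected Calibration Error over a dataloader."""
    model.eval().to(device)
    confs, correct = [], []
    for images, labels in loader:
        probs = F.softmax(model(images.to(device)), dim=-1)
        conf, pred = probs.max(dim=-1)
        confs.append(conf.cpu())
        correct.append((pred.cpu() == labels).float())
    confs, correct = torch.cat(confs), torch.cat(correct)

    # ECE: |accuracy - mean confidence| over equal-width confidence bins,
    # weighted by the fraction of samples falling in each bin.
    bins = torch.linspace(0, 1, n_bins + 1)
    ece = torch.tensor(0.0)
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confs > lo) & (confs <= hi)
        if in_bin.any():
            ece += in_bin.float().mean() * (correct[in_bin].mean() - confs[in_bin].mean()).abs()
    return correct.mean().item(), ece.item()


# Hypothetical usage, one CNN vs. one Transformer from the same model zoo:
# for name in ["convnext_tiny", "vit_small_patch16_224"]:
#     model = timm.create_model(name, pretrained=True)
#     acc, ece = accuracy_and_ece(model, corrupted_loader)
#     print(f"{name}: top-1 {acc:.3f}, ECE {ece:.3f}")
```

Comparing both architecture families with the same loop and the same shifted data is what makes such a contest "impartial": any remaining gap then reflects the models rather than the evaluation protocol.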
Related papers
- Biased Attention: Do Vision Transformers Amplify Gender Bias More than
Convolutional Neural Networks? [2.8391805742728553]
Deep neural networks used in computer vision have been shown to exhibit many social biases such as gender bias.
Vision Transformers (ViTs) have become increasingly popular in computer vision applications, outperforming Convolutional Neural Networks (CNNs) in many tasks such as image classification.
This research found that ViTs amplified gender bias to a greater extent than CNNs.
arXiv Detail & Related papers (2023-09-15T20:59:12Z) - Finding Differences Between Transformers and ConvNets Using
Counterfactual Simulation Testing [82.67716657524251]
We present a counterfactual framework that allows us to study the robustness of neural networks with respect to naturalistic variations.
Our method allows for a fair comparison of the robustness of recently released, state-of-the-art Convolutional Neural Networks and Vision Transformers.
arXiv Detail & Related papers (2022-11-29T18:59:23Z) - Can CNNs Be More Robust Than Transformers? [29.615791409258804]
Vision Transformers are shaking the decade-long dominance of Convolutional Neural Networks (CNNs) in image recognition.
Recent research finds that Transformers are inherently more robust than CNNs regardless of the training setup.
It is believed that such superiority of Transformers should largely be credited to their self-attention-like architectures per se.
arXiv Detail & Related papers (2022-06-07T17:17:07Z) - Are Transformers More Robust Than CNNs? [17.47001041042089]
We provide the first fair & in-depth comparisons between Transformers and CNNs.
CNNs can easily be as robust as Transformers in defending against adversarial attacks.
Our ablations suggest that this stronger generalization largely benefits from the Transformers' self-attention-like architectures.
arXiv Detail & Related papers (2021-11-10T00:18:59Z) - Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to
CNNs [71.44985408214431]
Convolutional Neural Networks (CNNs) have become the de facto gold standard in computer vision applications.
New model architectures have been proposed, challenging the status quo.
arXiv Detail & Related papers (2021-10-06T14:18:47Z) - IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision
Transformers [81.31885548824926]
The self-attention-based Transformer model has recently become the leading backbone in computer vision.
We present an Interpretability-Aware REDundancy REDuction framework (IA-RED$^2$).
We include extensive experiments on both image and video tasks, where our method could deliver up to 1.4X speed-up.
arXiv Detail & Related papers (2021-06-23T18:29:23Z) - On the Robustness of Vision Transformers to Adversarial Examples [7.627299398469961]
We study the robustness of Vision Transformers to adversarial examples.
We show that adversarial examples do not readily transfer between CNNs and transformers.
Under a black-box adversary, we show that an ensemble can achieve unprecedented robustness without sacrificing clean accuracy.
arXiv Detail & Related papers (2021-03-31T00:29:12Z) - Detecting Adversarial Examples by Input Transformations, Defense
Perturbations, and Voting [71.57324258813674]
Convolutional Neural Networks (CNNs) have been shown to reach super-human performance in visual recognition tasks.
CNNs can easily be fooled by adversarial examples, i.e., maliciously-crafted images that force the networks to predict an incorrect output.
This paper extensively explores the detection of adversarial examples via image transformations and proposes a novel methodology; a minimal illustrative sketch of the transformation-and-voting idea appears after this list.
arXiv Detail & Related papers (2021-01-27T14:50:41Z) - Extreme Value Preserving Networks [65.2037926048262]
Recent evidence shows that convolutional neural networks (CNNs) are biased towards textures, which makes them non-robust to adversarial perturbations over textures.
This paper aims to leverage good properties of SIFT to renovate CNN architectures towards better accuracy and robustness.
arXiv Detail & Related papers (2020-11-17T02:06:52Z)
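To make the transformation-and-voting idea referenced above concrete, here is a minimal, generic sketch. It is not the cited paper's methodology (which also uses defense perturbations); it simply checks whether a model's prediction is stable under mild random input transformations and flags inputs with low voting agreement. The function name, jitter pipeline, and threshold are all hypothetical.

```python
# Illustrative sketch only, not the cited paper's exact detector: flag an input
# as suspicious when predictions disagree across random, benign input
# transformations ("voting"). Function name and thresholds are hypothetical.
import torch
import torchvision.transforms as T


def transformation_vote(model, image, n_votes=8, agreement_threshold=0.75):
    """Return (majority_class, is_suspicious) for a single image tensor (C, H, W)."""
    model.eval()
    jitter = T.Compose([
        T.RandomResizedCrop(224, scale=(0.8, 1.0)),
        T.ColorJitter(brightness=0.1, contrast=0.1),
    ])
    with torch.no_grad():
        preds = torch.stack([
            model(jitter(image).unsqueeze(0)).argmax(dim=-1).squeeze(0)
            for _ in range(n_votes)
        ])
    majority, _ = torch.mode(preds, dim=0)
    agreement = (preds == majority).float().mean().item()
    # Clean inputs usually vote consistently; low agreement across mild
    # transformations hints at a brittle (possibly adversarial) prediction.
    return majority.item(), agreement < agreement_threshold
```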