Deeper Insights into ViTs Robustness towards Common Corruptions
- URL: http://arxiv.org/abs/2204.12143v1
- Date: Tue, 26 Apr 2022 08:22:34 GMT
- Title: Deeper Insights into ViTs Robustness towards Common Corruptions
- Authors: Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yugang Jiang
- Abstract summary: We investigate how CNN-like architectural designs and CNN-based data augmentation strategies affect ViTs' robustness towards common corruptions.
We demonstrate that overlapping patch embedding and a convolutional Feed-Forward Network (FFN) boost robustness.
We also introduce a novel conditional method enabling input-varied augmentations from two angles.
- Score: 82.79764218627558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent literature has shown that design strategies from Convolutional
Neural Networks (CNNs) benefit Vision Transformers (ViTs) in various vision tasks.
However, it remains unclear how these design choices affect robustness when
transferred to ViTs. In this paper, we make the first attempt to investigate
how CNN-like architectural designs and CNN-based data augmentation strategies
affect ViTs' robustness towards common corruptions through extensive and
rigorous benchmarking. We demonstrate that overlapping patch embedding and a
convolutional Feed-Forward Network (FFN) boost robustness.
Furthermore, adversarial noise training is powerful on ViTs while
Fourier-domain augmentation fails. Moreover, we introduce a novel conditional
method enabling input-varied augmentations from two angles: (1) generating
dynamic augmentation parameters conditioned on input images, which achieves
state-of-the-art robustness through conditional convolutions;
(2) selecting the most suitable augmentation strategy with an extra predictor,
which yields the best trade-off between clean accuracy and robustness.
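
The two architectural components credited above are straightforward to express in code. Below is a minimal PyTorch sketch, not the authors' implementation: an overlapping patch embedding realized as a strided convolution whose kernel is larger than its stride, and a convolutional FFN that inserts a depthwise 3x3 convolution between the two linear projections. All module names and hyperparameters (patch size 7, stride 4, and so on) are illustrative assumptions.

```python
# Minimal PyTorch sketch (not the authors' code) of the two CNN-like designs
# found to boost robustness. All hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class OverlappingPatchEmbed(nn.Module):
    """Tokenize an image with overlapping patches: stride < kernel size."""

    def __init__(self, in_chans=3, embed_dim=768, patch_size=7, stride=4):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size,
                              stride=stride, padding=patch_size // 2)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):                       # x: (B, C, H, W)
        x = self.proj(x)                        # (B, D, H', W')
        _, _, H, W = x.shape
        x = x.flatten(2).transpose(1, 2)        # (B, H'*W', D) token sequence
        return self.norm(x), (H, W)


class ConvFFN(nn.Module):
    """FFN with a depthwise 3x3 convolution between the two projections."""

    def __init__(self, dim=768, hidden_dim=3072):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.dwconv = nn.Conv2d(hidden_dim, hidden_dim, kernel_size=3,
                                padding=1, groups=hidden_dim)  # depthwise
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden_dim, dim)

    def forward(self, x, hw):                   # x: (B, N, D), hw: (H', W')
        H, W = hw
        x = self.fc1(x)
        B, N, C = x.shape
        # Reshape tokens back to a 2D map so the conv sees spatial locality.
        x = x.transpose(1, 2).reshape(B, C, H, W)
        x = self.dwconv(x).flatten(2).transpose(1, 2)
        return self.fc2(self.act(x))


tokens, hw = OverlappingPatchEmbed()(torch.randn(2, 3, 64, 64))
out = ConvFFN()(tokens, hw)                     # (2, 256, 768)
```

The overlap lets neighboring tokens share pixels, and the depthwise convolution restores local inductive bias inside the FFN; both are plausible mechanisms behind the robustness gains the benchmark reports.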
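The conditional method itself is only described at a high level in the abstract, but its two angles can be illustrated schematically. Below is a hedged sketch in which a small plain CNN stands in for the paper's conditional convolutions: (1) a network predicting per-image augmentation magnitudes, and (2) an extra predictor selecting among candidate strategies. All names and the two toy strategies are hypothetical.

```python
# Schematic sketch of input-conditioned augmentation (assumptions throughout;
# a plain CNN stands in for the paper's conditional convolutions, and the two
# strategies below are toy placeholders).
import torch
import torch.nn as nn


class AugmentationParamNet(nn.Module):
    """Angle (1): predict per-image augmentation magnitudes."""

    def __init__(self, num_params=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_params), nn.Sigmoid(),  # magnitudes in [0, 1]
        )

    def forward(self, x):
        return self.net(x)


class StrategySelector(nn.Module):
    """Angle (2): an extra predictor scoring candidate strategies."""

    def __init__(self, num_strategies=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_strategies),
        )

    def forward(self, x):
        return self.net(x).argmax(dim=1)        # chosen strategy per image


def add_noise(x, mag):                          # toy strategy 0
    return x + mag.view(-1, 1, 1, 1) * torch.randn_like(x)


def adjust_contrast(x, mag):                    # toy strategy 1
    mean = x.mean(dim=(2, 3), keepdim=True)
    return mean + (1.0 + mag.view(-1, 1, 1, 1)) * (x - mean)


images = torch.rand(4, 3, 64, 64)
mags = AugmentationParamNet()(images)           # (4, 2), one row per image
choice = StrategySelector()(images)             # (4,), one index per image
augmented = torch.where((choice == 0).view(-1, 1, 1, 1),
                        add_noise(images, mags[:, 0]),
                        adjust_contrast(images, mags[:, 1]))
```

How these auxiliary networks are trained is not specified in the abstract; the sketch only shows the inference-time data flow, in which every image receives its own augmentation parameters and strategy choice.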
Related papers
- Denoising Vision Transformers [43.03068202384091]
We propose a two-stage denoising approach, termed Denoising Vision Transformers (DVT).
In the first stage, we separate the clean features from those contaminated by positional artifacts by enforcing cross-view feature consistency with neural fields on a per-image basis.
In the second stage, we train a lightweight transformer block to predict clean features from raw ViT outputs, leveraging the derived estimates of the clean features as supervision.
arXiv Detail & Related papers (2024-01-05T18:59:52Z) - Improving Robustness for Vision Transformer with a Simple Dynamic Scanning Augmentation [10.27974860479791]
Vision Transformer (ViT) has demonstrated promising performance in computer vision tasks, comparable to state-of-the-art neural networks.
Yet, this new type of deep neural network architecture is vulnerable to adversarial attacks, which limits its robustness.
This article presents a novel contribution aimed at further improving the accuracy and robustness of ViT, particularly in the face of adversarial attacks.
arXiv Detail & Related papers (2023-11-01T11:10:01Z) - A Light Recipe to Train Robust Vision Transformers [34.51642006926379]
We show that Vision Transformers (ViTs) can serve as an underlying architecture for improving the robustness of machine learning models against evasion attacks.
We achieve this objective using a custom adversarial training recipe, discovered using rigorous ablation studies on a subset of the ImageNet dataset.
We show that our recipe generalizes to different classes of ViT architectures and large-scale models on full ImageNet-1k.
arXiv Detail & Related papers (2022-09-15T16:00:04Z) - Coarse-to-Fine Vision Transformer [83.45020063642235]
We propose a coarse-to-fine vision transformer (CF-ViT) to relieve computational burden while retaining performance.
Our proposed CF-ViT is motivated by two important observations in modern ViT models.
Our CF-ViT reduces the FLOPs of LV-ViT by 53% and achieves a 2.01x throughput improvement.
arXiv Detail & Related papers (2022-03-08T02:57:49Z) - Towards Transferable Adversarial Attacks on Vision Transformers [110.55845478440807]
Vision transformers (ViTs) have demonstrated impressive performance on a series of computer vision tasks, yet they still suffer from adversarial examples.
We introduce a dual attack framework, which contains a Pay No Attention (PNA) attack and a PatchOut attack, to improve the transferability of adversarial samples across different ViTs.
arXiv Detail & Related papers (2021-09-09T11:28:25Z) - On Improving Adversarial Transferability of Vision Transformers [97.17154635766578]
Vision transformers (ViTs) process input images as sequences of patches via self-attention.
We study the adversarial feature space of ViT models and their transferability.
We introduce two novel strategies specific to the architecture of ViT models.
arXiv Detail & Related papers (2021-06-08T08:20:38Z) - Rethinking the Design Principles of Robust Vision Transformer [28.538786330184642]
Vision Transformers (ViTs) have shown that self-attention-based networks can surpass traditional convolutional neural networks (CNNs) in most vision tasks.
In this paper, we rethink the design principles of ViTs based on robustness.
By combining the robust design components, we propose the Robust Vision Transformer (RVT).
arXiv Detail & Related papers (2021-05-17T15:04:15Z) - Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z) - On the Adversarial Robustness of Visual Transformers [129.29523847765952]
This work provides the first and comprehensive study on the robustness of vision transformers (ViTs) against adversarial perturbations.
Across various white-box and transfer attack settings, we find that ViTs possess better adversarial robustness than convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-03-29T14:48:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.