Improving Robustness for Vision Transformer with a Simple Dynamic
Scanning Augmentation
- URL: http://arxiv.org/abs/2311.00441v1
- Date: Wed, 1 Nov 2023 11:10:01 GMT
- Title: Improving Robustness for Vision Transformer with a Simple Dynamic
Scanning Augmentation
- Authors: Shashank Kotyan and Danilo Vasconcellos Vargas
- Abstract summary: Vision Transformer (ViT) has demonstrated promising performance in computer vision tasks, comparable to state-of-the-art neural networks.
Yet, this new type of deep neural network architecture is vulnerable to adversarial attacks, which limits its robustness.
This article presents a novel contribution aimed at further improving the accuracy and robustness of ViT, particularly in the face of adversarial attacks.
- Score: 10.27974860479791
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision Transformer (ViT) has demonstrated promising performance in computer
vision tasks, comparable to state-of-the-art neural networks. Yet this new
type of deep neural network architecture is vulnerable to adversarial attacks,
which limits its robustness. This article presents a novel
contribution aimed at further improving the accuracy and robustness of ViT,
particularly in the face of adversarial attacks. We propose an augmentation
technique called `Dynamic Scanning Augmentation' that leverages dynamic input
sequences to adaptively focus on different patches, thereby maintaining
performance and robustness. Our detailed investigations reveal that this
adaptability to the input sequence induces significant changes in the attention
mechanism of ViT, even for the same image. We introduce four variations of
Dynamic Scanning Augmentation that outperform the baseline ViT in both
robustness to adversarial attacks and accuracy on natural images, with one
variant showing comparable results. By integrating our augmentation technique, we
observe a substantial increase in ViT's robustness, improving it from $17\%$ to
$92\%$ measured across different types of adversarial attacks. These findings,
together with other comprehensive tests, indicate that Dynamic Scanning
Augmentation enhances accuracy and robustness by promoting a more adaptive type
of attention. In conclusion, this work contributes to the ongoing research on
Vision Transformers by introducing Dynamic Scanning Augmentation as a technique
for improving the accuracy and robustness of ViT. The observed results
highlight the potential of this approach in advancing computer vision tasks and
merit further exploration in future studies.
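
To make the idea of dynamic input sequences more concrete, below is a minimal sketch, assuming a standard ViT patch-embedding pipeline, of how an augmentation in this spirit could randomly reorder and subsample the patch-token sequence on each forward pass. The function name `dynamic_scanning_augment`, the `keep_ratio` value, and the sampling scheme are illustrative assumptions; they are not necessarily any of the four variants described in the paper.

```python
# Illustrative sketch only: the paper's Dynamic Scanning Augmentation variants
# may differ. This shows one way to present a ViT with a different patch
# sequence for the same image on every forward pass.
import torch


def dynamic_scanning_augment(patch_tokens: torch.Tensor,
                             keep_ratio: float = 0.75) -> torch.Tensor:
    """Randomly reorder and subsample a ViT patch-token sequence.

    patch_tokens: (batch, num_patches, embed_dim) patch embeddings;
                  the class token is assumed to be prepended afterwards.
    keep_ratio:   fraction of patches kept per pass (assumed value).
    """
    b, n, d = patch_tokens.shape
    n_keep = max(1, int(n * keep_ratio))
    # Draw an independent random scanning order for every image in the batch.
    order = torch.rand(b, n, device=patch_tokens.device).argsort(dim=1)
    idx = order[:, :n_keep].unsqueeze(-1).expand(-1, -1, d)
    # Gather patches in the sampled order, so self-attention sees a different
    # input sequence for the same image each time it is called.
    return torch.gather(patch_tokens, dim=1, index=idx)


# Example with dummy patch embeddings (8 images, 196 patches, 384 dims):
tokens = torch.randn(8, 196, 384)
augmented = dynamic_scanning_augment(tokens)
print(augmented.shape)  # torch.Size([8, 147, 384])
```

In such a setup, positional embeddings would have to be gathered with the same indices, and because the sequence changes from pass to pass, the attention pattern computed over the same image also changes, which is the adaptive behaviour the abstract attributes to Dynamic Scanning Augmentation.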
Related papers
- ChangeViT: Unleashing Plain Vision Transformers for Change Detection [3.582733645632794]
ChangeViT is a framework that adopts a plain ViT backbone to enhance the detection of large-scale changes.
The framework achieves state-of-the-art performance on three popular high-resolution datasets.
arXiv Detail & Related papers (2024-06-18T17:59:08Z) - Attacking Transformers with Feature Diversity Adversarial Perturbation [19.597912600568026]
We present a label-free white-box attack approach for ViT-based models that exhibits strong transferability to various black box models.
Our inspiration comes from the feature collapse phenomenon in ViTs, where the critical attention mechanism overly depends on the low-frequency component of features.
arXiv Detail & Related papers (2024-03-10T00:55:58Z) - Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image
Classification [4.843654097048771]
Vision Transformers (ViT) are competing to replace Convolutional Neural Networks (CNN) for various computer vision tasks in medical imaging.
Recent works have shown that ViTs are also susceptible to adversarial attacks and suffer significant performance degradation under attack.
We propose a novel self-ensembling method to enhance the robustness of ViT in the presence of adversarial attacks.
arXiv Detail & Related papers (2022-08-04T19:02:24Z) - Deeper Insights into ViTs Robustness towards Common Corruptions [82.79764218627558]
We investigate how CNN-like architectural designs and CNN-based data augmentation strategies impact ViTs' robustness towards common corruptions.
We demonstrate that overlapping patch embedding and convolutional Feed-Forward Networks (FFN) improve robustness.
We also introduce a novel conditional method enabling input-varied augmentations from two angles.
arXiv Detail & Related papers (2022-04-26T08:22:34Z) - Visualizing and Understanding Patch Interactions in Vision Transformer [96.70401478061076]
Vision Transformer (ViT) has become a leading tool in various computer vision tasks.
We propose a novel explainable visualization approach to analyze and interpret the crucial attention interactions among patches in Vision Transformers.
arXiv Detail & Related papers (2022-03-11T13:48:11Z) - Video Coding for Machine: Compact Visual Representation Compression for
Intelligent Collaborative Analytics [101.35754364753409]
Video Coding for Machines (VCM) aims to bridge the largely separate research tracks of video/image compression and feature compression.
This paper summarizes VCM methodology and philosophy based on existing academia and industrial efforts.
arXiv Detail & Related papers (2021-10-18T12:42:13Z) - Intriguing Properties of Vision Transformers [114.28522466830374]
Vision transformers (ViT) have demonstrated impressive performance across various machine vision problems.
We systematically study these properties via an extensive set of experiments and comparisons with a high-performing convolutional neural network (CNN).
We show that the effective features of ViTs are due to flexible and dynamic receptive fields made possible by the self-attention mechanism.
arXiv Detail & Related papers (2021-05-21T17:59:18Z) - Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z) - On the Adversarial Robustness of Visual Transformers [129.29523847765952]
This work provides the first and comprehensive study on the robustness of vision transformers (ViTs) against adversarial perturbations.
Tested on various white-box and transfer attack settings, we find that ViTs possess better adversarial robustness when compared with convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-03-29T14:48:24Z)