Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs
- URL: http://arxiv.org/abs/2110.02797v1
- Date: Wed, 6 Oct 2021 14:18:47 GMT
- Title: Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs
- Authors: Philipp Benz, Soomin Ham, Chaoning Zhang, Adil Karjauv, In So Kweon
- Abstract summary: Convolutional Neural Networks (CNNs) have become the de facto gold standard in computer vision applications.
New model architectures have recently been proposed that challenge the status quo.
- Score: 71.44985408214431
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional Neural Networks (CNNs) have become the de facto gold standard
in computer vision applications in the past few years. Recently, however, new
model architectures have been proposed that challenge the status quo. The Vision
Transformer (ViT) relies solely on attention modules, while the MLP-Mixer
architecture substitutes the self-attention modules with Multi-Layer
Perceptrons (MLPs). Despite their great success, CNNs have been widely known to
be vulnerable to adversarial attacks, causing serious concerns for
security-sensitive applications. Thus, it is critical for the community to know
whether the newly proposed ViT and MLP-Mixer are also vulnerable to adversarial
attacks. To this end, we empirically evaluate their adversarial robustness
under several adversarial attack setups and benchmark them against the widely
used CNNs. Overall, we find that the two architectures, especially ViT, are
more robust than their CNN counterparts. Using a toy example, we also provide
empirical evidence that the lower adversarial robustness of CNNs can be
partially attributed to their shift-invariant property. Our frequency analysis
suggests that the most robust ViT architectures tend to rely more on
low-frequency features compared with CNNs. Additionally, we report the
intriguing finding that MLP-Mixer is extremely vulnerable to universal
adversarial perturbations.
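To make the evaluation protocol concrete, below is a minimal sketch of the kind of white-box robustness benchmark the abstract describes: an untargeted L-infinity PGD attack plus a clean-vs-robust accuracy comparison. This is an illustrative sketch, not the authors' exact setup; the timm model names, the epsilon budget, and the step counts are all assumptions.

```python
# Minimal sketch of an L-infinity PGD robustness benchmark (assumed setup,
# not the paper's exact protocol). Inputs are assumed to lie in [0, 1], with
# any normalization folded into the model.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=4/255, alpha=1/255, steps=10):
    """Untargeted L-inf PGD with a random start inside the eps-ball."""
    x = x.detach()
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # gradient-sign ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back onto the eps-ball
            x_adv = x_adv.clamp(0, 1)                 # stay in the valid pixel range
    return x_adv

@torch.no_grad()
def accuracy(model, x, y):
    return (model(x).argmax(dim=1) == y).float().mean().item()

def robustness_benchmark(model, loader, eps=4/255):
    """Return (clean accuracy, robust accuracy) over a data loader."""
    model.eval()
    clean = robust = n = 0
    for x, y in loader:
        x_adv = pgd_attack(model, x, y, eps=eps)
        clean += accuracy(model, x, y) * len(y)
        robust += accuracy(model, x_adv, y) * len(y)
        n += len(y)
    return clean / n, robust / n

# Hypothetical usage, comparing the three architecture families via timm
# (model names are assumptions for illustration):
# import timm
# for name in ["vit_base_patch16_224", "mixer_b16_224", "resnet50"]:
#     model = timm.create_model(name, pretrained=True)
#     print(name, robustness_benchmark(model, val_loader))
```

A simple probe for the abstract's low-frequency observation is sketched after the related-papers list below.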
Related papers
- Query-Efficient Hard-Label Black-Box Attack against Vision Transformers [9.086983253339069]
Vision transformers (ViTs) face security risks from adversarial attacks similar to those of deep convolutional neural networks (CNNs).
This article explores the vulnerability of ViTs against adversarial attacks under a black-box scenario.
We propose a novel query-efficient hard-label adversarial attack method called AdvViT.
arXiv Detail & Related papers (2024-06-29T10:09:12Z)
- Evaluating Adversarial Robustness in the Spatial Frequency Domain [13.200404022208858]
Convolutional Neural Networks (CNNs) have dominated the majority of computer vision tasks.
CNNs' vulnerability to adversarial attacks has raised concerns about deploying these models to safety-critical applications.
This paper presents an empirical study exploring the vulnerability of CNN models in the frequency domain; a minimal low-pass probe in this spirit is sketched after this list.
arXiv Detail & Related papers (2024-05-10T09:20:47Z)
- Robust Mixture-of-Expert Training for Convolutional Neural Networks [141.3531209949845]
Sparsely-gated Mixture-of-Experts (MoE) models have demonstrated great promise for enabling high-accuracy and ultra-efficient model inference.
We propose a new router-expert alternating adversarial training framework for MoE, termed AdvMoE.
We find that AdvMoE achieves a 1%-4% adversarial robustness improvement over the original dense CNN, and enjoys the efficiency merit of sparsity-gated MoE.
arXiv Detail & Related papers (2023-08-19T20:58:21Z)
- Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image Classification [4.843654097048771]
Vision Transformers (ViT) are competing to replace Convolutional Neural Networks (CNN) for various computer vision tasks in medical imaging.
Recent works have shown that ViTs are also susceptible to adversarial attacks and suffer significant performance degradation under attack.
We propose a novel self-ensembling method to enhance the robustness of ViT in the presence of adversarial attacks.
arXiv Detail & Related papers (2022-08-04T19:02:24Z)
- An Impartial Take to the CNN vs Transformer Robustness Contest [89.97450887997925]
Recent state-of-the-art CNNs can be as robust and reliable as, and sometimes more so than, the current state-of-the-art Transformers.
Although it is tempting to declare the definitive superiority of one family of architectures over the other, both seem to achieve similarly strong performance across a variety of tasks.
arXiv Detail & Related papers (2022-07-22T21:34:37Z)
- Patch-Fool: Are Vision Transformers Always Robust Against Adversarial Perturbations? [21.32962679185015]
Vision transformers (ViTs) have recently set off a new wave in neural architecture design thanks to their record-breaking performance in vision tasks.
Recent works show that ViTs are more robust against adversarial attacks compared with convolutional neural networks (CNNs).
We propose a dedicated attack framework, dubbed Patch-Fool, that fools the self-attention mechanism by attacking its basic component.
arXiv Detail & Related papers (2022-03-16T04:45:59Z)
- Neural Architecture Dilation for Adversarial Robustness [56.18555072877193]
A shortcoming of convolutional neural networks is that they are vulnerable to adversarial attacks.
This paper aims to improve the adversarial robustness of backbone CNNs that already have satisfactory accuracy.
With minimal computational overhead, the dilated architecture is expected to preserve the standard performance of the backbone CNN.
arXiv Detail & Related papers (2021-08-16T03:58:00Z)
- On the Adversarial Robustness of Visual Transformers [129.29523847765952]
This work provides the first comprehensive study of the robustness of vision transformers (ViTs) against adversarial perturbations.
Testing various white-box and transfer attack settings, we find that ViTs possess better adversarial robustness than convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-03-29T14:48:24Z)
- Extreme Value Preserving Networks [65.2037926048262]
Recent evidence shows that convolutional neural networks (CNNs) are biased towards textures, which makes them non-robust to adversarial perturbations over textures.
This paper aims to leverage good properties of SIFT to renovate CNN architectures towards better accuracy and robustness.
arXiv Detail & Related papers (2020-11-17T02:06:52Z)
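The frequency analysis mentioned in the main abstract above (and studied in the Spatial Frequency Domain paper in this list) can be approximated with a simple Fourier low-pass probe: filter out high spatial frequencies and check how much of the model's accuracy survives. This is a minimal sketch under assumptions, not either paper's exact method; the cutoff radii are arbitrary illustrative values, and it reuses the accuracy() helper from the PGD sketch above.

```python
# Minimal sketch of a low-pass frequency-reliance probe (illustrative
# assumptions: square images, inputs in [0, 1], arbitrary cutoff radii).
import torch

def low_pass(x, radius):
    """Keep only spatial frequencies within `radius` of the spectrum center."""
    f = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    h, w = x.shape[-2:]
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    mask = ((yy - h // 2) ** 2 + (xx - w // 2) ** 2) <= radius ** 2
    f = f * mask.to(x.device)                        # zero out high frequencies
    x_lp = torch.fft.ifft2(torch.fft.ifftshift(f, dim=(-2, -1))).real
    return x_lp.clamp(0, 1)

# Hypothetical probe: a model that relies mostly on low-frequency features
# should lose little accuracy as the cutoff radius shrinks.
# for radius in [112, 56, 28, 14]:
#     print(radius, accuracy(model, low_pass(x, radius), y))
```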
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.