Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image
Classification
- URL: http://arxiv.org/abs/2208.02851v1
- Date: Thu, 4 Aug 2022 19:02:24 GMT
- Title: Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image
Classification
- Authors: Faris Almalik, Mohammad Yaqub, Karthik Nandakumar
- Abstract summary: Vision Transformers (ViT) are competing to replace Convolutional Neural Networks (CNN) for various computer vision tasks in medical imaging.
Recent works have shown that, like CNNs, ViTs are susceptible to adversarial attacks and suffer significant performance degradation under attack.
We propose a novel self-ensembling method to enhance the robustness of ViT in the presence of adversarial attacks.
- Score: 4.843654097048771
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision Transformers (ViT) are competing to replace Convolutional Neural
Networks (CNN) for various computer vision tasks in medical imaging such as
classification and segmentation. While the vulnerability of CNNs to adversarial
attacks is a well-known problem, recent works have shown that ViTs are also
susceptible to such attacks and suffer significant performance degradation
under attack. The vulnerability of ViTs to carefully engineered adversarial
samples raises serious concerns about their safety in clinical settings. In
this paper, we propose a novel self-ensembling method to enhance the robustness
of ViT in the presence of adversarial attacks. The proposed Self-Ensembling
Vision Transformer (SEViT) leverages the fact that feature representations
learned by initial blocks of a ViT are relatively unaffected by adversarial
perturbations. Learning multiple classifiers based on these intermediate
feature representations and combining these predictions with that of the final
ViT classifier can provide robustness against adversarial attacks. Measuring
the consistency between the various predictions can also help detect
adversarial samples. Experiments on two modalities (chest X-ray and fundoscopy)
demonstrate the efficacy of the SEViT architecture in defending against various
adversarial attacks in the gray-box (attacker has full knowledge of the target
model, but not the defense mechanism) setting. Code:
https://github.com/faresmalik/SEViT
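
A minimal PyTorch sketch of the self-ensembling idea described above, assuming a timm ViT backbone; the head design, the number of tapped blocks, and the majority-vote rule are illustrative assumptions rather than the paper's exact configuration (see the linked repository for the reference implementation):

```python
import torch
import torch.nn as nn
import timm

class SelfEnsembleViT(nn.Module):
    def __init__(self, num_classes: int, num_intermediate: int = 4):
        super().__init__()
        self.vit = timm.create_model("vit_base_patch16_224", pretrained=True,
                                     num_classes=num_classes)
        dim = self.vit.embed_dim
        # One small auxiliary classifier per early block (illustrative head design).
        self.heads = nn.ModuleList(
            nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, num_classes))
            for _ in range(num_intermediate)
        )
        self._feats = []
        # Capture each early block's output tokens via forward hooks.
        for blk in self.vit.blocks[:num_intermediate]:
            blk.register_forward_hook(lambda mod, inp, out: self._feats.append(out))

    def forward(self, x):
        self._feats.clear()
        final_logits = self.vit(x)                       # final ViT classifier
        # The CLS token of each intermediate block feeds its own head.
        aux_logits = [h(f[:, 0]) for h, f in zip(self.heads, self._feats)]
        preds = torch.stack([l.argmax(-1) for l in aux_logits + [final_logits]])
        vote, _ = torch.mode(preds, dim=0)               # majority vote, shape (B,)
        # Fraction of classifiers agreeing with the majority, per sample.
        agreement = (preds == vote.unsqueeze(0)).float().mean(dim=0)
        return vote, agreement
```

Samples whose agreement score falls below a chosen threshold can be flagged as suspected adversarial inputs, mirroring the consistency-based detection the abstract describes.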
Related papers
- Backdoor Attack Against Vision Transformers via Attention Gradient-Based Image Erosion [4.036142985883415]
Vision Transformers (ViTs) have outperformed traditional Convolutional Neural Networks (CNNs) across various computer vision tasks.
ViTs are vulnerable to backdoor attacks, where an adversary embeds a backdoor into the victim model.
We propose an Attention Gradient-based Erosion Backdoor (AGEB) targeted at ViTs.
arXiv Detail & Related papers (2024-10-30T04:06:12Z)
- ViTGuard: Attention-aware Detection against Adversarial Examples for Vision Transformer [8.71614629110101]
We propose ViTGuard as a general detection method for defending Vision Transformer (ViT) models against adversarial attacks.
ViTGuard uses a Masked Autoencoder (MAE) model to recover randomly masked patches from the unmasked regions.
Threshold-based detectors then leverage distinctive ViT features, including attention maps and classification token representations, to distinguish between normal and adversarial samples.
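
As a rough illustration of such a threshold-based detector, the sketch below compares the CLS representation of an input with that of its MAE reconstruction; `mae_reconstruct` is a hypothetical stand-in for the paper's MAE recovery step, and the cosine-distance criterion is an assumption, not ViTGuard's exact detector:

```python
import torch

@torch.no_grad()
def cls_shift_detector(vit, mae_reconstruct, x, threshold=0.1):
    """Flag inputs whose CLS representation shifts too much after MAE recovery."""
    recon = mae_reconstruct(x)                  # hypothetical MAE recovery step
    cls_orig = vit.forward_features(x)[:, 0]    # CLS token of the original input
    cls_recon = vit.forward_features(recon)[:, 0]
    # Adversarial perturbations tend not to survive reconstruction, so the
    # CLS representation of an attacked input drifts from its recovered version.
    dist = 1 - torch.cosine_similarity(cls_orig, cls_recon, dim=-1)
    return dist > threshold                     # True -> likely adversarial
```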
arXiv Detail & Related papers (2024-09-20T18:11:56Z)
- Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers [95.22517830759193]
This paper studies how adversarial vulnerability transfers from a pre-trained ViT model to downstream tasks.
We show that the proposed Downstream Transfer Attack (DTA) achieves an average attack success rate (ASR) exceeding 90%, surpassing existing methods by a huge margin.
arXiv Detail & Related papers (2024-08-03T08:07:03Z)
- Query-Efficient Hard-Label Black-Box Attack against Vision Transformers [9.086983253339069]
Vision transformers (ViTs) face security risks from adversarial attacks similar to those of deep convolutional neural networks (CNNs).
This article explores the vulnerability of ViTs against adversarial attacks under a black-box scenario.
We propose a novel query-efficient hard-label adversarial attack method called AdvViT.
arXiv Detail & Related papers (2024-06-29T10:09:12Z)
- Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defenses mainly focus on known attacks, while adversarial robustness to unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID).
We show that MID simultaneously achieves robustness to imperceptible adversarial perturbations in high-level image classification and attack suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z)
- Inference Time Evidences of Adversarial Attacks for Forensic on Transformers [27.88746727644074]
Vision Transformers (ViTs) are becoming a popular paradigm for vision tasks as they achieve state-of-the-art performance on image classification.
This paper presents a first attempt toward detecting adversarial attacks at inference time using the network's input and outputs as well as its latent features.
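
As a hedged sketch of this general idea (the paper's actual forensic features and detector differ), a simple binary classifier can be fit on latent features extracted from clean and adversarial samples:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_latent_detector(clean_feats: np.ndarray, adv_feats: np.ndarray):
    """clean_feats, adv_feats: (N, D) arrays of e.g. CLS-token activations."""
    X = np.concatenate([clean_feats, adv_feats])
    y = np.concatenate([np.zeros(len(clean_feats)), np.ones(len(adv_feats))])
    return LogisticRegression(max_iter=1000).fit(X, y)  # predicts 1 = adversarial
```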
arXiv Detail & Related papers (2023-01-31T01:17:03Z)
- Deeper Insights into ViTs Robustness towards Common Corruptions [82.79764218627558]
We investigate how CNN-like architectural designs and CNN-based data augmentation strategies affect ViTs' robustness to common corruptions.
We demonstrate that overlapping patch embedding and convolutional Feed-Forward Networks (FFN) improve robustness; the former is sketched below.
We also introduce a novel conditional method enabling input-varied augmentations from two angles.
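
A minimal sketch of an overlapping patch embedding, assuming the usual ViT tokenization; the kernel and overlap sizes here are illustrative:

```python
import torch.nn as nn

class OverlappingPatchEmbed(nn.Module):
    def __init__(self, in_chans=3, embed_dim=768, patch=16, overlap=8):
        super().__init__()
        # Kernel larger than stride: neighbouring tokens share `overlap` pixels,
        # unlike the standard non-overlapping 16x16 ViT patchifier.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch + overlap, stride=patch,
                              padding=overlap // 2)

    def forward(self, x):                        # x: (B, C, H, W)
        x = self.proj(x)                         # (B, D, H/patch, W/patch)
        return x.flatten(2).transpose(1, 2)      # (B, N, D) token sequence
```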
arXiv Detail & Related papers (2022-04-26T08:22:34Z)
- Towards Transferable Adversarial Attacks on Vision Transformers [110.55845478440807]
Vision transformers (ViTs) have demonstrated impressive performance on a series of computer vision tasks, yet they still suffer from adversarial examples.
We introduce a dual attack framework, which contains a Pay No Attention (PNA) attack and a PatchOut attack, to improve the transferability of adversarial samples across different ViTs.
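
A hedged sketch of the PNA idea: during the backward pass, the attention map is treated as a constant so gradients flow only through the value path. The code mirrors timm's `Attention.forward` (module attributes such as `qkv`, `scale`, and `proj` are timm's); the paper's own implementation may differ:

```python
import timm

def pna_attention_forward(self, x):
    # Mirrors timm's Attention.forward, but detaches the attention map so the
    # backward pass "pays no attention" to its gradient (value path only).
    B, N, C = x.shape
    qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
    q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)
    attn = (q @ k.transpose(-2, -1)) * self.scale
    attn = self.attn_drop(attn.softmax(dim=-1)).detach()
    x = (attn @ v).transpose(1, 2).reshape(B, N, C)
    return self.proj_drop(self.proj(x))

vit = timm.create_model("vit_base_patch16_224", pretrained=True)
for blk in vit.blocks:
    blk.attn.forward = pna_attention_forward.__get__(blk.attn)  # patch in place
```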
arXiv Detail & Related papers (2021-09-09T11:28:25Z)
- On Improving Adversarial Transferability of Vision Transformers [97.17154635766578]
Vision transformers (ViTs) process input images as sequences of patches via self-attention.
We study the adversarial feature space of ViT models and their transferability.
We introduce two novel strategies specific to the architecture of ViT models.
arXiv Detail & Related papers (2021-06-08T08:20:38Z)
- On the Adversarial Robustness of Visual Transformers [129.29523847765952]
This work provides the first and comprehensive study on the robustness of vision transformers (ViTs) against adversarial perturbations.
Across various white-box and transfer attack settings, we find that ViTs possess better adversarial robustness than convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-03-29T14:48:24Z)