Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image
Classification
- URL: http://arxiv.org/abs/2208.02851v1
- Date: Thu, 4 Aug 2022 19:02:24 GMT
- Title: Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image
Classification
- Authors: Faris Almalik, Mohammad Yaqub, Karthik Nandakumar
- Abstract summary: Vision Transformers (ViT) are competing to replace Convolutional Neural Networks (CNN) for various computer vision tasks in medical imaging.
Recent works have shown that, like CNNs, ViTs are susceptible to adversarial attacks and suffer significant performance degradation under attack.
We propose a novel self-ensembling method to enhance the robustness of ViT in the presence of adversarial attacks.
- Score: 4.843654097048771
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision Transformers (ViT) are competing to replace Convolutional Neural
Networks (CNN) for various computer vision tasks in medical imaging such as
classification and segmentation. While the vulnerability of CNNs to adversarial
attacks is a well-known problem, recent works have shown that ViTs are also
susceptible to such attacks and suffer significant performance degradation
under attack. The vulnerability of ViTs to carefully engineered adversarial
samples raises serious concerns about their safety in clinical settings. In
this paper, we propose a novel self-ensembling method to enhance the robustness
of ViT in the presence of adversarial attacks. The proposed Self-Ensembling
Vision Transformer (SEViT) leverages the fact that feature representations
learned by initial blocks of a ViT are relatively unaffected by adversarial
perturbations. Learning multiple classifiers based on these intermediate
feature representations and combining these predictions with that of the final
ViT classifier can provide robustness against adversarial attacks. Measuring
the consistency between the various predictions can also help detect
adversarial samples. Experiments on two modalities (chest X-ray and fundoscopy)
demonstrate the efficacy of the SEViT architecture in defending against various
adversarial attacks in the gray-box (attacker has full knowledge of the target
model, but not the defense mechanism) setting. Code:
https://github.com/faresmalik/SEViT
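
A minimal PyTorch sketch of the self-ensembling idea described above, assuming a timm ViT backbone; the head design, the number of tapped blocks, and the majority-vote rule are illustrative assumptions rather than the paper's exact configuration (see the linked repository for the reference implementation):

```python
import torch
import torch.nn as nn
import timm

class SelfEnsembleViT(nn.Module):
    def __init__(self, num_classes: int, num_intermediate: int = 4):
        super().__init__()
        self.vit = timm.create_model("vit_base_patch16_224", pretrained=True,
                                     num_classes=num_classes)
        dim = self.vit.embed_dim
        # One small auxiliary classifier per early block (illustrative head design).
        self.heads = nn.ModuleList(
            nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, num_classes))
            for _ in range(num_intermediate)
        )
        self._feats = []
        # Capture each early block's output tokens via forward hooks.
        for blk in self.vit.blocks[:num_intermediate]:
            blk.register_forward_hook(lambda mod, inp, out: self._feats.append(out))

    def forward(self, x):
        self._feats.clear()
        final_logits = self.vit(x)                       # final ViT classifier
        # The CLS token of each intermediate block feeds its own head.
        aux_logits = [h(f[:, 0]) for h, f in zip(self.heads, self._feats)]
        preds = torch.stack([l.argmax(-1) for l in aux_logits + [final_logits]])
        vote, _ = torch.mode(preds, dim=0)               # majority vote, shape (B,)
        # Fraction of classifiers agreeing with the majority, per sample.
        agreement = (preds == vote.unsqueeze(0)).float().mean(dim=0)
        return vote, agreement
```

Samples whose agreement score falls below a chosen threshold can be flagged as suspected adversarial inputs, mirroring the consistency-based detection the abstract describes.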
Related papers
- Backdoor Attack Against Vision Transformers via Attention Gradient-Based Image Erosion [4.036142985883415]
Vision Transformers (ViTs) have outperformed traditional Convolutional Neural Networks (CNNs) across various computer vision tasks.
ViTs are vulnerable to backdoor attacks, where an adversary embeds a backdoor into the victim model.
We propose an Attention Gradient-based Erosion Backdoor (AGEB) targeted at ViTs.
arXiv Detail & Related papers (2024-10-30T04:06:12Z)
- ViTGuard: Attention-aware Detection against Adversarial Examples for Vision Transformer [8.71614629110101]
We propose ViTGuard as a general detection method for defending Vision Transformer (ViT) models against adversarial attacks.
ViTGuard uses a Masked Autoencoder (MAE) model to recover randomly masked patches from the unmasked regions.
Threshold-based detectors then leverage distinctive ViT features, including attention maps and classification token representations, to distinguish between normal and adversarial samples.
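
As a rough illustration of such a threshold-based detector, the sketch below compares the CLS representation of an input with that of its MAE reconstruction; `mae_reconstruct` is a hypothetical stand-in for the paper's MAE recovery step, and the cosine-distance criterion is an assumption, not ViTGuard's exact detector:

```python
import torch

@torch.no_grad()
def cls_shift_detector(vit, mae_reconstruct, x, threshold=0.1):
    """Flag inputs whose CLS representation shifts too much after MAE recovery."""
    recon = mae_reconstruct(x)                  # hypothetical MAE recovery step
    cls_orig = vit.forward_features(x)[:, 0]    # CLS token of the original input
    cls_recon = vit.forward_features(recon)[:, 0]
    # Adversarial perturbations tend not to survive reconstruction, so the
    # CLS representation of an attacked input drifts from its recovered version.
    dist = 1 - torch.cosine_similarity(cls_orig, cls_recon, dim=-1)
    return dist > threshold                     # True -> likely adversarial
```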
arXiv Detail & Related papers (2024-09-20T18:11:56Z)
- Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers [95.22517830759193]
This paper studies how adversarial vulnerability transfers from a pre-trained ViT model to downstream tasks.
We show that the proposed Downstream Transfer Attack (DTA) achieves an average attack success rate (ASR) exceeding 90%, surpassing existing methods by a huge margin.
arXiv Detail & Related papers (2024-08-03T08:07:03Z)
- Query-Efficient Hard-Label Black-Box Attack against Vision Transformers [9.086983253339069]
Vision transformers (ViTs) face security risks from adversarial attacks similar to those of deep convolutional neural networks (CNNs).
This article explores the vulnerability of ViTs against adversarial attacks under a black-box scenario.
We propose a novel query-efficient hard-label adversarial attack method called AdvViT.
arXiv Detail & Related papers (2024-06-29T10:09:12Z)
- Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defenses mainly focus on known attacks, while adversarial robustness to unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID).
We show that MID simultaneously achieves robustness to imperceptible adversarial perturbations in high-level image classification and attack suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z)
- Inference Time Evidences of Adversarial Attacks for Forensic on Transformers [27.88746727644074]
Vision Transformers (ViTs) are becoming a popular paradigm for vision tasks as they achieve state-of-the-art performance on image classification.
This paper presents a first attempt toward detecting adversarial attacks at inference time using the network's input and outputs as well as its latent features.
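
As a hedged sketch of this general idea (the paper's actual forensic features and detector differ), a simple binary classifier can be fit on latent features extracted from clean and adversarial samples:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_latent_detector(clean_feats: np.ndarray, adv_feats: np.ndarray):
    """clean_feats, adv_feats: (N, D) arrays of e.g. CLS-token activations."""
    X = np.concatenate([clean_feats, adv_feats])
    y = np.concatenate([np.zeros(len(clean_feats)), np.ones(len(adv_feats))])
    return LogisticRegression(max_iter=1000).fit(X, y)  # predicts 1 = adversarial
```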
arXiv Detail & Related papers (2023-01-31T01:17:03Z)
- Deeper Insights into ViTs Robustness towards Common Corruptions [82.79764218627558]
We investigate how CNN-like architectural designs and CNN-based data augmentation strategies affect ViTs' robustness to common corruptions.
We demonstrate that overlapping patch embedding and convolutional Feed-Forward Networks (FFN) improve robustness; the former is sketched below.
We also introduce a novel conditional method enabling input-varied augmentations from two angles.
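
A minimal sketch of an overlapping patch embedding, assuming the usual ViT tokenization; the kernel and overlap sizes here are illustrative:

```python
import torch.nn as nn

class OverlappingPatchEmbed(nn.Module):
    def __init__(self, in_chans=3, embed_dim=768, patch=16, overlap=8):
        super().__init__()
        # Kernel larger than stride: neighbouring tokens share `overlap` pixels,
        # unlike the standard non-overlapping 16x16 ViT patchifier.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch + overlap, stride=patch,
                              padding=overlap // 2)

    def forward(self, x):                        # x: (B, C, H, W)
        x = self.proj(x)                         # (B, D, H/patch, W/patch)
        return x.flatten(2).transpose(1, 2)      # (B, N, D) token sequence
```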
arXiv Detail & Related papers (2022-04-26T08:22:34Z)
- Towards Transferable Adversarial Attacks on Vision Transformers [110.55845478440807]
Vision transformers (ViTs) have demonstrated impressive performance on a series of computer vision tasks, yet they still suffer from adversarial examples.
We introduce a dual attack framework, which contains a Pay No Attention (PNA) attack and a PatchOut attack, to improve the transferability of adversarial samples across different ViTs.
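
A hedged sketch of the PNA idea: during the backward pass, the attention map is treated as a constant so gradients flow only through the value path. The code mirrors timm's `Attention.forward` (module attributes such as `qkv`, `scale`, and `proj` are timm's); the paper's own implementation may differ:

```python
import timm

def pna_attention_forward(self, x):
    # Mirrors timm's Attention.forward, but detaches the attention map so the
    # backward pass "pays no attention" to its gradient (value path only).
    B, N, C = x.shape
    qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
    q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)
    attn = (q @ k.transpose(-2, -1)) * self.scale
    attn = self.attn_drop(attn.softmax(dim=-1)).detach()
    x = (attn @ v).transpose(1, 2).reshape(B, N, C)
    return self.proj_drop(self.proj(x))

vit = timm.create_model("vit_base_patch16_224", pretrained=True)
for blk in vit.blocks:
    blk.attn.forward = pna_attention_forward.__get__(blk.attn)  # patch in place
```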
arXiv Detail & Related papers (2021-09-09T11:28:25Z)
- On Improving Adversarial Transferability of Vision Transformers [97.17154635766578]
Vision transformers (ViTs) process input images as sequences of patches via self-attention.
We study the adversarial feature space of ViT models and their transferability.
We introduce two novel strategies specific to the architecture of ViT models.
arXiv Detail & Related papers (2021-06-08T08:20:38Z)
- On the Adversarial Robustness of Visual Transformers [129.29523847765952]
This work provides the first and comprehensive study on the robustness of vision transformers (ViTs) against adversarial perturbations.
Across various white-box and transfer attack settings, we find that ViTs possess better adversarial robustness than convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-03-29T14:48:24Z)