Breaking the Illusion of Security via Interpretation: Interpretable Vision Transformer Systems under Attack
- URL: http://arxiv.org/abs/2507.14248v1
- Date: Fri, 18 Jul 2025 05:11:11 GMT
- Title: Breaking the Illusion of Security via Interpretation: Interpretable Vision Transformer Systems under Attack
- Authors: Eldor Abdukhamidov, Mohammed Abuhamad, Simon S. Woo, Hyoungshick Kim, Tamer Abuhmed,
- Abstract summary: Vision transformer (ViT) models, when coupled with interpretation models, are regarded as secure and challenging to deceive.<n>This study investigates the vulnerability of transformer models to adversarial attacks, even when combined with interpretation models.<n>We propose an attack called "AdViT" that generates adversarial examples capable of misleading both a given transformer model and its coupled interpretation model.
- Score: 23.690939122119723
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Vision transformer (ViT) models, when coupled with interpretation models, are regarded as secure and challenging to deceive, making them well-suited for security-critical domains such as medical applications, autonomous vehicles, drones, and robotics. However, successful attacks on these systems can lead to severe consequences. Recent research on threats targeting ViT models primarily focuses on generating the smallest adversarial perturbations that can deceive the models with high confidence, without considering their impact on model interpretations. Nevertheless, the use of interpretation models can effectively assist in detecting adversarial examples. This study investigates the vulnerability of transformer models to adversarial attacks, even when combined with interpretation models. We propose an attack called "AdViT" that generates adversarial examples capable of misleading both a given transformer model and its coupled interpretation model. Through extensive experiments on various transformer models and two transformer-based interpreters, we demonstrate that AdViT achieves a 100% attack success rate in both white-box and black-box scenarios. In white-box scenarios, it reaches up to 98% misclassification confidence, while in black-box scenarios, it reaches up to 76% misclassification confidence. Remarkably, AdViT consistently generates accurate interpretations in both scenarios, making the adversarial examples more difficult to detect.
Related papers
- Protego: Detecting Adversarial Examples for Vision Transformers via Intrinsic Capabilities [21.96572543062238]
Transformer models have excelled in natural language tasks, prompting the vision community to explore their implementation in computer vision problems.<n>In this paper, we investigate the attack capabilities of six common adversarial attacks on three pretrained ViT models to reveal the vulnerability of ViT models.<n>To prevent ViT models from adversarial attack, we propose Protego, a detection framework that leverages the transformer intrinsic capabilities to detection adversarial examples.
arXiv Detail & Related papers (2025-01-13T03:54:19Z) - ViTGuard: Attention-aware Detection against Adversarial Examples for Vision Transformer [8.71614629110101]
We propose ViTGuard as a general detection method for defending Vision Transformer (ViT) models against adversarial attacks.
ViTGuard uses a Masked Autoencoder (MAE) model to recover randomly masked patches from the unmasked regions.
threshold-based detectors leverage distinctive ViT features, including attention maps and classification (token representations) token representations, to distinguish between normal and adversarial samples.
arXiv Detail & Related papers (2024-09-20T18:11:56Z) - Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers [95.22517830759193]
This paper studies the transferability of such an adversarial vulnerability from a pre-trained ViT model to downstream tasks.
We show that DTA achieves an average attack success rate (ASR) exceeding 90%, surpassing existing methods by a huge margin.
arXiv Detail & Related papers (2024-08-03T08:07:03Z) - SA-Attack: Improving Adversarial Transferability of Vision-Language
Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z) - The Efficacy of Transformer-based Adversarial Attacks in Security
Domains [0.7156877824959499]
We evaluate the robustness of transformers to adversarial samples for system defenders and their adversarial strength for system attackers.
Our work emphasizes the importance of studying transformer architectures for attacking and defending models in security domains.
arXiv Detail & Related papers (2023-10-17T21:45:23Z) - Set-level Guidance Attack: Boosting Adversarial Transferability of
Vision-Language Pre-training Models [52.530286579915284]
We present the first study to investigate the adversarial transferability of vision-language pre-training models.
The transferability degradation is partly caused by the under-utilization of cross-modal interactions.
We propose a highly transferable Set-level Guidance Attack (SGA) that thoroughly leverages modality interactions and incorporates alignment-preserving augmentation with cross-modal guidance.
arXiv Detail & Related papers (2023-07-26T09:19:21Z) - Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image
Classification [4.843654097048771]
Vision Transformers (ViT) are competing to replace Convolutional Neural Networks (CNN) for various computer vision tasks in medical imaging.
Recent works have shown that ViTs are also susceptible to such attacks and suffer significant performance degradation under attack.
We propose a novel self-ensembling method to enhance the robustness of ViT in the presence of adversarial attacks.
arXiv Detail & Related papers (2022-08-04T19:02:24Z) - Cross-Modal Transferable Adversarial Attacks from Images to Videos [82.0745476838865]
Recent studies have shown that adversarial examples hand-crafted on one white-box model can be used to attack other black-box models.
We propose a simple yet effective cross-modal attack method, named as Image To Video (I2V) attack.
I2V generates adversarial frames by minimizing the cosine similarity between features of pre-trained image models from adversarial and benign examples.
arXiv Detail & Related papers (2021-12-10T08:19:03Z) - Boosting the Transferability of Video Adversarial Examples via Temporal
Translation [82.0745476838865]
adversarial examples are transferable, which makes them feasible for black-box attacks in real-world applications.
We introduce a temporal translation attack method, which optimize the adversarial perturbations over a set of temporal translated video clips.
Experiments on the Kinetics-400 dataset and the UCF-101 dataset demonstrate that our method can significantly boost the transferability of video adversarial examples.
arXiv Detail & Related papers (2021-10-18T07:52:17Z) - Towards Transferable Adversarial Attacks on Vision Transformers [110.55845478440807]
Vision transformers (ViTs) have demonstrated impressive performance on a series of computer vision tasks, yet they still suffer from adversarial examples.
We introduce a dual attack framework, which contains a Pay No Attention (PNA) attack and a PatchOut attack, to improve the transferability of adversarial samples across different ViTs.
arXiv Detail & Related papers (2021-09-09T11:28:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.