Mechanistic Understandings of Representation Vulnerabilities and Engineering Robust Vision Transformers
- URL: http://arxiv.org/abs/2502.04679v1
- Date: Fri, 07 Feb 2025 05:58:16 GMT
- Title: Mechanistic Understandings of Representation Vulnerabilities and Engineering Robust Vision Transformers
- Authors: Chashi Mahiul Islam, Samuel Jacob Chacko, Mao Nishino, Xiuwen Liu
- Abstract summary: We study the sources of known representation vulnerabilities of vision transformers (ViT), where perceptually identical images can have very different representations.
We develop NeuroShield-ViT, a novel defense mechanism that strategically neutralizes vulnerable neurons in earlier layers to prevent the cascade of adversarial effects.
Our results shed new light on how adversarial effects propagate through ViT layers, while providing a promising approach to enhance the robustness of vision transformers against adversarial attacks.
- Score: 1.1187085721899017
- Abstract: While transformer-based models dominate NLP and vision applications, the underlying mechanisms by which they semantically map the input space to the label space are not well understood. In this paper, we study the sources of known representation vulnerabilities of vision transformers (ViT), where perceptually identical images can have very different representations and semantically unrelated images can have the same representation. Our analysis indicates that imperceptible changes to the input can result in significant representation changes, particularly in later layers, suggesting potential instabilities in the performance of ViTs. Our comprehensive study reveals that adversarial effects, while subtle in early layers, propagate and amplify through the network, becoming most pronounced in middle to late layers. This insight motivates the development of NeuroShield-ViT, a novel defense mechanism that strategically neutralizes vulnerable neurons in earlier layers to prevent the cascade of adversarial effects. We demonstrate NeuroShield-ViT's effectiveness across various attacks, particularly excelling against strong iterative attacks, and showcase its remarkable zero-shot generalization capabilities. Without fine-tuning, our method achieves a competitive accuracy of 77.8% on adversarial examples, surpassing conventional robustness methods. Our results shed new light on how adversarial effects propagate through ViT layers, while providing a promising approach to enhance the robustness of vision transformers against adversarial attacks.
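The abstract describes two ingredients: tracking how adversarial effects amplify across blocks, and neutralizing vulnerable neurons in early layers before the cascade builds up. The paper's implementation details are not given here, so the following is only a minimal sketch of those two ideas on a timm ViT; the per-block divergence probe and the `neutralize_early_neurons` hook are illustrative, and `vulnerable_units`, `n_early_blocks`, and the random perturbation are placeholder assumptions, not NeuroShield-ViT's actual selection procedure.

```python
import torch
import timm

# Illustrative sketch only: not the NeuroShield-ViT reference implementation.
model = timm.create_model("vit_base_patch16_224", pretrained=True).eval()

def layerwise_divergence(model, x_clean, x_adv):
    """Cosine distance between clean and perturbed CLS features after each block."""
    feats = {"clean": [], "adv": []}

    def make_hook(key):
        def hook(_module, _inputs, output):
            feats[key].append(output[:, 0].detach())  # CLS token after this block
        return hook

    for key, x in (("clean", x_clean), ("adv", x_adv)):
        handles = [blk.register_forward_hook(make_hook(key)) for blk in model.blocks]
        with torch.no_grad():
            model(x)
        for h in handles:
            h.remove()

    return [1 - torch.cosine_similarity(c, a, dim=-1).mean().item()
            for c, a in zip(feats["clean"], feats["adv"])]

def neutralize_early_neurons(model, vulnerable_units, n_early_blocks=4):
    """Zero out flagged MLP output channels in the first few blocks (placeholder criterion)."""
    handles = []
    for blk in model.blocks[:n_early_blocks]:
        def hook(_module, _inputs, output, idx=list(vulnerable_units)):
            output = output.clone()
            output[..., idx] = 0.0  # suppress flagged channels
            return output
        handles.append(blk.mlp.register_forward_hook(hook))
    return handles  # call .remove() on each handle to restore the original model

x_clean = torch.randn(1, 3, 224, 224)
x_adv = x_clean + 0.01 * torch.randn_like(x_clean)  # stand-in for a real attack
print(layerwise_divergence(model, x_clean, x_adv))   # expect divergence to grow with depth
hooks = neutralize_early_neurons(model, vulnerable_units=[7, 42, 311])
```

In this reading, the divergence curve is what motivates acting early: once the flagged channels are suppressed in the first blocks, the perturbation has less signal to amplify in the middle and late layers.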
Related papers
- Protego: Detecting Adversarial Examples for Vision Transformers via Intrinsic Capabilities [21.96572543062238]
Transformer models have excelled in natural language tasks, prompting the vision community to explore their application to computer vision problems.
In this paper, we investigate the attack capabilities of six common adversarial attacks on three pretrained ViT models to reveal the vulnerability of ViT models.
To protect ViT models from adversarial attacks, we propose Protego, a detection framework that leverages the transformer's intrinsic capabilities to detect adversarial examples.
arXiv Detail & Related papers (2025-01-13T03:54:19Z) - Transformation-Dependent Adversarial Attacks [15.374381635334897]
We introduce transformation-dependent adversarial attacks, a new class of threats where a single additive perturbation can trigger diverse, controllable mis-predictions.
Unlike traditional attacks with static effects, our perturbations embed metamorphic properties to enable different adversarial attacks as a function of the transformation parameters.
arXiv Detail & Related papers (2024-06-12T17:31:36Z) - Attacking Transformers with Feature Diversity Adversarial Perturbation [19.597912600568026]
We present a label-free white-box attack approach for ViT-based models that exhibits strong transferability to various black-box models.
Our inspiration comes from the feature collapse phenomenon in ViTs, where the critical attention mechanism overly depends on the low-frequency component of features.
arXiv Detail & Related papers (2024-03-10T00:55:58Z) - Towards Robust Semantic Segmentation against Patch-based Attack via Attention Refinement [68.31147013783387]
We observe that the attention mechanism is vulnerable to patch-based adversarial attacks.
In this paper, we propose a Robust Attention Mechanism (RAM) to improve the robustness of the semantic segmentation model.
arXiv Detail & Related papers (2024-01-03T13:58:35Z) - SA-Attack: Improving Adversarial Transferability of Vision-Language
Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z) - Adversarial Prompt Tuning for Vision-Language Models [86.5543597406173]
Adversarial Prompt Tuning (AdvPT) is a technique to enhance the adversarial robustness of image encoders in Vision-Language Models (VLMs).
We demonstrate that AdvPT improves resistance against white-box and black-box adversarial attacks and exhibits a synergistic effect when combined with existing image-processing-based defense techniques.
arXiv Detail & Related papers (2023-11-19T07:47:43Z) - A Survey on Transferability of Adversarial Examples across Deep Neural Networks [53.04734042366312]
Adversarial examples can manipulate machine learning models into making erroneous predictions.
The transferability of adversarial examples enables black-box attacks which circumvent the need for detailed knowledge of the target model.
This survey explores the landscape of the transferability of adversarial examples.
arXiv Detail & Related papers (2023-10-26T17:45:26Z) - Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image Classification [4.843654097048771]
Vision Transformers (ViT) are competing to replace Convolutional Neural Networks (CNN) for various computer vision tasks in medical imaging.
Recent works have shown that ViTs are also susceptible to adversarial attacks and suffer significant performance degradation under attack.
We propose a novel self-ensembling method to enhance the robustness of ViT in the presence of adversarial attacks.
arXiv Detail & Related papers (2022-08-04T19:02:24Z) - Deeper Insights into ViTs Robustness towards Common Corruptions [82.79764218627558]
We investigate how CNN-like architectural designs and CNN-based data augmentation strategies impact ViTs' robustness to common corruptions.
We demonstrate that overlapping patch embedding and a convolutional Feed-Forward Network (FFN) improve robustness.
We also introduce a novel conditional method enabling input-varied augmentations from two angles.
arXiv Detail & Related papers (2022-04-26T08:22:34Z) - Towards Transferable Adversarial Attacks on Vision Transformers [110.55845478440807]
Vision transformers (ViTs) have demonstrated impressive performance on a series of computer vision tasks, yet they still suffer from adversarial examples.
We introduce a dual attack framework, which contains a Pay No Attention (PNA) attack and a PatchOut attack, to improve the transferability of adversarial samples across different ViTs.
arXiv Detail & Related papers (2021-09-09T11:28:25Z)
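The last entry above pairs a PatchOut attack with a Pay No Attention (PNA) attack, whose key move is to ignore gradients flowing through the attention maps when crafting transferable adversarial examples. As a rough, hedged sketch of that idea (written against timm's standard ViT `Attention` attributes, with a plain FGSM step standing in for the paper's full attack pipeline; this is not the authors' released code):

```python
import torch
import timm

def pna_forward(self, x):
    # Re-implemented attention forward: identical values in the forward pass,
    # but the attention weights are detached so gradients only flow via the value path.
    B, N, C = x.shape
    qkv = (self.qkv(x)
           .reshape(B, N, 3, self.num_heads, C // self.num_heads)
           .permute(2, 0, 3, 1, 4))
    q, k, v = qkv.unbind(0)
    attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
    attn = attn.detach()  # "pay no attention" to attention-map gradients
    x = (attn @ v).transpose(1, 2).reshape(B, N, C)
    return self.proj_drop(self.proj(x))

model = timm.create_model("vit_base_patch16_224", pretrained=True).eval()
for blk in model.blocks:
    blk.attn.forward = pna_forward.__get__(blk.attn)  # monkey-patch each block

# One FGSM step on the patched surrogate; the result is meant to transfer to other ViTs.
x = torch.randn(1, 3, 224, 224, requires_grad=True)
loss = torch.nn.functional.cross_entropy(model(x), torch.tensor([0]))
loss.backward()
x_adv = (x + (8 / 255) * x.grad.sign()).detach()
```

Because the perturbation never exploits gradients of the highly model-specific attention maps, it tends to rely on features shared across architectures, which is the intuition behind the improved cross-ViT transferability.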
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.