Robustness Tokens: Towards Adversarial Robustness of Transformers
- URL: http://arxiv.org/abs/2503.10191v1
- Date: Thu, 13 Mar 2025 09:26:19 GMT
- Title: Robustness Tokens: Towards Adversarial Robustness of Transformers
- Authors: Brian Pulfer, Yury Belousov, Slava Voloshynovskiy
- Abstract summary: We propose Robustness Tokens, a novel approach specific to the transformer architecture that fine-tunes a few additional private tokens with low computational requirements instead of tuning model parameters as done in traditional adversarial training. We show that Robustness Tokens make Vision Transformer models significantly more robust to white-box adversarial attacks while also retaining the original downstream performances.
- Score: 4.913488665159803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, large pre-trained foundation models have become widely adopted by machine learning practitioners for a multitude of tasks. Given that such models are publicly available, relying on their use as backbone models for downstream tasks might result in high vulnerability to adversarial attacks crafted with the same public model. In this work, we propose Robustness Tokens, a novel approach specific to the transformer architecture that fine-tunes a few additional private tokens with low computational requirements instead of tuning model parameters as done in traditional adversarial training. We show that Robustness Tokens make Vision Transformer models significantly more robust to white-box adversarial attacks while also retaining the original downstream performances.
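The core idea described in the abstract, keeping the backbone frozen and optimizing only a handful of extra "private" tokens against a white-box attack, can be illustrated with a short PyTorch sketch. This is a minimal, hypothetical reconstruction: the `frozen_vit(x, extra_tokens=...)` interface, the PGD settings, and the feature-matching loss are assumptions for illustration, not the paper's exact training objective.

```python
# Minimal sketch of training Robustness Tokens (hypothetical interface; the paper's
# exact token insertion point and loss may differ). Only `r_tokens` is optimized.
import torch
import torch.nn.functional as F

def train_robustness_tokens(frozen_vit, loader, n_tokens=10, dim=768, steps=1000,
                            eps=8 / 255, alpha=2 / 255, pgd_iters=5, device="cuda"):
    # Assumed interface: frozen_vit(x, extra_tokens=t) returns a feature tensor,
    # with the extra tokens appended to the patch-token sequence internally.
    for p in frozen_vit.parameters():
        p.requires_grad_(False)

    # The only trainable parameters: a few extra tokens shared across all inputs.
    r_tokens = torch.zeros(1, n_tokens, dim, device=device, requires_grad=True)
    opt = torch.optim.Adam([r_tokens], lr=1e-3)

    for _, (x, _) in zip(range(steps), loader):
        x = x.to(device)
        with torch.no_grad():
            clean_feats = frozen_vit(x, extra_tokens=r_tokens)  # reference features

        # White-box PGD on the input: push features away from the clean ones.
        delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
        for _ in range(pgd_iters):
            adv_feats = frozen_vit(x + delta, extra_tokens=r_tokens)
            attack_loss = F.mse_loss(adv_feats, clean_feats)
            (grad,) = torch.autograd.grad(attack_loss, delta)
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)

        # Update only the robustness tokens so that features under attack
        # stay close to the clean features.
        adv_feats = frozen_vit(x + delta.detach(), extra_tokens=r_tokens)
        loss = F.mse_loss(adv_feats, clean_feats)
        opt.zero_grad()
        loss.backward()
        opt.step()

    return r_tokens.detach()
```

At inference, the learned tokens would simply be appended alongside each input's patch tokens, leaving the backbone weights and any downstream heads untouched.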
Related papers
- Attacking Attention of Foundation Models Disrupts Downstream Tasks [11.538345159297839]
Foundation models are large models trained on broad data that deliver high accuracy in many downstream tasks.
These models are vulnerable to adversarial attacks.
This paper studies the vulnerabilities of vision foundation models, focusing specifically on CLIP and ViTs.
We introduce a novel attack, targeting the structure of transformer-based architectures in a task-agnostic fashion.
arXiv Detail & Related papers (2025-06-03T19:42:48Z)
- Adversarial Transferability in Deep Denoising Models: Theoretical Insights and Robustness Enhancement via Out-of-Distribution Typical Set Sampling [6.189440665620872]
Deep learning-based image denoising models demonstrate remarkable performance, but their lack of robustness analysis remains a significant concern.
A major issue is that these models are susceptible to adversarial attacks, where small, carefully crafted perturbations to input data can cause them to fail.
We propose a novel adversarial defense method: the Out-of-Distribution Typical Set Sampling Training strategy.
arXiv Detail & Related papers (2024-12-08T13:47:57Z)
- On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models [59.45628259925441]
Volumetric medical segmentation models have achieved significant success on organ and tumor-based segmentation tasks.
Their vulnerability to adversarial attacks remains largely unexplored.
This underscores the importance of investigating the robustness of existing models.
arXiv Detail & Related papers (2024-06-12T17:59:42Z)
- ADAPT to Robustify Prompt Tuning Vision Transformers [4.462011758348954]
We introduce ADAPT, a novel framework for performing adaptive adversarial training in the prompt tuning paradigm.
Our method achieves competitive robust accuracy of 40% w.r.t. SOTA robustness methods using full-model fine-tuning, by tuning only 1% of the number of parameters.
arXiv Detail & Related papers (2024-03-19T23:13:40Z)
- FullLoRA: Efficiently Boosting the Robustness of Pretrained Vision Transformers [72.83770102062141]
The Vision Transformer (ViT) model has gradually become mainstream in various computer vision tasks.
Existing large models tend to prioritize performance during training, potentially neglecting robustness.
We develop a novel LNLoRA module, incorporating a learnable layer normalization before the conventional LoRA module.
We propose the FullLoRA framework by integrating the learnable LNLoRA modules into all key components of ViT-based models.
arXiv Detail & Related papers (2024-01-03T14:08:39Z)
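As a rough illustration of the LNLoRA idea summarized above (a learnable layer normalization placed before a conventional low-rank adapter on top of a frozen layer), here is a hypothetical PyTorch module; the paper's actual placement, initialization, and hyperparameters may differ.

```python
# Hypothetical LNLoRA-style adapter: learnable LayerNorm followed by a low-rank
# update, added to the output of a frozen linear layer.
import torch
import torch.nn as nn

class LNLoRALinear(nn.Module):
    def __init__(self, frozen_linear: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = frozen_linear
        for p in self.base.parameters():
            p.requires_grad_(False)                    # backbone weights stay frozen
        d_in, d_out = frozen_linear.in_features, frozen_linear.out_features
        self.ln = nn.LayerNorm(d_in)                   # learnable normalization before LoRA
        self.down = nn.Linear(d_in, rank, bias=False)  # low-rank down-projection
        self.up = nn.Linear(rank, d_out, bias=False)   # low-rank up-projection
        nn.init.zeros_(self.up.weight)                 # adapter starts as a no-op

    def forward(self, x):
        return self.base(x) + self.up(self.down(self.ln(x)))
```

Wrapping the attention and MLP projections of each ViT block with such modules, and adversarially training only the adapter parameters, corresponds roughly to the FullLoRA recipe sketched in the summary above.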
- The Efficacy of Transformer-based Adversarial Attacks in Security Domains [0.7156877824959499]
We evaluate the robustness of transformers to adversarial samples for system defenders and their adversarial strength for system attackers.
Our work emphasizes the importance of studying transformer architectures for attacking and defending models in security domains.
arXiv Detail & Related papers (2023-10-17T21:45:23Z)
- Adversarial Pixel Restoration as a Pretext Task for Transferable Perturbations [54.1807206010136]
Transferable adversarial attacks optimize adversaries from a pretrained surrogate model and known label space to fool the unknown black-box models.
We propose Adversarial Pixel Restoration as a self-supervised alternative to train an effective surrogate model from scratch.
Our training approach is based on a min-max objective which reduces overfitting via an adversarial objective.
arXiv Detail & Related papers (2022-07-18T17:59:58Z)
- Defending Variational Autoencoders from Adversarial Attacks with MCMC [74.36233246536459]
Variational autoencoders (VAEs) are deep generative models used in various domains.
As previous work has shown, one can easily fool VAEs to produce unexpected latent representations and reconstructions for a visually slightly modified input.
Here, we examine several objective functions for constructing adversarial attacks, suggest metrics to assess model robustness, and propose a solution.
arXiv Detail & Related papers (2022-03-18T13:25:18Z)
- Adversarial Token Attacks on Vision Transformers [40.687728887725356]
Vision transformers rely on a patch-token-based self-attention mechanism, in contrast to convolutional networks.
We investigate fundamental differences between these two families of models, by designing a block sparsity based adversarial token attack.
We infer that transformer models are more sensitive to token attacks than convolutional models, with ResNets outperforming Transformer models by up to $\sim 30\%$ in robust accuracy for single token attacks.
arXiv Detail & Related papers (2021-10-08T19:00:16Z)
- Training Meta-Surrogate Model for Transferable Adversarial Attack [98.13178217557193]
We consider adversarial attacks to a black-box model when no queries are allowed.
In this setting, many methods directly attack surrogate models and transfer the obtained adversarial examples to fool the target model.
We show that we can obtain a Meta-Surrogate Model (MSM) such that attacks on this model can be more easily transferred to other models.
arXiv Detail & Related papers (2021-09-05T03:27:46Z)
- "What's in the box?!": Deflecting Adversarial Attacks by Randomly Deploying Adversarially-Disjoint Models [71.91835408379602]
Adversarial examples have long been considered a real threat to machine learning models.
We propose an alternative deployment-based defense paradigm that goes beyond the traditional white-box and black-box threat models.
arXiv Detail & Related papers (2021-02-09T20:07:13Z)
- Orthogonal Deep Models As Defense Against Black-Box Attacks [71.23669614195195]
We study the inherent weakness of deep models in black-box settings where the attacker may develop the attack using a model similar to the targeted model.
We introduce a novel gradient regularization scheme that encourages the internal representation of a deep model to be orthogonal to that of another model.
We verify the effectiveness of our technique on a variety of large-scale models.
arXiv Detail & Related papers (2020-06-26T08:29:05Z)
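The representation-orthogonality idea in the last entry can be made concrete with a small, hypothetical penalty term; the paper's actual regularizer may be defined differently, so treat this only as an illustration.

```python
# Hypothetical representation-orthogonality penalty between two models:
# zero when per-sample feature vectors are orthogonal, positive when aligned.
import torch
import torch.nn.functional as F

def orthogonality_penalty(feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
    a = F.normalize(feats_a.flatten(1), dim=1)  # (batch, dim), unit norm
    b = F.normalize(feats_b.flatten(1), dim=1)
    return (a * b).sum(dim=1).pow(2).mean()     # mean squared cosine similarity

# Usage sketch: total_loss = task_loss + lam * orthogonality_penalty(feat_a, feat_b),
# aiming to make attacks crafted on one model transfer poorly to the other.
```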
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.