ADAPT to Robustify Prompt Tuning Vision Transformers
- URL: http://arxiv.org/abs/2403.13196v1
- Date: Tue, 19 Mar 2024 23:13:40 GMT
- Title: ADAPT to Robustify Prompt Tuning Vision Transformers
- Authors: Masih Eskandar, Tooba Imtiaz, Zifeng Wang, Jennifer Dy,
- Abstract summary: We introduce ADAPT, a novel framework for performing adaptive adversarial training in the prompt tuning paradigm.
Our method achieves competitive robust accuracy of 40% w.r.t. SOTA robustness methods using full-model fine-tuning, by tuning only 1% of the number of parameters.
- Score: 4.462011758348954
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The performance of deep models, including Vision Transformers, is known to be vulnerable to adversarial attacks. Many existing defenses against these attacks, such as adversarial training, rely on full-model fine-tuning to induce robustness in the models. These defenses require storing a copy of the entire model, that can have billions of parameters, for each task. At the same time, parameter-efficient prompt tuning is used to adapt large transformer-based models to downstream tasks without the need to save large copies. In this paper, we examine parameter-efficient prompt tuning of Vision Transformers for downstream tasks under the lens of robustness. We show that previous adversarial defense methods, when applied to the prompt tuning paradigm, suffer from gradient obfuscation and are vulnerable to adaptive attacks. We introduce ADAPT, a novel framework for performing adaptive adversarial training in the prompt tuning paradigm. Our method achieves competitive robust accuracy of ~40% w.r.t. SOTA robustness methods using full-model fine-tuning, by tuning only ~1% of the number of parameters.
Related papers
- FullLoRA-AT: Efficiently Boosting the Robustness of Pretrained Vision
Transformers [61.48709409150777]
Vision Transformer (ViT) model has gradually become mainstream in various computer vision tasks.
Existing large models tend to prioritize performance during training, potentially neglecting the robustness.
We develop a novel LNLoRA module, incorporating a learnable layer normalization before the conventional LoRA module.
We propose the FullLoRA-AT framework by integrating the learnable LNLoRA modules into all key components of ViT-based models.
arXiv Detail & Related papers (2024-01-03T14:08:39Z) - E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2023-07-25T19:03:21Z) - Consolidator: Mergeable Adapter with Grouped Connections for Visual
Adaptation [53.835365470800916]
We show how to efficiently and effectively transfer knowledge in a vision transformer.
We propose consolidator to modify the pre-trained model with the addition of a small set of tunable parameters.
Our consolidator can reach up to 7.56 better accuracy than full fine-tuning with merely 0.35% parameters.
arXiv Detail & Related papers (2023-04-30T23:59:02Z) - PVP: Pre-trained Visual Parameter-Efficient Tuning [29.05396521860764]
Large-scale pre-trained transformers have demonstrated remarkable success in various computer vision tasks.
It is still highly challenging to fully fine-tune these models for downstream tasks due to their high computational and storage costs.
We propose a Pre-trained Visual.
efficient (PVP) Tuning framework, which pre-trains the parameter-efficient tuning modules first and then leverages the pre-trained modules.
arXiv Detail & Related papers (2023-04-26T15:55:29Z) - Pro-tuning: Unified Prompt Tuning for Vision Tasks [133.12978197265596]
Fine-tuning is the de-facto approach to leverage pre-trained vision models to perform downstream tasks.
In this work, we propose parameter-efficient Prompt tuning (Pro-tuning) to adapt frozen vision models to various downstream vision tasks.
arXiv Detail & Related papers (2022-07-28T21:09:31Z) - Visual Prompt Tuning [74.5309408185523]
This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision.
VPT introduces only a small amount (less than 1% of model parameters) of trainable parameters in the input space while keeping the model backbone frozen.
arXiv Detail & Related papers (2022-03-23T01:17:16Z) - Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose the adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features of arbitrary attacking strength.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.