Pre-trained Adversarial Perturbations
- URL: http://arxiv.org/abs/2210.03372v1
- Date: Fri, 7 Oct 2022 07:28:03 GMT
- Title: Pre-trained Adversarial Perturbations
- Authors: Yuanhao Ban, Yinpeng Dong
- Abstract summary: Pre-trained Adversarial Perturbations (PAPs) are universal perturbations crafted for pre-trained models that remain effective when attacking fine-tuned ones.
We propose a Low-Level Layer Lifting Attack (L4A) method to generate effective PAPs by lifting the neuron activations of low-level layers of the pre-trained models.
Experiments on typical pre-trained vision models and ten downstream tasks demonstrate that our method improves the attack success rate by a large margin compared with state-of-the-art methods.
- Score: 16.95886568770364
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Self-supervised pre-training has drawn increasing attention in recent years
due to its superior performance on numerous downstream tasks after fine-tuning.
However, it is well known that deep learning models lack robustness to
adversarial examples, which can also raise security issues for pre-trained
models, though this threat has been less explored. In this paper, we delve into the
robustness of pre-trained models by introducing Pre-trained Adversarial
Perturbations (PAPs), which are universal perturbations crafted for pre-trained
models that remain effective when attacking fine-tuned ones without any
knowledge of the downstream tasks. To this end, we propose a
Low-Level Layer Lifting Attack (L4A) method to generate effective PAPs by
lifting the neuron activations of low-level layers of the pre-trained models.
Equipped with an enhanced noise augmentation strategy, L4A is effective at
generating more transferable PAPs against fine-tuned models. Extensive
experiments on typical pre-trained vision models and ten downstream tasks
demonstrate that our method improves the attack success rate by a large margin
compared with state-of-the-art methods.
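To make the idea concrete, below is a minimal PyTorch sketch of the kind of optimization the abstract describes: a single universal perturbation is trained against a frozen pre-trained encoder so that it inflates ("lifts") the activations of a low-level layer, with simple noise augmentation of the inputs. The backbone (ResNet-50), the choice of layer1 as the low-level layer, the norm-maximization loss, the synthetic inputs, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torchvision

# Hypothetical sketch of the "lift low-level activations" idea (not the paper's code).
# Assumptions: a ResNet-50 backbone, its first stage as the "low-level layer",
# an L2-norm activation loss, and Gaussian-noise input augmentation.

device = "cuda" if torch.cuda.is_available() else "cpu"
encoder = torchvision.models.resnet50(weights="IMAGENET1K_V1").to(device).eval()
for p in encoder.parameters():
    p.requires_grad_(False)

# Capture activations of a low-level layer with a forward hook.
activations = {}
def hook(_module, _inp, out):
    activations["feat"] = out
encoder.layer1.register_forward_hook(hook)

epsilon = 10 / 255  # L_inf budget for the universal perturbation
delta = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)
optimizer = torch.optim.Adam([delta], lr=1e-2)

for step in range(1000):
    # Noise augmentation: synthetic, randomly perturbed inputs as a simplified
    # stand-in for the enhanced augmentation strategy mentioned in the abstract.
    images = torch.rand(8, 3, 224, 224, device=device)
    images = images + 0.1 * torch.randn_like(images)
    adv = (images + delta).clamp(0, 1)

    encoder(adv)
    # "Lift" the low-level activations: maximize their norm.
    loss = -activations["feat"].norm()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        delta.clamp_(-epsilon, epsilon)

torch.save(delta.detach().cpu(), "pap_delta.pt")
```

Under these assumptions, a fine-tuned downstream classifier would then be attacked simply by adding the saved perturbation to its test images and measuring the attack success rate.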
Related papers
- SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models and find that a portion of them are ineffective for generation.
We propose a novel model fine-tuning method to make full use of these ineffective parameters.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z)
- Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers [95.22517830759193]
This paper studies the transferability of adversarial vulnerabilities from a pre-trained ViT model to downstream tasks.
The proposed Downstream Transfer Attack (DTA) achieves an average attack success rate (ASR) exceeding 90%, surpassing existing methods by a huge margin.
arXiv Detail & Related papers (2024-08-03T08:07:03Z)
- SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability For Pre-trained Models [19.41370590658815]
Powerful pre-trained models may be misused for unethical or illegal tasks.
We introduce a pioneering learning paradigm, non-fine-tunable learning, which prevents the pre-trained model from being fine-tuned to indecent tasks.
We propose SOPHON, a protection framework that reinforces a given pre-trained model to be resistant to being fine-tuned in pre-defined restricted domains.
arXiv Detail & Related papers (2024-04-19T08:07:26Z)
- Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness [52.9493817508055]
We propose Pre-trained Model Guided Adversarial Fine-Tuning (PMG-AFT) to enhance the model's zero-shot adversarial robustness.
Our approach consistently improves clean accuracy by an average of 8.72%.
arXiv Detail & Related papers (2024-01-09T04:33:03Z)
- Learn from the Past: A Proxy Guided Adversarial Defense Framework with Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models.
AT methods, which rely on direct iterative updates for the target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting.
We present a general proxy guided defense framework, LAST (Learn from the Past).
arXiv Detail & Related papers (2023-10-19T13:13:41Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Memorization in NLP Fine-tuning Methods [34.66743495192471]
We empirically study memorization of fine-tuning methods using membership inference and extraction attacks.
Fine-tuning the head of the model has the highest susceptibility to attacks, whereas fine-tuning smaller adapters appears to be less vulnerable to known extraction attacks.
arXiv Detail & Related papers (2022-05-25T05:49:31Z)
- A Prompting-based Approach for Adversarial Example Generation and Robustness Enhancement [18.532308729844598]
We propose a novel prompt-based adversarial attack to compromise NLP models.
We generate adversarial examples via mask-and-filling guided by a malicious purpose.
Since our training method does not actually generate adversarial samples, it can be applied to large-scale training sets efficiently.
arXiv Detail & Related papers (2022-03-21T03:21:32Z)
- Efficient Adversarial Training with Transferable Adversarial Examples [58.62766224452761]
We show that there is high transferability between models from neighboring epochs in the same training process.
We propose a novel method, Adversarial Training with Transferable Adversarial Examples (ATTA), which can enhance the robustness of trained models.
arXiv Detail & Related papers (2019-12-27T03:05:05Z)