Common Knowledge Learning for Generating Transferable Adversarial
Examples
- URL: http://arxiv.org/abs/2307.00274v1
- Date: Sat, 1 Jul 2023 09:07:12 GMT
- Title: Common Knowledge Learning for Generating Transferable Adversarial
Examples
- Authors: Ruijie Yang, Yuanfang Guo, Junfu Wang, Jiantao Zhou and Yunhong Wang
- Abstract summary: This paper focuses on an important type of black-box attack, where the adversary generates adversarial examples with a substitute (source) model.
Existing methods tend to yield unsatisfactory adversarial transferability when the source and target models come from different types of DNN architectures.
We propose a common knowledge learning (CKL) framework to learn network weights that generate adversarial examples with better transferability.
- Score: 60.1287733223249
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper focuses on an important type of black-box attack, i.e.,
transfer-based adversarial attacks, where the adversary generates adversarial
examples with a substitute (source) model and uses them to attack an unseen
target model without any knowledge of it. Existing methods tend to yield
unsatisfactory adversarial transferability when the source and target models
belong to different types of DNN architectures (e.g., ResNet-18 and Swin
Transformer). In this paper, we observe that the above phenomenon is induced by
the output inconsistency problem. To alleviate this problem while effectively
utilizing the existing DNN models, we propose a common knowledge learning (CKL)
framework to learn network weights that generate adversarial examples with
better transferability, under fixed network architectures. Specifically, to
reduce the model-specific features and obtain better output distributions, we
construct a multi-teacher framework, where the knowledge is distilled from
different teacher architectures into one student network. Since the input
gradient is typically used to generate adversarial examples, we impose
constraints on the input gradients of the student and teacher models to
further alleviate the output inconsistency problem and enhance the adversarial
transferability. Extensive experiments demonstrate that our proposed method
significantly improves the adversarial transferability.
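
The abstract names two concrete ingredients: distilling the output distributions of several teacher architectures into one student, and aligning the student's input gradients with the teachers', since transfer attacks follow the input gradient. Below is a minimal PyTorch sketch of how such a combined loss could look; the weights `alpha` and `beta`, the temperature `tau`, and the use of an MSE penalty for gradient matching are illustrative assumptions, not the paper's exact formulation.

```python
# A sketch of a CKL-style training objective, assuming standard PyTorch
# models. `alpha`, `beta`, and `tau` are hypothetical hyperparameters.
import torch
import torch.nn.functional as F

def ckl_loss(student, teachers, x, y, tau=4.0, alpha=1.0, beta=1.0):
    """Cross-entropy + multi-teacher distillation + input-gradient alignment."""
    x = x.detach().requires_grad_(True)
    s_logits = student(x)
    ce = F.cross_entropy(s_logits, y)

    # (1) Distill the averaged softened output distribution of all teachers,
    # which suppresses model-specific features in the student's outputs.
    with torch.no_grad():
        t_probs = torch.stack(
            [F.softmax(t(x) / tau, dim=1) for t in teachers]
        ).mean(dim=0)
    kd = F.kl_div(
        F.log_softmax(s_logits / tau, dim=1), t_probs, reduction="batchmean"
    ) * tau ** 2

    # (2) Adversarial examples are crafted from the gradient of the loss
    # w.r.t. the input, so pull the student's input gradient toward the
    # teachers' average gradient.
    s_grad = torch.autograd.grad(ce, x, create_graph=True)[0]
    t_grad = torch.stack(
        [torch.autograd.grad(F.cross_entropy(t(x), y), x)[0] for t in teachers]
    ).mean(dim=0).detach()
    grad_match = F.mse_loss(s_grad, t_grad)

    return ce + alpha * kd + beta * grad_match
```

In a training loop, `teachers` would be frozen, pretrained models from different architecture families (e.g., a ResNet and a Swin Transformer) in eval mode, and only the student's parameters would be optimized; adversarial examples are then generated on the trained student.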
Related papers
- MOREL: Enhancing Adversarial Robustness through Multi-Objective Representation Learning [1.534667887016089]
Deep neural networks (DNNs) are vulnerable to slight adversarial perturbations.
We show that strong feature representation learning during training can significantly enhance the original model's robustness.
We propose MOREL, a multi-objective feature representation learning approach that encourages classification models to produce similar features for inputs within the same class, despite perturbations.
arXiv Detail & Related papers (2024-10-02T16:05:03Z) - Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks [42.18755809782401]
We propose a novel transfer attack method called PDCL-Attack.
We formulate effective prompt-driven feature guidance by harnessing the semantic representation power of text.
arXiv Detail & Related papers (2024-07-30T08:52:16Z) - A Survey on Transferability of Adversarial Examples across Deep Neural Networks [53.04734042366312]
Adversarial examples can manipulate machine learning models into making erroneous predictions.
The transferability of adversarial examples enables black-box attacks, which circumvent the need for detailed knowledge of the target model.
This survey explores the landscape of the transferability of adversarial examples.
arXiv Detail & Related papers (2023-10-26T17:45:26Z) - CT-GAT: Cross-Task Generative Adversarial Attack based on
Transferability [24.272384832200522]
We propose a novel approach that directly constructs adversarial examples by extracting transferable features across various tasks.
Specifically, we train a sequence-to-sequence generative model named CT-GAT using adversarial sample data collected from multiple tasks to acquire universal adversarial features.
Results demonstrate that our method achieves superior attack performance at a small cost.
arXiv Detail & Related papers (2023-10-22T11:00:04Z) - Why Does Little Robustness Help? Understanding and Improving Adversarial
Transferability from Surrogate Training [24.376314203167016]
Adversarial examples (AEs) for DNNs have been shown to be transferable.
In this paper, we take a further step towards understanding adversarial transferability.
arXiv Detail & Related papers (2023-07-15T19:20:49Z) - Enhancing Multiple Reliability Measures via Nuisance-extended
Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can instead stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z) - Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer
Learning [60.784641458579124]
We show that fine-tuning effectively enhances model robustness under white-box FGSM attacks.
We also propose a black-box attack method for transfer learning models which attacks the target model with the adversarial examples produced by its source model.
To systematically measure the effect of both white-box and black-box attacks, we propose a new metric to evaluate how transferable the adversarial examples produced by a source model are to a target model (a minimal sketch of such a measurement appears after this list).
arXiv Detail & Related papers (2020-08-25T15:04:32Z) - TREND: Transferability based Robust ENsemble Design [6.663641564969944]
We study the effect of network architecture and of input, weight, and activation quantization on the transferability of adversarial samples.
We show that transferability is significantly hampered by input quantization between source and target.
We propose a new state-of-the-art ensemble attack to combat this.
arXiv Detail & Related papers (2020-08-04T13:38:14Z) - Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model.
We propose to exploit additional information from the feature space to craft stronger adversaries.
Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
arXiv Detail & Related papers (2020-07-29T08:38:10Z) - Transferable Perturbations of Deep Feature Distributions [102.94094966908916]
This work presents a new adversarial attack based on the modeling and exploitation of class-wise and layer-wise deep feature distributions.
We achieve state-of-the-art targeted blackbox transfer-based attack results for undefended ImageNet models.
arXiv Detail & Related papers (2020-04-27T00:32:25Z)
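
Several entries above quantify transferability as the rate at which adversarial examples crafted on a source (substitute) model also fool an unseen target model, as referenced in the "Two Sides of the Same Coin" entry. The sketch below measures that rate using plain FGSM as the source-side attack; the exact metric proposed in that paper may differ, and `eps` is an illustrative perturbation budget.

```python
# A minimal sketch of measuring transferability: craft examples on the
# source model with FGSM, then count how often they fool the target.
import torch
import torch.nn.functional as F

def fgsm(source, x, y, eps=8 / 255):
    """One signed-gradient step on the substitute (source) model."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(source(x), y).backward()
    # Step along the sign of the input gradient, stay in the image range.
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

@torch.no_grad()
def transfer_rate(target, x_adv, y):
    """Fraction of adversarial inputs the unseen target misclassifies."""
    return (target(x_adv).argmax(dim=1) != y).float().mean().item()
```

For example, `transfer_rate(target, fgsm(source, x, y), y)` gives the black-box fooling rate of source-crafted examples on the target model.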