Common Knowledge Learning for Generating Transferable Adversarial
Examples
- URL: http://arxiv.org/abs/2307.00274v1
- Date: Sat, 1 Jul 2023 09:07:12 GMT
- Title: Common Knowledge Learning for Generating Transferable Adversarial
Examples
- Authors: Ruijie Yang, Yuanfang Guo, Junfu Wang, Jiantao Zhou and Yunhong Wang
- Abstract summary: This paper focuses on an important type of black-box attack, where the adversary generates adversarial examples with a substitute (source) model.
Existing methods tend to yield unsatisfactory adversarial transferability when the source and target models come from different types of DNN architectures.
We propose a common knowledge learning (CKL) framework to learn network weights that generate adversarial examples with better transferability.
- Score: 60.1287733223249
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper focuses on an important type of black-box attack, i.e.,
transfer-based adversarial attacks, where the adversary generates adversarial
examples with a substitute (source) model and uses them to attack an unseen
target model without any knowledge of it. Existing methods tend to yield
unsatisfactory adversarial transferability when the source and target models
belong to different types of DNN architectures (e.g., ResNet-18 and Swin
Transformer). In this paper, we observe that the above phenomenon is induced by
the output inconsistency problem. To alleviate this problem while effectively
utilizing the existing DNN models, we propose a common knowledge learning (CKL)
framework to learn network weights that generate adversarial examples with
better transferability, under fixed network architectures. Specifically, to
reduce the model-specific features and obtain better output distributions, we
construct a multi-teacher framework, where the knowledge is distilled from
different teacher architectures into one student network. Since the input
gradient is typically used to generate adversarial examples, we impose
constraints on the input gradients of the student and teacher models to
further alleviate the output inconsistency problem and enhance the adversarial
transferability. Extensive experiments demonstrate that our proposed method
significantly improves the adversarial transferability.
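
The abstract names two concrete ingredients: distilling the output distributions of several teacher architectures into one student, and aligning the student's input gradients with the teachers', since transfer attacks follow the input gradient. Below is a minimal PyTorch sketch of how such a combined loss could look; the weights `alpha` and `beta`, the temperature `tau`, and the use of an MSE penalty for gradient matching are illustrative assumptions, not the paper's exact formulation.

```python
# A sketch of a CKL-style training objective, assuming standard PyTorch
# models. `alpha`, `beta`, and `tau` are hypothetical hyperparameters.
import torch
import torch.nn.functional as F

def ckl_loss(student, teachers, x, y, tau=4.0, alpha=1.0, beta=1.0):
    """Cross-entropy + multi-teacher distillation + input-gradient alignment."""
    x = x.detach().requires_grad_(True)
    s_logits = student(x)
    ce = F.cross_entropy(s_logits, y)

    # (1) Distill the averaged softened output distribution of all teachers,
    # which suppresses model-specific features in the student's outputs.
    with torch.no_grad():
        t_probs = torch.stack(
            [F.softmax(t(x) / tau, dim=1) for t in teachers]
        ).mean(dim=0)
    kd = F.kl_div(
        F.log_softmax(s_logits / tau, dim=1), t_probs, reduction="batchmean"
    ) * tau ** 2

    # (2) Adversarial examples are crafted from the gradient of the loss
    # w.r.t. the input, so pull the student's input gradient toward the
    # teachers' average gradient.
    s_grad = torch.autograd.grad(ce, x, create_graph=True)[0]
    t_grad = torch.stack(
        [torch.autograd.grad(F.cross_entropy(t(x), y), x)[0] for t in teachers]
    ).mean(dim=0).detach()
    grad_match = F.mse_loss(s_grad, t_grad)

    return ce + alpha * kd + beta * grad_match
```

In a training loop, `teachers` would be frozen, pretrained models from different architecture families (e.g., a ResNet and a Swin Transformer) in eval mode, and only the student's parameters would be optimized; adversarial examples are then generated on the trained student.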
Related papers
- MOREL: Enhancing Adversarial Robustness through Multi-Objective Representation Learning [1.534667887016089]
Deep neural networks (DNNs) are vulnerable to slight adversarial perturbations.
We show that strong feature representation learning during training can significantly enhance the original model's robustness.
We propose MOREL, a multi-objective feature representation learning approach that encourages classification models to produce similar features for inputs within the same class, despite perturbations.
arXiv Detail & Related papers (2024-10-02T16:05:03Z) - Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks [42.18755809782401]
We propose a novel transfer attack method called PDCL-Attack.
We formulate effective prompt-driven feature guidance by harnessing the semantic representation power of text.
arXiv Detail & Related papers (2024-07-30T08:52:16Z) - A Survey on Transferability of Adversarial Examples across Deep Neural Networks [53.04734042366312]
Adversarial examples can manipulate machine learning models into making erroneous predictions.
The transferability of adversarial examples enables black-box attacks, which circumvent the need for detailed knowledge of the target model.
This survey explores the landscape of the transferability of adversarial examples.
arXiv Detail & Related papers (2023-10-26T17:45:26Z) - CT-GAT: Cross-Task Generative Adversarial Attack based on
Transferability [24.272384832200522]
We propose a novel approach that directly constructs adversarial examples by extracting transferable features across various tasks.
Specifically, we train a sequence-to-sequence generative model named CT-GAT using adversarial sample data collected from multiple tasks to acquire universal adversarial features.
Results demonstrate that our method achieves superior attack performance at a small cost.
arXiv Detail & Related papers (2023-10-22T11:00:04Z) - Why Does Little Robustness Help? Understanding and Improving Adversarial
Transferability from Surrogate Training [24.376314203167016]
Adversarial examples (AEs) for DNNs have been shown to be transferable.
In this paper, we take a further step towards understanding adversarial transferability.
arXiv Detail & Related papers (2023-07-15T19:20:49Z) - Enhancing Multiple Reliability Measures via Nuisance-extended
Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can instead stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z) - Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer
Learning [60.784641458579124]
We show that fine-tuning effectively enhances model robustness under white-box FGSM attacks.
We also propose a black-box attack method for transfer learning models which attacks the target model with the adversarial examples produced by its source model.
To systematically measure the effect of both white-box and black-box attacks, we propose a new metric to evaluate how transferable the adversarial examples produced by a source model are to a target model (a minimal sketch of such a measurement appears after this list).
arXiv Detail & Related papers (2020-08-25T15:04:32Z) - TREND: Transferability based Robust ENsemble Design [6.663641564969944]
We study the effect of network architecture and of input, weight, and activation quantization on the transferability of adversarial samples.
We show that transferability is significantly hampered by input quantization between source and target.
We propose a new state-of-the-art ensemble attack to combat this.
arXiv Detail & Related papers (2020-08-04T13:38:14Z) - Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model.
We propose to exploit additional information from the feature space to craft stronger adversaries.
Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
arXiv Detail & Related papers (2020-07-29T08:38:10Z) - Transferable Perturbations of Deep Feature Distributions [102.94094966908916]
This work presents a new adversarial attack based on the modeling and exploitation of class-wise and layer-wise deep feature distributions.
We achieve state-of-the-art targeted blackbox transfer-based attack results for undefended ImageNet models.
arXiv Detail & Related papers (2020-04-27T00:32:25Z)
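
Several entries above quantify transferability as the rate at which adversarial examples crafted on a source (substitute) model also fool an unseen target model, as referenced in the "Two Sides of the Same Coin" entry. The sketch below measures that rate using plain FGSM as the source-side attack; the exact metric proposed in that paper may differ, and `eps` is an illustrative perturbation budget.

```python
# A minimal sketch of measuring transferability: craft examples on the
# source model with FGSM, then count how often they fool the target.
import torch
import torch.nn.functional as F

def fgsm(source, x, y, eps=8 / 255):
    """One signed-gradient step on the substitute (source) model."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(source(x), y).backward()
    # Step along the sign of the input gradient, stay in the image range.
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

@torch.no_grad()
def transfer_rate(target, x_adv, y):
    """Fraction of adversarial inputs the unseen target misclassifies."""
    return (target(x_adv).argmax(dim=1) != y).float().mean().item()
```

For example, `transfer_rate(target, fgsm(source, x, y), y)` gives the black-box fooling rate of source-crafted examples on the target model.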