Adversarial Imitation Attack
- URL: http://arxiv.org/abs/2003.12760v2
- Date: Tue, 31 Mar 2020 05:10:40 GMT
- Title: Adversarial Imitation Attack
- Authors: Mingyi Zhou, Jing Wu, Yipeng Liu, Xiaolin Huang, Shuaicheng Liu, Xiang Zhang, Ce Zhu
- Abstract summary: A practical adversarial attack should require as little knowledge of the attacked model as possible.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
- Score: 63.76805962712481
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models are known to be vulnerable to adversarial examples. A
practical adversarial attack should require as little knowledge of the attacked
model as possible. Current substitute attacks need pre-trained models to generate
adversarial examples, and their attack success rates rely heavily on the
transferability of those adversarial examples. Current score-based and
decision-based attacks require many queries to the attacked model. In this study,
we propose a novel adversarial imitation attack. First, it produces a replica of
the attacked model through a two-player game similar to generative adversarial
networks (GANs): the generative model tries to generate examples on which the
imitation model and the attacked model return different outputs, while the
imitation model tries to output the same labels as the attacked model for the
same inputs. The adversarial examples generated by the imitation model are then
used to fool the attacked model. Compared with current substitute attacks, the
imitation attack can use less training data to produce a replica of the attacked
model and improves the transferability of adversarial examples. Experiments
demonstrate that our imitation attack requires less training data than black-box
substitute attacks, yet achieves an attack success rate close to that of a
white-box attack on unseen data, with no queries.
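For concreteness, the following is a minimal sketch of the two-player game described in the abstract, assuming a PyTorch-style setup in which the attacked model is queried only for hard labels during imitation training; the module names (generator, imitation, attacked_model) are placeholders and do not reproduce the paper's released code or architectures.

```python
# Minimal sketch of the imitation game (assumed PyTorch; all modules are placeholders).
import torch
import torch.nn.functional as F

def imitation_step(generator, imitation, attacked_model, g_opt, i_opt,
                   batch_size=64, z_dim=100, device="cpu"):
    z = torch.randn(batch_size, z_dim, device=device)

    # Generator step: craft inputs on which the imitation model disagrees with
    # the attacked model, i.e. maximize the imitation model's loss against the
    # attacked model's labels.
    x = generator(z)
    with torch.no_grad():
        labels = attacked_model(x).argmax(dim=1)  # hard labels from the victim
    g_loss = -F.cross_entropy(imitation(x), labels)
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

    # Imitation step: agree with the attacked model on the same kind of inputs.
    x = generator(z).detach()
    with torch.no_grad():
        labels = attacked_model(x).argmax(dim=1)
    i_loss = F.cross_entropy(imitation(x), labels)
    i_opt.zero_grad()
    i_loss.backward()
    i_opt.step()
    return g_loss.item(), i_loss.item()
```

Once the imitation model has converged, a standard white-box attack run against it produces the adversarial examples that, per the abstract, are transferred to the attacked model on unseen data without further queries.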
Related papers
- Scalable Membership Inference Attacks via Quantile Regression [35.33158339354343]
Membership inference attacks are designed to determine, using black box access to trained models, whether a particular example was used in training or not.
We introduce a new class of attacks based on performing quantile regression on the distribution of confidence scores induced by the model under attack on points that are not used in training.
arXiv Detail & Related papers (2023-07-07T16:07:00Z)
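As a rough illustration of the quantile-regression idea summarized above, here is a hedged sketch (not the paper's released code): fit a quantile regressor on the confidence scores the attacked model assigns to known non-member points, then flag test points whose confidence exceeds the predicted quantile. The helper names and the scikit-learn estimator are assumptions.

```python
# Hypothetical sketch of quantile-regression membership inference.
# Assumes a classifier with predict_proba and 2-D feature matrices.
from sklearn.ensemble import GradientBoostingRegressor

def confidence(model, X):
    """Confidence the attacked model assigns to its predicted class."""
    return model.predict_proba(X).max(axis=1)

def fit_quantile_threshold(model, nonmember_X, alpha=0.95):
    """Regress the alpha-quantile of non-member confidence onto the inputs."""
    reg = GradientBoostingRegressor(loss="quantile", alpha=alpha)
    reg.fit(nonmember_X, confidence(model, nonmember_X))
    return reg

def is_member(model, quantile_reg, X):
    """Flag points whose confidence exceeds the predicted non-member quantile."""
    return confidence(model, X) > quantile_reg.predict(X)
```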
- Can Adversarial Examples Be Parsed to Reveal Victim Model Information? [62.814751479749695]
In this work, we ask whether it is possible to infer data-agnostic victim model (VM) information from data-specific adversarial instances.
We collect a dataset of adversarial attacks across 7 attack types generated from 135 victim models.
We show that a simple, supervised model parsing network (MPN) is able to infer VM attributes from unseen adversarial attacks.
arXiv Detail & Related papers (2023-03-13T21:21:49Z)
- Adversarial Transfer Attacks With Unknown Data and Class Overlap [19.901933940805684]
Current transfer attack research gives the attacker an unrealistic advantage.
We present the first study of adversarial transfer attacks that focuses on the data available to the attacker and the victim under imperfect settings.
This threat model is relevant to applications in medicine, malware, and others.
arXiv Detail & Related papers (2021-09-23T03:41:34Z)
- Manipulating SGD with Data Ordering Attacks [23.639512087220137]
We present a class of training-time attacks that require no changes to the underlying dataset or model architecture.
In particular, an attacker can disrupt the integrity and availability of a model by simply reordering training batches.
Attacks have a long-term impact in that they decrease model performance hundreds of epochs after the attack took place.
arXiv Detail & Related papers (2021-04-19T22:17:27Z)
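To make the batch-reordering idea above concrete, here is a hedged sketch: the attacker leaves every example and the model untouched and only chooses the order in which SGD sees the batches, for example by ranking them with a surrogate model's loss. The surrogate and the sort direction are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch of a data-ordering attack (assumed PyTorch surrogate model).
# The data and the victim's model are unchanged; only the batch order is chosen.
import torch
import torch.nn.functional as F

def reorder_batches(surrogate, batches, ascending=False):
    """Return the same (x, y) batches, sorted by the surrogate model's loss."""
    def batch_loss(batch):
        x, y = batch
        with torch.no_grad():
            return F.cross_entropy(surrogate(x), y).item()
    return sorted(batches, key=batch_loss, reverse=not ascending)
```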
- Explain2Attack: Text Adversarial Attacks via Cross-Domain Interpretability [18.92690624514601]
Research has shown that downstream models can be easily fooled with adversarial inputs that resemble the training data but are slightly perturbed in a way imperceptible to humans.
In this paper, we propose Explain2Attack, a black-box adversarial attack on text classification task.
We show that our framework matches or outperforms the attack rates of state-of-the-art models, yet with lower query cost and higher efficiency.
arXiv Detail & Related papers (2020-10-14T04:56:41Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacks aim to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
- Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer Learning [60.784641458579124]
We show that fine-tuning effectively enhances model robustness under white-box FGSM attacks.
We also propose a black-box attack method for transfer learning models which attacks the target model with the adversarial examples produced by its source model.
To systematically measure the effect of both white-box and black-box attacks, we propose a new metric to evaluate how transferable the adversarial examples produced by a source model are to a target model.
arXiv Detail & Related papers (2020-08-25T15:04:32Z)
- Adversarial examples are useful too! [47.64219291655723]
I propose a new method to tell whether a model has been subject to a backdoor attack.
The idea is to generate adversarial examples, targeted or untargeted, using conventional attacks such as FGSM.
It is possible to visually locate the perturbed regions and unveil the attack.
arXiv Detail & Related papers (2020-05-13T01:38:56Z)
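Since the entry above relies on conventional attacks such as FGSM, here is a minimal FGSM sketch (assumed PyTorch, with image inputs in [0, 1]); inspecting the perturbation x_adv - x is what would reveal a localized backdoor region. The epsilon value is illustrative.

```python
# Minimal untargeted FGSM sketch (assumed PyTorch); eps is illustrative.
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """One-step fast gradient sign attack on image inputs in [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()  # step in the direction that increases the loss
    return x_adv.clamp(0, 1).detach()
```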
- DaST: Data-free Substitute Training for Adversarial Attacks [55.76371274622313]
We propose a data-free substitute training method (DaST) to obtain substitute models for adversarial black-box attacks.
To achieve this, DaST utilizes specially designed generative adversarial networks (GANs) to train the substitute models.
Experiments demonstrate that the substitute models achieve competitive performance compared with the baseline models.
arXiv Detail & Related papers (2020-03-28T04:28:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.