Explain2Attack: Text Adversarial Attacks via Cross-Domain Interpretability
- URL: http://arxiv.org/abs/2010.06812v4
- Date: Sat, 16 Jan 2021 09:08:01 GMT
- Title: Explain2Attack: Text Adversarial Attacks via Cross-Domain Interpretability
- Authors: Mahmoud Hossam, Trung Le, He Zhao, and Dinh Phung
- Abstract summary: Research has shown that downstream models can be easily fooled by adversarial inputs that resemble the training data but are slightly perturbed in ways imperceptible to humans.
In this paper, we propose Explain2Attack, a black-box adversarial attack on text classification tasks.
We show that our framework matches or outperforms the attack rates of state-of-the-art models, at lower query cost and higher efficiency.
- Score: 18.92690624514601
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training robust deep learning models for downstream tasks is a critical
challenge. Research has shown that downstream models can be easily fooled by
adversarial inputs that resemble the training data but are slightly perturbed
in ways imperceptible to humans. Understanding the behavior of natural
language models under these attacks is crucial to better defend them. In the
black-box attack setting, where no access to model parameters is available,
the attacker can only query the targeted model's outputs to craft a successful
attack. Current state-of-the-art black-box attacks are costly in both
computational complexity and the number of queries needed to craft successful
adversarial examples. In real-world scenarios the number of queries is
critical: fewer queries are desired to avoid raising suspicion toward the
attacking agent. In this paper, we propose Explain2Attack, a black-box
adversarial attack on text classification tasks. Instead of searching for
important words to perturb by querying the target model, Explain2Attack
employs an interpretable substitute model from a similar domain to learn word
importance scores. We show that our framework matches or outperforms the
attack rates of state-of-the-art models, at lower query cost and higher
efficiency.
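The core loop described in the abstract lends itself to a short illustration. Below is a minimal, hedged sketch of the idea, not the authors' code: an interpretable substitute model (here, plain logistic regression over TF-IDF features, standing in for whatever interpretable model the paper actually uses) is trained on a similar-domain corpus, its weights supply word-importance scores for free, and target-model queries are spent only on verifying candidate substitutions. The toy corpus, synonym table, and query_target() placeholder are all illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# 1) Train an interpretable substitute on a similar-domain corpus.
sub_texts = ["great movie", "terrible plot", "loved the acting", "awful pacing"]
sub_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative
vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(sub_texts), sub_labels)

# 2) Word importance = magnitude of the substitute's learned weights,
#    so ranking candidate words costs zero target-model queries.
importance = dict(zip(vec.get_feature_names_out(), abs(clf.coef_[0])))

def rank_words(sentence):
    return sorted(set(sentence.lower().split()),
                  key=lambda w: importance.get(w, 0.0), reverse=True)

SYNONYMS = {"great": ["fine"], "terrible": ["poor"], "loved": ["liked"]}

def query_target(sentence):
    # Placeholder: in practice this is the remote black-box model.
    return int(clf.predict(vec.transform([sentence]))[0])

def attack(sentence, gold_label, budget=10):
    """Try single-word substitutions from the most important word down,
    querying the target only to check whether its label flips."""
    queries = 0
    for word in rank_words(sentence):
        for syn in SYNONYMS.get(word, []):
            candidate = " ".join(syn if w == word else w
                                 for w in sentence.lower().split())
            queries += 1
            if queries > budget:
                return None  # query budget exhausted
            if query_target(candidate) != gold_label:
                return candidate  # successful adversarial example
    return None
```

For instance, attack("loved this great movie", gold_label=1) tries flipping the substitute's highest-weight words first and returns the first candidate that flips the target's label, or None if the query budget runs out.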
Related papers
- Generalizable Black-Box Adversarial Attack with Meta Learning [54.196613395045595]
In black-box adversarial attacks, the target model's parameters are unknown, and the attacker aims to find a successful perturbation based on query feedback under a query budget.
We propose to utilize feedback information from historical attacks, dubbed example-level adversarial transferability.
The proposed framework, with the two types of adversarial transferability, can be naturally combined with any off-the-shelf query-based attack method to boost its performance (a warm-start sketch follows this entry).
arXiv Detail & Related papers (2023-01-01T07:24:12Z)
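As a rough illustration of the example-level idea, a warm-start wrapper might look like the following sketch; the numeric inputs, model and base_attack interfaces, and the history list are all assumptions, since the paper's actual framework is more elaborate.

```python
def attack_with_history(x, label, model, base_attack, history, budget):
    """Replay perturbations from past successful attacks as cheap first
    guesses, then fall back to any off-the-shelf query-based attack.

    model(x) -> predicted label; base_attack(x, label, budget) -> adversarial
    example or None; history: perturbations (e.g., numpy arrays) from
    previous successful attacks; x is assumed to be the same array type."""
    for delta in history:
        if budget == 0:
            return None
        candidate = x + delta  # one target query per replayed perturbation
        budget -= 1
        if model(candidate) != label:
            return candidate  # the old perturbation transferred
    return base_attack(x, label, budget)  # spend the remaining budget
```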
- Query Efficient Cross-Dataset Transferable Black-Box Attack on Action Recognition [99.29804193431823]
Black-box adversarial attacks present a realistic threat to action recognition systems.
We propose a new attack on action recognition that addresses these shortcomings by generating perturbations that transfer across datasets.
Our method achieves 8% and 12% higher deception rates than state-of-the-art query-based and transfer-based attacks, respectively.
arXiv Detail & Related papers (2022-11-23T17:47:49Z)
- TASA: Deceiving Question Answering Models by Twin Answer Sentences Attack [93.50174324435321]
We present Twin Answer Sentences Attack (TASA), an adversarial attack method for question answering (QA) models.
TASA produces fluent and grammatical adversarial contexts while maintaining gold answers.
arXiv Detail & Related papers (2022-10-27T07:16:30Z)
- Multi-granularity Textual Adversarial Attack with Behavior Cloning [4.727534308759158]
We propose MAYA, a Multi-grAnularitY Attack model to generate high-quality adversarial samples with fewer queries to victim models.
We conduct comprehensive experiments to evaluate our attack models by attacking BiLSTM, BERT, and RoBERTa under two different black-box attack settings and on three benchmark datasets.
arXiv Detail & Related papers (2021-09-09T15:46:45Z)
- Query-free Black-box Adversarial Attacks on Graphs [37.88689315688314]
We propose a query-free black-box adversarial attack on graphs, in which the attacker has no knowledge of the target model and no query access to the model.
We prove that the impact of the flipped links on the target model can be quantified by spectral changes, and can thus be approximated using eigenvalue perturbation theory (the first-order approximation is sketched after this entry).
Owing to its simplicity and scalability, the proposed model not only generalizes across various graph-based models, but can also be easily extended when different levels of knowledge are accessible.
arXiv Detail & Related papers (2020-12-12T08:52:56Z)
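The spectral argument rests on a standard result from matrix perturbation theory; the following is a hedged sketch of that general principle, with notation assumed, not the paper's exact derivation:

```latex
% For a symmetric adjacency matrix A with unit eigenpair (\lambda_i, u_i),
% a small structural change \Delta A from flipping links shifts each
% eigenvalue, to first order, by a cheap quadratic form:
\lambda_i(A + \Delta A) \approx \lambda_i(A) + u_i^\top \, \Delta A \, u_i
% Candidate link flips can thus be ranked by |u_i^\top \Delta A \, u_i|
% without any queries to the target model.
```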
- Practical No-box Adversarial Attacks against DNNs [31.808770437120536]
We investigate no-box adversarial examples, where the attacker can access neither the model information nor the training set, and cannot query the model.
We propose three mechanisms for training with a very small dataset and find that prototypical reconstruction is the most effective.
Our approach significantly diminishes the average prediction accuracy of the system to only 15.40%, on par with an attack that transfers adversarial examples from a pre-trained ArcFace model.
arXiv Detail & Related papers (2020-12-04T11:10:03Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacks aim to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
- Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer Learning [60.784641458579124]
We show that fine-tuning effectively enhances model robustness under white-box FGSM attacks.
We also propose a black-box attack method for transfer learning models which attacks the target model with the adversarial examples produced by its source model.
To systematically measure the effect of both white-box and black-box attacks, we propose a new metric to evaluate how transferable the adversarial examples produced by a source model are to a target model (a baseline version of such a metric is sketched after this entry).
arXiv Detail & Related papers (2020-08-25T15:04:32Z)
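For reference, the baseline notion such a metric refines can be sketched in a few lines; the paper's actual metric may differ, and target_predict is an assumed interface:

```python
def transfer_rate(adv_examples, true_labels, target_predict):
    """Fraction of adversarial examples, crafted against the source model,
    that also fool the target model (higher = more transferable)."""
    fooled = sum(1 for x, y in zip(adv_examples, true_labels)
                 if target_predict(x) != y)
    return fooled / len(adv_examples)
```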
- Adversarial Imitation Attack [63.76805962712481]
A practical adversarial attack should require as little knowledge of the attacked model as possible.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
arXiv Detail & Related papers (2020-03-28T10:02:49Z)