Learning to Attack: Towards Textual Adversarial Attacking in Real-world
Situations
- URL: http://arxiv.org/abs/2009.09192v1
- Date: Sat, 19 Sep 2020 09:12:24 GMT
- Title: Learning to Attack: Towards Textual Adversarial Attacking in Real-world
Situations
- Authors: Yuan Zang, Bairu Hou, Fanchao Qi, Zhiyuan Liu, Xiaojun Meng, Maosong
Sun
- Abstract summary: Adversarial attacking aims to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
- Score: 81.82518920087175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial attacking aims to fool deep neural networks with adversarial
examples. In natural language processing, a variety of textual adversarial attack
models have been proposed, differing in how much access to the victim model they
require. Among them, attack models that need only the victim model's output are the
best fit for real-world attack scenarios. However, to achieve high attack
performance, these models usually have to query the victim model a very large number
of times, which is neither efficient nor practical. To tackle this problem, we
propose a reinforcement learning based attack model that learns from attack history
and launches attacks more efficiently. In experiments, we evaluate our model by
attacking several state-of-the-art victim models on benchmark datasets of multiple
tasks, including sentiment analysis, text classification and natural language
inference. Experimental results demonstrate that our model consistently achieves
both better attack performance and higher efficiency than recently proposed
baselines. We also find that adversarial training with our attack model yields
larger robustness improvements for the victim model. All the code and data of this
paper will be made public.
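To make the query-efficiency problem concrete, below is a minimal, hypothetical sketch of a score-based black-box word-substitution attack in which candidate substitutions are prioritized by rewards accumulated over previous attacks. The victim model, synonym table and bandit-style scoring are illustrative assumptions standing in for the paper's reinforcement learning policy, not its actual algorithm.

```python
# Hypothetical sketch: a query-limited, score-based black-box word-substitution attack
# whose ordering of substitutions is learned from attack history. All names, the toy
# victim model and the synonym table are illustrative placeholders.
from collections import defaultdict

SYNONYMS = {"good": ["great", "fine"], "movie": ["film"], "boring": ["dull", "tedious"]}

def victim_prob(tokens):
    """Stand-in for the victim model: returns P(true label | text). In a real
    score-based attack this output is the only signal the attacker observes."""
    text = " ".join(tokens)
    return 0.9 - 0.3 * text.count("dull") - 0.15 * text.count("film")  # toy behavior

# "Attack history": a running average of how much each substitution lowered the
# victim's confidence in past attacks, used to try promising substitutions first.
history = defaultdict(float)

def attack(tokens, query_budget=20, threshold=0.5):
    tokens = list(tokens)
    prob = victim_prob(tokens)                       # 1 query
    queries = 1
    candidates = [(i, s) for i, w in enumerate(tokens) for s in SYNONYMS.get(w, [])]
    # Policy step: rank candidate substitutions by historical reward instead of
    # exhaustively querying the victim for every one of them.
    candidates.sort(key=lambda c: history[(tokens[c[0]], c[1])], reverse=True)
    for i, sub in candidates:
        if queries >= query_budget or prob < threshold:
            break
        original = tokens[i]
        tokens[i] = sub
        new_prob = victim_prob(tokens)               # one query per tried substitution
        queries += 1
        reward = prob - new_prob
        history[(original, sub)] = 0.9 * history[(original, sub)] + 0.1 * reward
        if reward > 0:
            prob = new_prob                          # keep the substitution
        else:
            tokens[i] = original                     # revert it
    return " ".join(tokens), prob < threshold, queries

print(attack("the movie was good but boring".split()))
```

Repeatedly calling attack() on new inputs lets the history table steer later attacks toward substitutions that worked before, which is the intuition behind cutting down the number of victim-model queries.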
Related papers
- OMG-ATTACK: Self-Supervised On-Manifold Generation of Transferable
Evasion Attacks [17.584752814352502]
Evasion Attacks (EA) are used to test the robustness of trained neural networks by distorting input data.
We introduce a self-supervised, computationally economical method for generating adversarial examples.
Our experiments consistently demonstrate the method is effective across various models, unseen data categories, and even defended models.
arXiv Detail & Related papers (2023-10-05T17:34:47Z)
- Learning to Learn Transferable Attack [77.67399621530052]
Transfer adversarial attack is a non-trivial black-box attack that crafts adversarial perturbations on a surrogate model and then applies them to the victim model (a minimal sketch of this surrogate-to-victim setting appears after this list).
We propose a Learning to Learn Transferable Attack (LLTA) method, which makes the adversarial perturbations more generalized via learning from both data and model augmentation.
Empirical results on the widely used dataset demonstrate the effectiveness of our attack method, with a 12.85% higher transfer-attack success rate than state-of-the-art methods.
arXiv Detail & Related papers (2021-12-10T07:24:21Z)
- Adversarial Transfer Attacks With Unknown Data and Class Overlap [19.901933940805684]
Current transfer attack research grants the attacker an unrealistically strong advantage.
We present the first study of transferring adversarial attacks that focuses on the data available to the attacker and the victim under imperfect settings.
This threat model is relevant to applications in medicine, malware, and others.
arXiv Detail & Related papers (2021-09-23T03:41:34Z)
- Multi-granularity Textual Adversarial Attack with Behavior Cloning [4.727534308759158]
We propose MAYA, a Multi-grAnularitY Attack model to generate high-quality adversarial samples with fewer queries to victim models.
We conduct comprehensive experiments to evaluate our attack models by attacking BiLSTM, BERT and RoBERTa under two different black-box attack settings and on three benchmark datasets.
arXiv Detail & Related papers (2021-09-09T15:46:45Z)
- Delving into Data: Effectively Substitute Training for Black-box Attack [84.85798059317963]
We propose a novel perspective on substitute training that focuses on designing the distribution of data used in the knowledge-stealing process.
The combination of these two modules can further boost the consistency of the substitute model and target model, which greatly improves the effectiveness of adversarial attack.
arXiv Detail & Related papers (2021-04-26T07:26:29Z)
- Explain2Attack: Text Adversarial Attacks via Cross-Domain Interpretability [18.92690624514601]
Research has shown that downstream models can be easily fooled by adversarial inputs that resemble the training data but are slightly perturbed in ways imperceptible to humans.
In this paper, we propose Explain2Attack, a black-box adversarial attack on the text classification task.
We show that our framework matches or exceeds the attack rates of state-of-the-art models, with lower query cost and higher efficiency.
arXiv Detail & Related papers (2020-10-14T04:56:41Z)
- Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer Learning [60.784641458579124]
We show that fine-tuning effectively enhances model robustness under white-box FGSM attacks.
We also propose a black-box attack method for transfer learning models which attacks the target model with the adversarial examples produced by its source model.
To systematically measure the effect of both white-box and black-box attacks, we propose a new metric to evaluate how transferable the adversarial examples produced by a source model are to a target model.
arXiv Detail & Related papers (2020-08-25T15:04:32Z)
- Adversarial Imitation Attack [63.76805962712481]
A practical adversarial attack should require as little knowledge of the attacked model as possible.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
arXiv Detail & Related papers (2020-03-28T10:02:49Z)
- DaST: Data-free Substitute Training for Adversarial Attacks [55.76371274622313]
We propose a data-free substitute training method (DaST) to obtain substitute models for adversarial black-box attacks.
To achieve this, DaST utilizes specially designed generative adversarial networks (GANs) to train the substitute models.
Experiments demonstrate that the substitute models achieve competitive performance compared with baseline models.
arXiv Detail & Related papers (2020-03-28T04:28:13Z)
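As referenced in the Learning to Learn Transferable Attack entry above, several of these papers study the surrogate-to-victim transfer setting. Below is a minimal, hypothetical sketch of that setting, using white-box FGSM on a toy surrogate and replaying the perturbation on an unseen victim; the tiny linear models, weights and input are illustrative assumptions, not any listed paper's actual setup.

```python
# Hypothetical sketch of a transfer attack: craft a perturbation with full (white-box)
# access to a surrogate model, then apply it unchanged to a separate victim model the
# attacker never inspects. The two logistic-regression "models" are toy placeholders.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two independently "trained" linear classifiers standing in for surrogate and victim.
w_surrogate, b_surrogate = np.array([1.2, -0.8]), 0.1
w_victim, b_victim = np.array([1.0, -1.1]), 0.0

def predict(w, b, x):
    return sigmoid(w @ x + b)            # probability of class 1

x = np.array([0.9, 0.2])                 # a clean input both models label as class 1
y = 1.0                                  # true label

# White-box step on the surrogate: FGSM perturbs along the sign of the loss gradient.
# For logistic regression with cross-entropy loss, d(loss)/dx = (p - y) * w.
p_surrogate = predict(w_surrogate, b_surrogate, x)
grad = (p_surrogate - y) * w_surrogate
epsilon = 0.5
x_adv = x + epsilon * np.sign(grad)

# Black-box step: the perturbation crafted on the surrogate is simply replayed on the victim.
print("victim on clean input :", predict(w_victim, b_victim, x))      # ~0.66 -> class 1
print("victim on adversarial :", predict(w_victim, b_victim, x_adv))  # ~0.41 -> class 0
```

The transfer succeeds here because the surrogate's and victim's decision boundaries are loosely aligned; when they are not, for instance under the imperfect data and class overlap studied above, the replayed perturbation becomes far less reliable.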