Towards Variable-Length Textual Adversarial Attacks
- URL: http://arxiv.org/abs/2104.08139v1
- Date: Fri, 16 Apr 2021 14:37:27 GMT
- Title: Towards Variable-Length Textual Adversarial Attacks
- Authors: Junliang Guo, Zhirui Zhang, Linlin Zhang, Linli Xu, Boxing Chen,
Enhong Chen, Weihua Luo
- Abstract summary: It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method achieves a 33.18 BLEU score on IWSLT14 German-English translation, an improvement of 1.47 over the baseline model.
- Score: 68.27995111870712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial attacks have exposed the vulnerability of machine
learning models; however, conducting textual adversarial attacks on natural
language processing tasks is non-trivial due to the discreteness of the data.
Most previous approaches conduct attacks with the atomic replacement
operation, which usually leads to fixed-length adversarial examples and
therefore limits exploration of the decision space. In this paper, we propose
variable-length textual adversarial attacks (VL-Attack) and integrate three
atomic operations, namely insertion, deletion and replacement, into a unified
framework by introducing and manipulating a special blank token while
attacking. In this way, our approach can more comprehensively find adversarial
examples around the decision boundary and conduct attacks effectively.
Specifically, our method drops the accuracy of IMDB classification by 96%
while editing only 1.3% of tokens when attacking a pre-trained BERT model. In
addition, fine-tuning the victim model on the generated adversarial samples
improves its robustness without hurting performance, especially for
length-sensitive models. On non-autoregressive machine translation, our method
achieves a 33.18 BLEU score on IWSLT14 German-English translation, an
improvement of 1.47 over the baseline model.
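The way a blank token can unify the three edit operations can be sketched in a few lines of Python. This is an illustrative sketch only: the token name, the fill_blanks helper and the candidate generator are hypothetical placeholders, not the authors' implementation, which searches for edits that actually change the victim model's prediction.

```python
# Sketch: a special <blank> token lets insertion, deletion and replacement
# all be expressed as "open a blank, then fill it or drop it", so the attack
# is no longer restricted to fixed-length edits.
BLANK = "<blank>"

def insert_blank(tokens, pos):
    """Insertion: open a new blank slot at `pos`."""
    return tokens[:pos] + [BLANK] + tokens[pos:]

def blank_out(tokens, pos):
    """Deletion or replacement: overwrite the token at `pos` with a blank."""
    return tokens[:pos] + [BLANK] + tokens[pos + 1:]

def fill_blanks(tokens, candidates_for):
    """Resolve blanks: refill each one with a candidate token (insertion /
    replacement) or drop it entirely (deletion). `candidates_for` stands in
    for the model-based candidate generator used while attacking."""
    out = []
    for i, tok in enumerate(tokens):
        if tok != BLANK:
            out.append(tok)
        else:
            cands = candidates_for(tokens, i)
            if cands:                    # insertion / replacement
                out.append(cands[0])
            # an empty candidate list means the blank is dropped (deletion)
    return out

# Toy usage: a variable-length edit of a sentence.
sent = "the movie was great".split()
print(fill_blanks(insert_blank(sent, 3), lambda toks, i: ["really"]))
# ['the', 'movie', 'was', 'really', 'great']
```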
Related papers
- Goal-guided Generative Prompt Injection Attack on Large Language Models (2024-04-06) [6.175969971471705]
Large language models (LLMs) provide a strong foundation for large-scale user-oriented natural language tasks.
A large number of users can easily inject adversarial text or instructions through the user interface.
It remains unclear how such attack strategies relate to the attack success rate, and thus how to effectively improve model security.
- Verifying the Robustness of Automatic Credibility Assessment (2023-03-14) [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing (2022-12-20) [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
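For reference, the label-smoothing objective studied there is the standard one and can be written in a few lines of PyTorch. The sketch below is a generic implementation; the smoothing value of 0.1 and the toy classifier are arbitrary illustrations, not settings taken from the paper.

```python
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, targets, smoothing=0.1):
    """Cross-entropy with uniform label smoothing.

    logits:  (batch, num_classes) raw model outputs
    targets: (batch,) integer class labels
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # (1 - eps) weight on the gold class, eps spread uniformly over all classes.
    nll = -log_probs.gather(dim=-1, index=targets.unsqueeze(-1)).squeeze(-1)
    smooth = -log_probs.mean(dim=-1)
    return ((1.0 - smoothing) * nll + smoothing * smooth).mean()

# Toy usage, e.g. for a binary sentiment classifier's outputs.
logits = torch.randn(4, 2)
targets = torch.tensor([0, 1, 1, 0])
print(label_smoothing_ce(logits, targets))
```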
- ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation (2022-10-22) [125.52743832477404]
Adversarial Examples Detection (AED) is a crucial defense technique against adversarial attacks.
We propose a new technique, ADDMU, which combines two types of uncertainty estimation for both regular and far-boundary (FB) adversarial example detection.
Our new method outperforms previous methods by 3.6 and 6.0 AUC points in the two scenarios, respectively.
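The summary does not spell out the two estimators, so the following is only a generic sketch of the underlying idea of flagging inputs with unusually high uncertainty, using MC-dropout variance as model uncertainty and prediction variance under small input perturbations as data uncertainty; the stand-in classifier, noise model, and threshold are hypothetical and not taken from ADDMU.

```python
import torch
import torch.nn as nn

def model_uncertainty(model, x, n_passes=8):
    """Model uncertainty: variance of predicted probabilities under MC dropout
    (dropout kept active at inference time)."""
    model.train()  # keep dropout stochastic
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(-1) for _ in range(n_passes)])
    return probs.var(dim=0).mean().item()

def data_uncertainty(model, x, n_perturb=8, noise=0.1):
    """Data uncertainty: variance of predictions when the input is slightly
    perturbed (Gaussian noise on feature vectors, as a stand-in for
    token-level perturbations of text)."""
    model.eval()
    with torch.no_grad():
        probs = torch.stack([model(x + noise * torch.randn_like(x)).softmax(-1)
                             for _ in range(n_perturb)])
    return probs.var(dim=0).mean().item()

def looks_adversarial(model, x, threshold=0.05):
    """Flag an input whose combined uncertainty score is unusually high."""
    return model_uncertainty(model, x) + data_uncertainty(model, x) > threshold

# Toy usage with a stand-in classifier over pre-computed sentence features.
clf = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Dropout(0.3), nn.Linear(64, 2))
x = torch.randn(1, 32)
print(looks_adversarial(clf, x))
```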
- Learning to Ignore Adversarial Attacks (2022-05-23) [14.24585085013907]
We introduce the use of rationale models that can explicitly learn to ignore attack tokens.
We find that the rationale models can successfully ignore over 90% of attack tokens.
- A Differentiable Language Model Adversarial Attack on Text Classifiers (2021-07-23) [10.658675415759697]
We propose a new black-box sentence-level attack for natural language processing.
Our method fine-tunes a pre-trained language model to generate adversarial examples.
We show that the proposed attack outperforms competitors on a diverse set of NLP problems for both computed metrics and human evaluation.
- BERT-Defense: A Probabilistic Model Based on BERT to Combat Cognitively Inspired Orthographic Adversarial Attacks (2021-06-02) [10.290050493635343]
Adversarial attacks expose important blind spots of deep learning systems.
Character-level attacks typically insert typos into the input stream.
We show that an untrained iterative approach can perform on par with human crowd-workers supervised via 3-shot learning.
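As a toy illustration of the orthographic noise such defenses have to undo, character-level typos can be injected with a few lines of Python; the perturbation operations and rate below are arbitrary choices, not the attack models evaluated in that paper.

```python
import random

def add_typos(text, rate=0.1, seed=0):
    """Inject simple character-level typos (transpose, drop, or duplicate a
    character) at roughly the given per-character rate."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        if ch.isalpha() and rng.random() < rate:
            op = rng.choice(["swap", "drop", "dup"])
            if op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], ch])   # transpose two characters
                i += 2
                continue
            if op == "drop":                      # delete the character
                i += 1
                continue
            out.append(ch + ch)                   # duplicate the character
        else:
            out.append(ch)
        i += 1
    return "".join(out)

print(add_typos("adversarial attacks expose blind spots", rate=0.2))
```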
- Contextualized Perturbation for Textual Adversarial Attack (2020-09-16) [56.370304308573274]
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models.
This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT (2020-04-21) [77.82947768158132]
Adversarial attacks on discrete data (such as text) are more challenging than on continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method for generating adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturbation percentage.
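The masked-language-model substitution step at the heart of such attacks can be sketched with the Hugging Face transformers library as follows; this covers only the candidate-generation piece under assumed defaults (bert-base-uncased, top-5 candidates), not the full attack, which also ranks vulnerable words and checks whether a substitute flips the victim model's prediction.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def candidate_substitutes(sentence, target_word, top_k=5):
    """Mask `target_word` and let BERT propose contextual substitutes."""
    masked = sentence.replace(target_word, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Position of the [MASK] token in the input sequence.
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    top_ids = logits[0, mask_pos].topk(top_k).indices
    return tokenizer.convert_ids_to_tokens(top_ids.tolist())

print(candidate_substitutes("the movie was great", "great"))
```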
This list is automatically generated from the titles and abstracts of the papers on this site.