Semantic-Preserving Adversarial Text Attacks
- URL: http://arxiv.org/abs/2108.10015v1
- Date: Mon, 23 Aug 2021 09:05:18 GMT
- Title: Semantic-Preserving Adversarial Text Attacks
- Authors: Xinghao Yang, Weifeng Liu, James Bailey, Tianqing Zhu, Dacheng Tao,
Wei Liu
- Abstract summary: We propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.
Our method achieves the highest attack success rates and semantic preservation rates while changing the fewest words compared with existing methods.
- Score: 85.32186121859321
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) are known to be vulnerable to adversarial images,
while their robustness in text classification is rarely studied. Several lines
of text attack methods have been proposed in the literature, including
character-level, word-level, and sentence-level attacks. However, it is still a
challenge to minimize the number of word changes necessary to induce
misclassification, while simultaneously ensuring lexical correctness, syntactic
soundness, and semantic similarity. In this paper, we propose a Bigram and
Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to
examine the vulnerability of deep models. Our method has four major merits.
Firstly, we propose to attack text documents not only at the unigram word level
but also at the bigram level, which better preserves semantics and avoids
producing meaningless outputs. Secondly, we propose a hybrid method that
replaces input words with candidates drawn from both their synonyms and their
sememe-related alternatives, which greatly enriches the pool of potential
substitutions compared to using synonyms alone. Thirdly, we design an
optimization algorithm, Semantic Preservation Optimization (SPO), to determine
the priority of word replacements, aiming to reduce the modification cost.
Finally, we further improve SPO with a semantic filter (SPOF) to find the
adversarial example with the highest semantic similarity. We evaluate the
effectiveness of our BU-SPO and BU-SPOF on the IMDB, AG's News, and Yahoo!
Answers datasets by attacking four popular DNN models. Results show that our
methods achieve the highest attack success rates and semantic preservation
rates while changing the fewest words compared with existing methods.
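As a rough illustration of the attack family BU-SPO belongs to, the sketch below shows a greedy word-substitution loop that prioritizes the replacement with the largest drop in the true-class probability; the `classify` and `candidates` callables are hypothetical stand-ins, not the authors' SPO implementation.

```python
# Minimal sketch of a greedy word-substitution attack loop (not the authors'
# code): rank candidate edits by how much they lower the true-class
# probability and apply them until the predicted label flips.
from typing import Callable, Dict, List

def greedy_substitution_attack(
    words: List[str],
    true_label: int,
    classify: Callable[[List[str]], Dict[int, float]],  # class -> probability
    candidates: Callable[[str], List[str]],  # synonym/sememe pool (assumed)
    max_changes: int = 5,
) -> List[str]:
    adv = list(words)
    for _ in range(max_changes):
        base = classify(adv)[true_label]
        best_drop, best_edit = 0.0, None
        # Try every unigram substitution; keep the one that hurts the
        # true-class probability most (SPO prioritizes positions similarly).
        for i, word in enumerate(adv):
            for sub in candidates(word):
                trial = adv[:i] + [sub] + adv[i + 1:]
                drop = base - classify(trial)[true_label]
                if drop > best_drop:
                    best_drop, best_edit = drop, (i, sub)
        if best_edit is None:
            break  # no substitution lowers the score any further
        adv[best_edit[0]] = best_edit[1]
        scores = classify(adv)
        if max(scores, key=scores.get) != true_label:
            break  # misclassified with a small number of edits
    return adv
```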
Related papers
- HQA-Attack: Toward High Quality Black-Box Hard-Label Adversarial Attack on Text [40.58680960214544]
Black-box hard-label adversarial attack on text is a practical and challenging task.
We propose HQA-Attack, a framework that generates high-quality adversarial examples under black-box hard-label attack scenarios.
arXiv Detail & Related papers (2024-02-02T10:06:43Z)
- Single Word Change is All You Need: Designing Attacks and Defenses for Text Classifiers [12.167426402230229]
A significant portion of adversarial examples generated by existing methods changes only one word.
This single-word perturbation vulnerability represents a serious weakness in classifiers.
We present SP-Attack, designed to exploit this single-word perturbation vulnerability and achieve a higher attack success rate.
We also propose SP-Defense, which aims to improve rho, a measure of robustness to single-word perturbations, by applying data augmentation during learning.
arXiv Detail & Related papers (2024-01-30T17:30:44Z)
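A minimal sketch of the single-word vulnerability the SP-Attack entry above targets, assuming a label-only `predict` function and a hypothetical substitution pool:

```python
# Sketch (not the paper's code): an input is single-word vulnerable if
# replacing any one word flips the classifier's prediction.
from typing import Callable, List

def is_single_word_vulnerable(
    words: List[str],
    predict: Callable[[List[str]], int],     # label-only classifier (assumed)
    candidates: Callable[[str], List[str]],  # substitution pool (assumed)
) -> bool:
    original = predict(words)
    for i, word in enumerate(words):
        for sub in candidates(word):
            if predict(words[:i] + [sub] + words[i + 1:]) != original:
                return True  # one word change flips the label
    return False
```

The rho mentioned above can then be read as the fraction of inputs for which this check fails, which SP-Defense raises by training on such perturbed copies.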
- SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation [72.10931780019297]
Existing watermarking algorithms are vulnerable to paraphrase attacks because of their token-level design.
We propose SemStamp, a robust sentence-level semantic watermarking algorithm based on locality-sensitive hashing (LSH).
Experimental results show that the semantic watermark is not only more robust than the previous state-of-the-art method against both common and bigram paraphrase attacks, but is also better at preserving generation quality.
arXiv Detail & Related papers (2023-10-06T03:33:42Z)
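The LSH step at the heart of a sentence-level watermark like SemStamp can be sketched as follows; the embedding and dimensions are illustrative placeholders:

```python
# Sketch of LSH signatures over sentence embeddings: random hyperplanes give
# each sentence a short binary code that tends to survive paraphrasing.
import numpy as np

def lsh_signature(embedding: np.ndarray, hyperplanes: np.ndarray) -> int:
    bits = (hyperplanes @ embedding) > 0  # one bit per hyperplane side
    return int("".join("1" if b else "0" for b in bits), 2)

rng = np.random.default_rng(0)
planes = rng.standard_normal((8, 384))  # 8-bit signature, 384-d embeddings
sentence_embedding = rng.standard_normal(384)  # stand-in for a real encoder
print(lsh_signature(sentence_embedding, planes))
# Generation is then biased toward sentences whose signature falls in the
# watermark's valid set, which a paraphrase rarely escapes.
```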
- Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, the hard-label attack, in which the attacker can only access the predicted label.
We propose a novel hard-label attack, the Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks in both attack performance and adversarial example quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z)
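The hard-label constraint LHLS works under can be illustrated with a toy local search that only queries predicted labels; this is a generic sketch, not the paper's learning-based algorithm:

```python
# Toy hard-label local search (generic sketch): starting from a known
# adversarial text, revert individual edits while the label stays flipped,
# shrinking the perturbation using label-only queries.
import random
from typing import Callable, List

def hard_label_local_search(
    original: List[str],
    adversarial: List[str],  # word-aligned adversarial starting point
    predict: Callable[[List[str]], int],  # returns the label only
    true_label: int,
    steps: int = 200,
) -> List[str]:
    current = list(adversarial)
    for _ in range(steps):
        changed = [i for i, (a, o) in enumerate(zip(current, original)) if a != o]
        if not changed:
            break
        i = random.choice(changed)
        trial = list(current)
        trial[i] = original[i]  # try undoing one edit
        if predict(trial) != true_label:
            current = trial  # still adversarial with fewer changes
    return current
```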
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show the proposed Neighboring Distribution Divergence (NDD) to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
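The mask-and-predict step behind NDD might look like the sketch below, with `mlm_distribution` as a hypothetical stand-in for a masked language model:

```python
# Sketch (assumed interfaces): mask an aligned shared word in each text, get
# the MLM's token distribution for that slot, and score their KL divergence.
import math
from typing import Callable, Dict, List

def position_divergence(
    text_a: List[str],
    text_b: List[str],
    pos_a: int,  # index of a shared (longest-common-sequence) word in text_a
    pos_b: int,  # index of the same shared word in text_b
    mlm_distribution: Callable[[List[str], int], Dict[str, float]],
) -> float:
    p = mlm_distribution(text_a, pos_a)  # token -> probability at masked slot
    q = mlm_distribution(text_b, pos_b)
    eps = 1e-12  # floor for tokens missing from q
    return sum(pv * math.log(pv / q.get(tok, eps)) for tok, pv in p.items() if pv > 0)
```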
- Adversarial Semantic Collisions [129.55896108684433]
We study semantic collisions: texts that are semantically unrelated but judged as similar by NLP models.
We develop gradient-based approaches for generating semantic collisions.
We show how to generate semantic collisions that evade perplexity-based filtering.
arXiv Detail & Related papers (2020-11-09T20:42:01Z)
- Assessing Robustness of Text Classification through Maximal Safe Radius Computation [21.05890715709053]
We aim to provide guarantees that the model prediction does not change if a word is replaced with a plausible alternative, such as a synonym.
As a measure of robustness, we adopt the notion of the maximal safe radius for a given input text, which is the minimum distance in the embedding space to the decision boundary.
For the upper bound computation, we employ Monte Carlo Tree Search in conjunction with syntactic filtering to analyse the effect of single and multiple word substitutions.
arXiv Detail & Related papers (2020-10-01T09:46:32Z)
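The upper-bound side of that computation can be sketched simply: any substitution that flips the prediction bounds the maximal safe radius from above. The callables are assumed stand-ins, and the paper's Monte Carlo Tree Search over multi-word substitutions is reduced here to single-word brute force:

```python
# Sketch: each label-flipping substitution yields an upper bound on the
# maximal safe radius, namely its embedding-space distance from the original.
import math
from typing import Callable, List, Sequence

def msr_upper_bound(
    words: List[str],
    predict: Callable[[List[str]], int],
    embed: Callable[[List[str]], Sequence[float]],
    candidates: Callable[[str], List[str]],  # syntactically filtered pool
) -> float:
    label = predict(words)
    base = embed(words)
    best = math.inf  # stays inf if no single substitution flips the label
    for i, word in enumerate(words):
        for sub in candidates(word):
            trial = words[:i] + [sub] + words[i + 1:]
            if predict(trial) != label:
                dist = math.dist(base, embed(trial))
                best = min(best, dist)  # tighter upper bound found
    return best
```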
- Reevaluating Adversarial Examples in Natural Language [20.14869834829091]
We analyze the outputs of two state-of-the-art synonym substitution attacks.
We find that their perturbations often do not preserve semantics, and 38% introduce grammatical errors.
With constraints adjusted to better preserve semantics and grammaticality, the attack success rate drops by over 70 percentage points.
arXiv Detail & Related papers (2020-04-25T03:09:48Z)
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks on discrete data (such as text) are more challenging than attacks on continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturbation percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z)
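The core move in BERT-Attack, letting a masked language model propose context-aware substitutes, can be sketched with the Hugging Face fill-mask pipeline (the model choice and the surrounding attack loop are illustrative, not the paper's exact setup):

```python
# Sketch: mask one word and let BERT itself propose replacements, instead of
# relying on a fixed synonym list. Requires the transformers package.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def bert_substitutes(words, index, top_k=5):
    masked = " ".join(
        w if i != index else fill.tokenizer.mask_token for i, w in enumerate(words)
    )
    # Each prediction carries the proposed token string and an MLM score.
    return [p["token_str"] for p in fill(masked, top_k=top_k)]

print(bert_substitutes("the movie was surprisingly good".split(), index=3))
# An attack loop would keep substitutes that flip the label at low
# perturbation cost, which is how the success/perturbation trade-off arises.
```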