Attacking Text Classifiers via Sentence Rewriting Sampler
- URL: http://arxiv.org/abs/2104.08453v1
- Date: Sat, 17 Apr 2021 05:21:35 GMT
- Title: Attacking Text Classifiers via Sentence Rewriting Sampler
- Authors: Lei Xu, Kalyan Veeramachaneni
- Abstract summary: General sentence rewriting sampler (SRS) framework can conditionally generate meaningful sentences.
Our method can effectively rewrite the original sentence in multiple ways while maintaining high semantic similarity and good sentence quality.
Our method achieves a better attack success rate on 4 out of 7 datasets, as well as significantly better sentence quality on all 7 datasets.
- Score: 12.25764838264699
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most adversarial attack methods on text classification are designed to change
the classifier's prediction by modifying few words or characters. Few try to
attack classifiers by rewriting a whole sentence, due to the difficulties
inherent in sentence-level rephrasing and the problem of maintaining high
semantic similarity and sentence quality.
To tackle this problem, we design a general sentence rewriting sampler (SRS)
framework, which can conditionally generate meaningful sentences. Then we
customize SRS to attack text classification models. Our method can effectively
rewrite the original sentence in multiple ways while maintaining high semantic
similarity and good sentence quality. Experimental results show that many of
these rewritten sentences are misclassified by the classifier. Our method
achieves a better attack success rate on 4 out of 7 datasets, as well as
significantly better sentence quality on all 7 datasets.
Related papers
- On Adversarial Examples for Text Classification by Perturbing Latent Representations [0.0]
We show that deep learning is vulnerable to adversarial examples in text classification.
This weakness indicates that deep learning is not very robust.
We create a framework that measures the robustness of a text classifier by using the gradients of the classifier.
arXiv Detail & Related papers (2024-05-06T18:45:18Z) - Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation [52.72682366640554]
Authorship Verification (AV) is a text classification task concerned with inferring whether a candidate text has been written by one specific author or by someone else.
It has been shown that many AV systems are vulnerable to adversarial attacks, where a malicious author actively tries to fool the classifier by either concealing their writing style, or by imitating the style of another author.
arXiv Detail & Related papers (2024-03-17T16:36:26Z) - Single Word Change is All You Need: Designing Attacks and Defenses for
Text Classifiers [12.167426402230229]
A significant portion of adversarial examples generated by existing methods change only one word.
This single-word perturbation vulnerability represents a significant weakness in classifiers.
We present the SP-Attack, designed to exploit the single-word perturbation vulnerability, achieving a higher attack success rate.
We also propose SP-Defense, which aims to improve rho by applying data augmentation in learning.
arXiv Detail & Related papers (2024-01-30T17:30:44Z) - SenTest: Evaluating Robustness of Sentence Encoders [0.4194295877935868]
This work focuses on evaluating the robustness of the sentence encoders.
We employ several adversarial attacks to evaluate its robustness.
The results of the experiments strongly undermine the robustness of sentence encoders.
arXiv Detail & Related papers (2023-11-29T15:21:35Z) - RankCSE: Unsupervised Sentence Representations Learning via Learning to
Rank [54.854714257687334]
We propose a novel approach, RankCSE, for unsupervised sentence representation learning.
It incorporates ranking consistency and ranking distillation with contrastive learning into a unified framework.
An extensive set of experiments are conducted on both semantic textual similarity (STS) and transfer (TR) tasks.
arXiv Detail & Related papers (2023-05-26T08:27:07Z) - Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z) - Clustering and Network Analysis for the Embedding Spaces of Sentences
and Sub-Sentences [69.3939291118954]
This paper reports research on a set of comprehensive clustering and network analyses targeting sentence and sub-sentence embedding spaces.
Results show that one method generates the most clusterable embeddings.
In general, the embeddings of span sub-sentences have better clustering properties than the original sentences.
arXiv Detail & Related papers (2021-10-02T00:47:35Z) - Semantic-Preserving Adversarial Text Attacks [85.32186121859321]
We propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.
Our method achieves the highest attack success rates and semantics rates by changing the smallest number of words compared with existing methods.
arXiv Detail & Related papers (2021-08-23T09:05:18Z) - Narrative Incoherence Detection [76.43894977558811]
We propose the task of narrative incoherence detection as a new arena for inter-sentential semantic understanding.
Given a multi-sentence narrative, decide whether there exist any semantic discrepancies in the narrative flow.
arXiv Detail & Related papers (2020-12-21T07:18:08Z) - Rewriting Meaningful Sentences via Conditional BERT Sampling and an
application on fooling text classifiers [11.49508308643065]
adversarial attack methods that are designed to deceive a text classifier change the text classifier's prediction by modifying a few words or characters.
Few try to attack classifiers by rewriting a whole sentence, due to the difficulties inherent in sentence-level rephrasing as well as the problem of setting the criteria for legitimate rewriting.
In this paper, we explore the problem of creating adversarial examples with sentence-level rewriting.
We propose a new criteria for modification, called a sentence-level threaten model. This criteria allows for both word- and sentence-level changes, and can be adjusted independently in two dimensions: semantic similarity and
arXiv Detail & Related papers (2020-10-22T17:03:13Z) - Elephant in the Room: An Evaluation Framework for Assessing Adversarial
Examples in NLP [24.661335236627053]
An adversarial example is an input transformed by small perturbations that machine learning models consistently misclassify.
We propose an evaluation framework consisting of automatic evaluation metrics and human evaluation guidelines.
arXiv Detail & Related papers (2020-01-22T00:05:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.