Arabic Synonym BERT-based Adversarial Examples for Text Classification
- URL: http://arxiv.org/abs/2402.03477v1
- Date: Mon, 5 Feb 2024 19:39:07 GMT
- Title: Arabic Synonym BERT-based Adversarial Examples for Text Classification
- Authors: Norah Alshahrani, Saied Alshahrani, Esma Wali, Jeanna Matthews
- Abstract summary: This paper introduces the first word-level study of adversarial attacks in Arabic.
We assess the robustness of the state-of-the-art text classification models to adversarial attacks in Arabic.
We study the transferability of these newly produced Arabic adversarial examples to various models and investigate the effectiveness of defense mechanisms.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text classification systems have been proven vulnerable to adversarial text
examples: modified versions of the original text that are often unnoticed by
human eyes, yet can force text classification models to alter their
classifications. Research quantifying the impact of adversarial text attacks
has often been applied only to models trained in English.
In this paper, we introduce the first word-level study of adversarial attacks
in Arabic. Specifically, we use a synonym (word-level) attack using a Masked
Language Modeling (MLM) task with a BERT model in a black-box setting to assess
the robustness of the state-of-the-art text classification models to
adversarial attacks in Arabic. To evaluate the grammatical and semantic
similarities of the newly produced adversarial examples using our synonym
BERT-based attack, we invite four human evaluators to assess and compare the
produced adversarial examples with their original examples. We also study the
transferability of these newly produced Arabic adversarial examples to various
models and investigate the effectiveness of defense mechanisms against these
adversarial examples on the BERT models. We find that fine-tuned BERT models
were more susceptible to our synonym attacks than the other Deep Neural
Network (DNN) models we trained, such as WordCNN and WordLSTM. We also find
that fine-tuned BERT models were more susceptible to transferred attacks.
Lastly, we find that fine-tuned BERT models successfully regain at least 2% in
accuracy after applying adversarial training as an initial defense mechanism.
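The synonym attack described in the abstract can be illustrated with a short script. Below is a minimal sketch, assuming the Hugging Face transformers library, a multilingual BERT checkpoint ("bert-base-multilingual-cased") as the masked language model, and a hypothetical victim_predict() function that exposes only the victim classifier's predicted label (the black-box setting); it is not the authors' released implementation.

```python
# Minimal sketch (not the authors' code) of a black-box, word-level synonym
# attack that uses a BERT Masked Language Model (MLM) to propose in-context
# replacements. The model name and victim_predict() are illustrative assumptions.
from transformers import pipeline

# Any Arabic-capable BERT MLM checkpoint could be substituted here.
mlm = pipeline("fill-mask", model="bert-base-multilingual-cased")
MASK = mlm.tokenizer.mask_token


def synonym_attack(text: str, victim_predict, top_k: int = 10):
    """Greedily mask one word at a time, query the MLM for candidate
    replacements, and return the first perturbed text whose predicted
    label differs from that of the original text."""
    original_label = victim_predict(text)
    words = text.split()
    for i, word in enumerate(words):
        masked = " ".join(words[:i] + [MASK] + words[i + 1:])
        for candidate in mlm(masked, top_k=top_k):
            replacement = candidate["token_str"].strip()
            if not replacement or replacement == word:
                continue
            perturbed = " ".join(words[:i] + [replacement] + words[i + 1:])
            # Black-box setting: only the victim's predicted label is observed.
            if victim_predict(perturbed) != original_label:
                return perturbed
    return None  # no label-flipping substitution found
```

In the paper itself, candidate replacements are additionally filtered so that they act as synonyms and preserve grammaticality and semantics (checked by the four human evaluators); the greedy loop above omits those filters for brevity.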
Related papers
- Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods [0.0]
A text adversarial attack involves the deliberate manipulation of input text to mislead the predictions of the model.
BERT, BERT-on-BERT attack, and Fraud Bargain's Attack (FBA) are explored in this paper.
PWWS emerges as the most potent adversary, consistently outperforming other methods across multiple evaluation scenarios.
arXiv Detail & Related papers (2024-04-08T02:55:01Z) - SA-Attack: Improving Adversarial Transferability of Vision-Language
Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z) - Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z) - In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
arXiv Detail & Related papers (2022-12-20T14:06:50Z) - Adversarial Training for Improving Model Robustness? Look at Both
Prediction and Interpretation [21.594361495948316]
We propose a novel feature-level adversarial training method named FLAT.
FLAT incorporates variational word masks in neural networks to learn global word importance.
Experiments show the effectiveness of FLAT in improving the robustness with respect to both predictions and interpretations.
arXiv Detail & Related papers (2022-03-23T20:04:14Z) - Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of
Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z) - Experiments with adversarial attacks on text genres [0.0]
Neural models based on pre-trained transformers, such as BERT or XLM-RoBERTa, demonstrate SOTA results in many NLP tasks.
We show that embedding-based algorithms, which can replace some of the "most significant" words with similar words, can influence model predictions in a significant proportion of cases.
arXiv Detail & Related papers (2021-07-05T19:37:59Z) - On the Transferability of Adversarial Attacksagainst Neural Text
Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z) - BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks on discrete data (such as text) are more challenging than attacks on continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z) - BAE: BERT-based Adversarial Examples for Text Classification [9.188318506016898]
We present BAE, a black box attack for generating adversarial examples using contextual perturbations from a BERT masked language model.
We show that BAE performs a stronger attack, in addition to generating adversarial examples with improved grammaticality and semantic coherence as compared to prior work.
arXiv Detail & Related papers (2020-04-04T16:25:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.