Arabic Synonym BERT-based Adversarial Examples for Text Classification
- URL: http://arxiv.org/abs/2402.03477v1
- Date: Mon, 5 Feb 2024 19:39:07 GMT
- Title: Arabic Synonym BERT-based Adversarial Examples for Text Classification
- Authors: Norah Alshahrani, Saied Alshahrani, Esma Wali, Jeanna Matthews
- Abstract summary: This paper introduces the first word-level study of adversarial attacks in Arabic.
We assess the robustness of the state-of-the-art text classification models to adversarial attacks in Arabic.
We study the transferability of these newly produced Arabic adversarial examples to various models and investigate the effectiveness of defense mechanisms.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text classification systems have been proven vulnerable to adversarial text
examples: modified versions of the original text that are often unnoticed by
human eyes, yet can force text classification models to alter their
classifications. Research quantifying the impact of adversarial text attacks
has often been applied only to models trained in English.
In this paper, we introduce the first word-level study of adversarial attacks
in Arabic. Specifically, we use a synonym (word-level) attack using a Masked
Language Modeling (MLM) task with a BERT model in a black-box setting to assess
the robustness of the state-of-the-art text classification models to
adversarial attacks in Arabic. To evaluate the grammatical and semantic
similarities of the newly produced adversarial examples using our synonym
BERT-based attack, we invite four human evaluators to assess and compare the
produced adversarial examples with their original examples. We also study the
transferability of these newly produced Arabic adversarial examples to various
models and investigate the effectiveness of defense mechanisms against these
adversarial examples on the BERT models. We find that fine-tuned BERT models
were more susceptible to our synonym attacks than the other Deep Neural
Network (DNN) models we trained, such as WordCNN and WordLSTM. We also find
that fine-tuned BERT models were more susceptible to transferred attacks.
Lastly, we find that fine-tuned BERT models successfully regain at least 2% in
accuracy after applying adversarial training as an initial defense mechanism.
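The synonym attack described in the abstract can be illustrated with a short script. Below is a minimal sketch, assuming the Hugging Face transformers library, a multilingual BERT checkpoint ("bert-base-multilingual-cased") as the masked language model, and a hypothetical victim_predict() function that exposes only the victim classifier's predicted label (the black-box setting); it is not the authors' released implementation.

```python
# Minimal sketch (not the authors' code) of a black-box, word-level synonym
# attack that uses a BERT Masked Language Model (MLM) to propose in-context
# replacements. The model name and victim_predict() are illustrative assumptions.
from transformers import pipeline

# Any Arabic-capable BERT MLM checkpoint could be substituted here.
mlm = pipeline("fill-mask", model="bert-base-multilingual-cased")
MASK = mlm.tokenizer.mask_token


def synonym_attack(text: str, victim_predict, top_k: int = 10):
    """Greedily mask one word at a time, query the MLM for candidate
    replacements, and return the first perturbed text whose predicted
    label differs from that of the original text."""
    original_label = victim_predict(text)
    words = text.split()
    for i, word in enumerate(words):
        masked = " ".join(words[:i] + [MASK] + words[i + 1:])
        for candidate in mlm(masked, top_k=top_k):
            replacement = candidate["token_str"].strip()
            if not replacement or replacement == word:
                continue
            perturbed = " ".join(words[:i] + [replacement] + words[i + 1:])
            # Black-box setting: only the victim's predicted label is observed.
            if victim_predict(perturbed) != original_label:
                return perturbed
    return None  # no label-flipping substitution found
```

In the paper itself, candidate replacements are additionally filtered so that they act as synonyms and preserve grammaticality and semantics (checked by the four human evaluators); the greedy loop above omits those filters for brevity.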
Related papers
- Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods [0.0]
A text adversarial attack involves the deliberate manipulation of input text to mislead the predictions of the model.
BERT, BERT-on-BERT attack, and Fraud Bargain's Attack (FBA) are explored in this paper.
PWWS emerges as the most potent adversary, consistently outperforming other methods across multiple evaluation scenarios.
arXiv Detail & Related papers (2024-04-08T02:55:01Z) - SA-Attack: Improving Adversarial Transferability of Vision-Language
Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z) - Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z) - In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
arXiv Detail & Related papers (2022-12-20T14:06:50Z) - Adversarial Training for Improving Model Robustness? Look at Both
Prediction and Interpretation [21.594361495948316]
We propose a novel feature-level adversarial training method named FLAT.
FLAT incorporates variational word masks in neural networks to learn global word importance.
Experiments show the effectiveness of FLAT in improving the robustness with respect to both predictions and interpretations.
arXiv Detail & Related papers (2022-03-23T20:04:14Z) - Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of
Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z) - Experiments with adversarial attacks on text genres [0.0]
Neural models based on pre-trained transformers, such as BERT or XLM-RoBERTa, demonstrate SOTA results in many NLP tasks.
We show that embedding-based algorithms, which can replace some of the "most significant" words with similar words, can influence model predictions in a significant proportion of cases.
arXiv Detail & Related papers (2021-07-05T19:37:59Z) - On the Transferability of Adversarial Attacksagainst Neural Text
Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z) - BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks on discrete data (such as text) are more challenging than attacks on continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z) - BAE: BERT-based Adversarial Examples for Text Classification [9.188318506016898]
We present BAE, a black box attack for generating adversarial examples using contextual perturbations from a BERT masked language model.
We show that BAE performs a stronger attack, in addition to generating adversarial examples with improved grammaticality and semantic coherence as compared to prior work.
arXiv Detail & Related papers (2020-04-04T16:25:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.