Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods
- URL: http://arxiv.org/abs/2404.05159v1
- Date: Mon, 8 Apr 2024 02:55:01 GMT
- Title: Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods
- Authors: Roopkatha Dey, Aivy Debnath, Sayak Kumar Dutta, Kaustav Ghosh, Arijit Mitra, Arghya Roy Chowdhury, Jaydip Sen,
- Abstract summary: A text adversarial attack involves the deliberate manipulation of input text to mislead the predictions of the model.
BERT, BERT-on-BERT attack, and Fraud Bargain's Attack (FBA) are explored in this paper.
PWWS emerges as the most potent adversary, consistently outperforming other methods across multiple evaluation scenarios.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In various real-world applications such as machine translation, sentiment analysis, and question answering, a pivotal role is played by NLP models, facilitating efficient communication and decision-making processes in domains ranging from healthcare to finance. However, a significant challenge is posed to the robustness of these natural language processing models by text adversarial attacks. These attacks involve the deliberate manipulation of input text to mislead the predictions of the model while maintaining human interpretability. Despite the remarkable performance achieved by state-of-the-art models like BERT in various natural language processing tasks, they are found to remain vulnerable to adversarial perturbations in the input text. In addressing the vulnerability of text classifiers to adversarial attacks, three distinct attack mechanisms are explored in this paper using the victim model BERT: BERT-on-BERT attack, PWWS attack, and Fraud Bargain's Attack (FBA). Leveraging the IMDB, AG News, and SST2 datasets, a thorough comparative analysis is conducted to assess the effectiveness of these attacks on the BERT classifier model. It is revealed by the analysis that PWWS emerges as the most potent adversary, consistently outperforming other methods across multiple evaluation scenarios, thereby emphasizing its efficacy in generating adversarial examples for text classification. Through comprehensive experimentation, the performance of these attacks is assessed and the findings indicate that the PWWS attack outperforms others, demonstrating lower runtime, higher accuracy, and favorable semantic similarity scores. The key insight of this paper lies in the assessment of the relative performances of three prevalent state-of-the-art attack mechanisms.
Related papers
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z) - SA-Attack: Improving Adversarial Transferability of Vision-Language
Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z) - Fooling the Textual Fooler via Randomizing Latent Representations [13.77424820701913]
adversarial word-level perturbations are well-studied and effective attack strategies.
We propose a lightweight and attack-agnostic defense whose main goal is to perplex the process of generating an adversarial example.
We empirically demonstrate near state-of-the-art robustness of AdvFooler against representative adversarial word-level attacks.
arXiv Detail & Related papers (2023-10-02T06:57:25Z) - How do humans perceive adversarial text? A reality check on the validity
and naturalness of word-based adversarial attacks [4.297786261992324]
adversarial attacks are malicious algorithms that imperceptibly modify input text to force models into making incorrect predictions.
We surveyed 378 human participants about the perceptibility of text adversarial examples produced by state-of-the-art methods.
Our results underline that existing text attacks are impractical in real-world scenarios where humans are involved.
arXiv Detail & Related papers (2023-05-24T21:52:13Z) - In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
arXiv Detail & Related papers (2022-12-20T14:06:50Z) - Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely hard-label attack, in which the attacker could only access the prediction label.
Based on this observation, we propose a novel hard-label attack, called Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks regarding the attack performance as well as adversary quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z) - Evaluating the Robustness of Semantic Segmentation for Autonomous
Driving against Real-World Adversarial Patch Attacks [62.87459235819762]
In a real-world scenario like autonomous driving, more attention should be devoted to real-world adversarial examples (RWAEs)
This paper presents an in-depth evaluation of the robustness of popular SS models by testing the effects of both digital and real-world adversarial patches.
arXiv Detail & Related papers (2021-08-13T11:49:09Z) - Evaluating Deception Detection Model Robustness To Linguistic Variation [10.131671217810581]
We propose an analysis of model robustness against linguistic variation in the setting of deceptive news detection.
We consider two prediction tasks and compare three state-of-the-art embeddings to highlight consistent trends in model performance.
We find that character or mixed ensemble models are the most effective defenses and that character perturbation-based attack tactics are more successful.
arXiv Detail & Related papers (2021-04-23T17:25:38Z) - Adv-BERT: BERT is not robust on misspellings! Generating nature
adversarial samples on BERT [95.88293021131035]
It is unclear, however, how the models will perform in realistic scenarios where textitnatural rather than malicious adversarial instances often exist.
This work systematically explores the robustness of BERT, the state-of-the-art Transformer-style model in NLP, in dealing with noisy data.
arXiv Detail & Related papers (2020-02-27T22:07:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.