A Modified Word Saliency-Based Adversarial Attack on Text Classification Models
- URL: http://arxiv.org/abs/2403.11297v1
- Date: Sun, 17 Mar 2024 18:39:14 GMT
- Title: A Modified Word Saliency-Based Adversarial Attack on Text Classification Models
- Authors: Hetvi Waghela, Sneha Rakshit, Jaydip Sen
- Abstract summary: This paper introduces a novel adversarial attack method targeting text classification models.
The Modified Word Saliency-based Adversarial Attack (MWSAA) aims to mislead classification models while preserving semantic coherence.
Empirical evaluations conducted on diverse text classification datasets demonstrate the effectiveness of the proposed method.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a novel adversarial attack method targeting text classification models, termed the Modified Word Saliency-based Adversarial Attack (MWSAA). The technique builds upon the concept of word saliency to strategically perturb input texts, aiming to mislead classification models while preserving semantic coherence. By refining the traditional adversarial attack approach, MWSAA significantly enhances its efficacy in evading detection by classification systems. The methodology involves first identifying salient words in the input text through a saliency estimation process, which prioritizes words most influential to the model's decision-making process. Subsequently, these salient words are subjected to carefully crafted modifications, guided by semantic similarity metrics to ensure that the altered text remains coherent and retains its original meaning. Empirical evaluations conducted on diverse text classification datasets demonstrate the effectiveness of the proposed method in generating adversarial examples capable of successfully deceiving state-of-the-art classification models. Comparative analyses with existing adversarial attack techniques further indicate the superiority of the proposed approach in terms of both attack success rate and preservation of text coherence.
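To make the two-stage procedure concrete, here is a minimal sketch of a saliency-then-substitute attack in the spirit of MWSAA. The deletion-based saliency score and the caller-supplied `synonyms` helper are illustrative assumptions: the abstract does not specify the exact saliency estimator or candidate generator, and the semantic-similarity filtering it mentions is delegated here to the `synonyms` callable.

```python
from typing import Callable, List

def word_saliency(words: List[str],
                  predict_proba: Callable[[str], float]) -> List[float]:
    # Saliency of word i = p(y_true | x) - p(y_true | x with word i removed).
    base = predict_proba(" ".join(words))
    return [base - predict_proba(" ".join(words[:i] + words[i + 1:]))
            for i in range(len(words))]

def saliency_attack(text: str,
                    predict_proba: Callable[[str], float],
                    synonyms: Callable[[str], List[str]],
                    threshold: float = 0.5) -> str:
    # Rank words by saliency, then greedily substitute the most salient
    # ones, keeping a substitution only if it lowers the model's
    # confidence in the original label.
    words = text.split()
    scores = word_saliency(words, predict_proba)
    for i in sorted(range(len(words)), key=scores.__getitem__, reverse=True):
        best = predict_proba(" ".join(words))
        if best < threshold:  # model already misled; stop early
            break
        for cand in synonyms(words[i]):
            trial = words[:i] + [cand] + words[i + 1:]
            p = predict_proba(" ".join(trial))
            if p < best:
                words, best = trial, p
    return " ".join(words)
```

In practice, `predict_proba` would wrap the victim classifier's probability for the original label, and `synonyms` would return candidates pre-filtered by a sentence-similarity model to enforce the abstract's semantic-coherence constraint.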
Related papers
- Saliency Attention and Semantic Similarity-Driven Adversarial Perturbation [0.0]
Saliency Attention and Semantic Similarity-Driven Adversarial Perturbation (SASSP) is designed to improve the effectiveness of contextual perturbations.
Our proposed approach incorporates a three-pronged strategy for word selection and perturbation.
SASSP has yielded a higher attack success rate and lower word perturbation rate.
arXiv Detail & Related papers (2024-06-18T14:07:27Z)
- COT: A Generative Approach for Hate Speech Counter-Narratives via Contrastive Optimal Transport [25.73474734479759]
This research paper introduces a novel framework based on contrastive optimal transport.
It effectively addresses the challenges of maintaining target interaction and promoting diversification in generating counter-narratives.
Our proposed model significantly outperforms current methods evaluated by metrics from multiple aspects.
arXiv Detail & Related papers (2024-06-18T06:24:26Z)
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
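A minimal sketch of this check, assuming CLIP image embeddings for the similarity test; the `t2i_generate` callable stands in for whatever text-to-image model is used, and the 0.7 threshold is illustrative rather than taken from the paper:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(image: Image.Image) -> torch.Tensor:
    # Normalized CLIP image embedding.
    inputs = proc(images=image, return_tensors="pt")
    with torch.no_grad():
        feats = clip.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def is_adversarial(image, caption, t2i_generate, threshold=0.7) -> bool:
    # Regenerate an image from the VLM's own caption and flag the input
    # when the two images disagree in embedding space.
    regenerated = t2i_generate(caption)  # hypothetical T2I callable
    sim = (embed(image) @ embed(regenerated).T).item()
    return sim < threshold
```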
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
- Adversarial Text Purification: A Large Language Model Approach for Defense [25.041109219049442]
Adversarial purification is a defense mechanism for safeguarding classifiers against adversarial attacks.
We propose a novel adversarial text purification method that harnesses the generative capabilities of Large Language Models.
Our proposed method demonstrates remarkable performance across various classifiers, improving their accuracy under attack by over 65% on average.
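A minimal sketch of the purify-then-classify pipeline; the prompt wording and the `llm` callable (any text-generation API) are assumptions, since the summary does not give the paper's prompting details:

```python
from typing import Callable

# Hypothetical purification prompt; the paper's actual prompt is not
# given in the summary above.
PURIFY_PROMPT = (
    "Rewrite the following text so that it is fluent and natural while "
    "preserving its exact meaning. Fix any unusual word choices or typos.\n\n"
    "Text: {text}\n\nRewritten text:"
)

def purify_then_classify(text: str,
                         llm: Callable[[str], str],
                         classify: Callable[[str], int]) -> int:
    # Regenerate a clean version of the (possibly adversarial) input
    # with the LLM, then classify the purified text instead.
    purified = llm(PURIFY_PROMPT.format(text=text))
    return classify(purified)
```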
arXiv Detail & Related papers (2024-02-05T02:36:41Z)
- Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z)
- How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
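Label smoothing itself is a one-line change to the training loss; here is a sketch of the standard formulation the paper studies, with an illustrative smoothing value of 0.1:

```python
import torch
import torch.nn as nn

# Smoothed targets: y_smooth = (1 - eps) * one_hot(y) + eps / K.
logits = torch.randn(8, 2)             # batch of 8, binary classification
labels = torch.randint(0, 2, (8,))

# Built-in form (PyTorch >= 1.10).
loss = nn.CrossEntropyLoss(label_smoothing=0.1)(logits, labels)

# Equivalent manual form.
eps, k = 0.1, logits.size(-1)
targets = torch.full_like(logits, eps / k)
targets.scatter_(1, labels.unsqueeze(1), 1 - eps + eps / k)
manual = -(targets * logits.log_softmax(dim=-1)).sum(dim=-1).mean()
```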
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
- Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely the hard-label attack, in which the attacker can only access the predicted label.
We propose a novel hard-label attack, the Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks in both attack performance and adversarial example quality.
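The summary does not spell out the LHLS algorithm itself, but the hard-label setting can be illustrated with a generic local search that queries only predicted labels, here shrinking an initial adversarial example back toward the original text:

```python
import random
from typing import Callable, List

def hard_label_search(orig: List[str],
                      adv: List[str],
                      predict_label: Callable[[str], int],
                      steps: int = 200) -> List[str]:
    # Start from an already-misclassified `adv` and repeatedly try to
    # restore original words, keeping a change only if the prediction
    # stays wrong; no scores or gradients are ever consulted.
    true_label = predict_label(" ".join(orig))
    assert predict_label(" ".join(adv)) != true_label
    current = list(adv)
    for _ in range(steps):
        changed = [i for i in range(len(orig)) if current[i] != orig[i]]
        if not changed:
            break
        i = random.choice(changed)
        trial = list(current)
        trial[i] = orig[i]
        if predict_label(" ".join(trial)) != true_label:
            current = trial
    return current
```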
arXiv Detail & Related papers (2022-01-20T14:16:07Z)
- Generating Natural Language Attacks in a Hard Label Black Box Setting [3.52359746858894]
We study the important and challenging task of attacking natural language processing models in a hard-label black-box setting.
We propose a decision-based attack strategy that crafts high quality adversarial examples on text classification and entailment tasks.
arXiv Detail & Related papers (2020-12-29T22:01:38Z)
- Adversarial Augmentation Policy Search for Domain and Cross-Lingual Generalization in Reading Comprehension [96.62963688510035]
Reading comprehension models often overfit to nuances of training datasets and fail at adversarial evaluation.
We present several effective adversaries and automated data augmentation policy search methods with the goal of making reading comprehension models more robust to adversarial evaluation.
arXiv Detail & Related papers (2020-04-13T17:20:08Z)