SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation
- URL: http://arxiv.org/abs/2502.07101v1
- Date: Mon, 10 Feb 2025 22:46:57 GMT
- Title: SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation
- Authors: Saurabh Kumar Pandey, Sachin Vashistha, Debrup Das, Somak Aditya, Monojit Choudhury
- Abstract summary: We introduce a Sensitivity-based Multi-Armed Bandit framework (SMAB) for calculating word-level local (sentence-level) and global (aggregated) sensitivities.
We show that our algorithm indeed captures intuitively high- and low-sensitivity words.
We also show that sensitivity can serve as a proxy for accuracy in the absence of gold data.
- Score: 10.111657705438654
- Abstract: To understand the complexity of sequence classification tasks, Hahn et al. (2021) proposed sensitivity as the number of disjoint subsets of the input sequence that can each be individually changed to change the output. Though effective, calculating sensitivity at scale under this framework is costly because of its exponential time complexity. We therefore introduce a Sensitivity-based Multi-Armed Bandit framework (SMAB), which provides a scalable approach for calculating word-level local (sentence-level) and global (aggregated) sensitivities with respect to an underlying text classifier for any dataset. We establish the effectiveness of our approach through various applications. In a case study on a CHECKLIST-generated sentiment analysis dataset, we show that our algorithm indeed captures intuitively high- and low-sensitivity words. Through experiments on multiple tasks and languages, we show that sensitivity can serve as a proxy for accuracy in the absence of gold data. Lastly, we show that guiding perturbation prompts with sensitivity values in adversarial example generation improves the attack success rate by 15.58%, while using sensitivity as an additional reward in adversarial paraphrase generation yields a 12.00% improvement over SOTA approaches. Warning: Contains potentially offensive content.
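The bandit view suggests a natural implementation: treat each vocabulary word as an arm, let a pull perturb one occurrence of that word and observe whether the classifier's label flips, and use the running flip rate as the sensitivity estimate. Below is a minimal UCB-style sketch of that idea, not the authors' released algorithm; `classifier`, `perturb`, the exploration constant `c`, and `budget` are all assumed stand-ins.

```python
import math
import random
from collections import defaultdict

def ucb_word_sensitivity(sentences, classifier, perturb, budget=10_000, c=1.4):
    """UCB-style bandit estimate of global word sensitivities (a sketch).

    Each vocabulary word is an arm; a pull perturbs one occurrence of the
    word and records whether the classifier's label flips. `classifier(text)
    -> label` and `perturb(sentence, word) -> sentence` are assumed
    stand-ins, not the paper's exact components.
    """
    occurrences = defaultdict(list)           # word -> sentences containing it
    for s in sentences:
        for w in set(s.split()):
            occurrences[w].append(s)

    words = list(occurrences)
    pulls = defaultdict(int)                  # times each arm was pulled
    flips = defaultdict(int)                  # label flips observed per arm

    for t in range(1, budget + 1):
        def ucb(w):                           # optimistic value of arm w
            if pulls[w] == 0:
                return float("inf")           # explore untried words first
            return flips[w] / pulls[w] + c * math.sqrt(math.log(t) / pulls[w])

        word = max(words, key=ucb)
        sentence = random.choice(occurrences[word])
        flipped = classifier(perturb(sentence, word)) != classifier(sentence)
        pulls[word] += 1
        flips[word] += int(flipped)

    # Global sensitivity estimate: empirical flip rate per word.
    return {w: flips[w] / pulls[w] for w in words if pulls[w] > 0}
```

In this sketch the returned flip rates play the role of global sensitivities; local (sentence-level) estimates would instead be tracked per (word, sentence) pair.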
Related papers
- Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection [60.09665704993751]
We introduce FairOPT, an algorithm for group-specific threshold optimization in AI-generated content classifiers.
Our approach partitions data into subgroups based on attributes (e.g., text length and writing style) and learns decision thresholds for each group.
Our framework paves the way for more robust and fair classification criteria in AI-generated output detection; a minimal per-group threshold sketch follows this entry.
arXiv Detail & Related papers (2025-02-06T21:58:48Z)
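As a rough illustration of the group-adaptive idea (a sketch under assumed inputs, not FairOPT's actual objective): given per-example detector scores, binary labels, and a subgroup attribute, one can search a separate decision threshold per group.

```python
import numpy as np

def per_group_thresholds(scores, labels, groups, grid=np.linspace(0, 1, 101)):
    """One decision threshold per subgroup, chosen by balanced accuracy.

    `scores`: detector probabilities; `labels`: 1 = AI-generated;
    `groups`: subgroup id per example (e.g., a text-length bucket).
    A sketch of the group-adaptive idea, not FairOPT's actual objective.
    """
    scores, labels, groups = map(np.asarray, (scores, labels, groups))
    thresholds = {}
    for g in np.unique(groups):
        s, y = scores[groups == g], labels[groups == g]

        def balanced_acc(t):
            pred = s >= t
            tpr = pred[y == 1].mean() if (y == 1).any() else 0.0
            tnr = (~pred)[y == 0].mean() if (y == 0).any() else 0.0
            return (tpr + tnr) / 2

        thresholds[g] = float(max(grid, key=balanced_acc))
    return thresholds
```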
- Does DetectGPT Fully Utilize Perturbation? Bridging Selective Perturbation to Fine-tuned Contrastive Learning Detector would be Better [21.901523394933076]
We propose a novel fine-tuned detector, Pecola, bridging metric-based and fine-tuned methods by contrastive learning on selective perturbation.
Experiments show that Pecola outperforms the state-of-the-art (SOTA) by 1.20% in accuracy on average on four public datasets.
arXiv Detail & Related papers (2024-02-01T01:23:07Z)
- How are Prompts Different in Terms of Sensitivity? [50.67313477651395]
We present a comprehensive prompt analysis based on the sensitivity of a function.
We use gradient-based saliency scores to empirically demonstrate how different prompts affect the relevance of input tokens to the output.
We introduce sensitivity-aware decoding, which incorporates sensitivity estimation as a penalty term in standard greedy decoding; a minimal sketch follows this entry.
arXiv Detail & Related papers (2023-11-13T10:52:01Z)
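A minimal sketch of how a sensitivity penalty could be folded into greedy decoding, assuming a Hugging-Face-style causal LM whose forward pass returns `.logits` and an assumed precomputed per-token `sensitivity` vector over the vocabulary; the paper's actual estimator may differ.

```python
import torch

def sensitivity_aware_greedy(model, input_ids, sensitivity, steps=32, lam=0.5):
    """Greedy decoding with a sensitivity penalty (a sketch of the idea).

    `model` is assumed to be a Hugging-Face-style causal LM whose forward
    pass returns an object with `.logits`; `sensitivity` is an assumed
    tensor of shape (vocab_size,) holding per-token sensitivity estimates.
    """
    for _ in range(steps):
        logits = model(input_ids).logits[:, -1, :]    # scores for the next token
        penalized = logits - lam * sensitivity        # the penalty term
        next_id = penalized.argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)
    return input_ids
```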
- Improving the Robustness of Summarization Systems with Dual Augmentation [68.53139002203118]
A robust summarization system should be able to capture the gist of the document, regardless of the specific word choices or noise in the input.
We first explore the summarization models' robustness against perturbations including word-level synonym substitution and noise.
We propose SummAttacker, an efficient approach to generating adversarial samples based on language models.
arXiv Detail & Related papers (2023-06-01T19:04:17Z)
- RaSa: Relation and Sensitivity Aware Representation Learning for Text-based Person Search [51.09723403468361]
We propose a Relation and Sensitivity aware representation learning method (RaSa).
RaSa includes two novel tasks: Relation-Aware learning (RA) and Sensitivity-Aware learning (SA).
Experiments demonstrate that RaSa outperforms existing state-of-the-art methods by 6.94%, 4.45%, and 15.35% in terms of Rank@1 on three benchmark datasets.
arXiv Detail & Related papers (2023-05-23T03:53:57Z)
- Language Model Classifier Aligns Better with Physician Word Sensitivity than XGBoost on Readmission Prediction [86.15787587540132]
We introduce the sensitivity score, a metric that scrutinizes models' behaviors at the vocabulary level.
Our experiments compare the decision-making logic of clinicians and classifiers based on rank correlations of sensitivity scores; a toy comparison follows this entry.
arXiv Detail & Related papers (2022-11-13T23:59:11Z)
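The rank-correlation comparison reduces to something like the following, with entirely hypothetical sensitivity scores standing in for the model's and the clinicians':

```python
from scipy.stats import spearmanr

# Entirely hypothetical per-word sensitivity scores.
model_scores = {"dyspnea": 0.81, "stable": 0.12, "relapse": 0.77, "mild": 0.20}
clinician_scores = {"dyspnea": 0.90, "stable": 0.05, "relapse": 0.70, "mild": 0.25}

words = sorted(model_scores)
rho, p = spearmanr([model_scores[w] for w in words],
                   [clinician_scores[w] for w in words])
print(f"Spearman rank correlation: {rho:.2f} (p = {p:.3f})")
```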
- On the Relation between Sensitivity and Accuracy in In-context Learning [41.27837171531926]
In-context learning (ICL) suffers from oversensitivity to the prompt, making it unreliable in real-world scenarios.
We study the sensitivity of ICL with respect to multiple perturbation types.
We propose SenSel, a few-shot selective prediction method that abstains from sensitive predictions; a simplified sketch follows this entry.
arXiv Detail & Related papers (2022-09-16T00:52:34Z)
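A simplified stand-in for the selection criterion (not SenSel's actual scoring): query the classifier under several perturbed prompts and abstain when the predictions disagree too often. `classifier` and `prompt_variants` are assumed inputs.

```python
def abstaining_predict(classifier, prompt_variants, example, max_disagree=0):
    """Selective prediction sketch: abstain when the input looks sensitive.

    Queries `classifier(prompt) -> label` under several perturbed few-shot
    prompts (each an assumed template with one `{}` placeholder); if the
    predictions disagree with the majority more than `max_disagree` times,
    return None to abstain. A simplified stand-in for SenSel's criterion.
    """
    preds = [classifier(p.format(example)) for p in prompt_variants]
    majority = max(set(preds), key=preds.count)
    disagreements = sum(p != majority for p in preds)
    return majority if disagreements <= max_disagree else None
```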
- Learning Disentangled Textual Representations via Statistical Measures of Similarity [35.74568888409149]
We introduce a family of regularizers for learning disentangled representations that require no additional training.
Compared with prior approaches, our regularizers are faster and involve no additional tuning.
arXiv Detail & Related papers (2022-05-07T08:06:22Z)
- Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely the hard-label attack, in which the attacker can only access the predicted label.
We propose a novel hard-label attack, the Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks in both attack performance and adversarial example quality; a bare-bones substitution-search sketch follows this entry.
arXiv Detail & Related papers (2022-01-20T14:16:07Z)
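A bare-bones version of the hard-label setting: only predicted labels are queried, and the search walks through random synonym substitutions. The real LHLS adds a learned component to guide this search, which the sketch omits; `classifier` and `synonyms` are assumed stand-ins.

```python
import random

def hard_label_attack(sentence, classifier, synonyms, max_queries=500):
    """Bare-bones hard-label substitution search (a sketch, not LHLS).

    Only the predicted label from `classifier(text) -> label` is observed,
    matching the hard-label setting; `synonyms(word) -> list[str]` is an
    assumed substitution source. LHLS additionally learns to guide the
    search, which this sketch omits.
    """
    original_label = classifier(sentence)
    current = sentence.split()
    for _ in range(max_queries):
        i = random.randrange(len(current))
        options = synonyms(current[i])
        if not options:
            continue
        trial = current.copy()
        trial[i] = random.choice(options)
        candidate = " ".join(trial)
        if classifier(candidate) != original_label:
            return candidate          # label flipped: adversarial example found
        current = trial               # keep the substitution and keep searching
    return None                       # no flip within the query budget
```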
- Sensitivity as a Complexity Measure for Sequence Classification Tasks [24.246784593571626]
We argue that standard sequence classification methods are biased towards learning low-sensitivity functions, so that tasks requiring high sensitivity are more difficult.
We estimate sensitivity on 15 NLP tasks, finding that sensitivity is higher on challenging tasks collected in GLUE than on simple text classification tasks; a toy computation of the underlying notion follows this entry.
arXiv Detail & Related papers (2021-04-21T03:56:59Z)
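For the Hahn et al. (2021) sensitivity notion referenced above and in the SMAB abstract, a toy brute-force computation on boolean inputs makes the definition concrete (the paper lifts this from bits to subsets of a token sequence):

```python
from itertools import chain, combinations

def flips(f, x, block):
    """True if flipping the bits indexed by `block` changes f's output on x."""
    y = list(x)
    for i in block:
        y[i] ^= 1
    return f(tuple(y)) != f(x)

def block_sensitivity(f, x):
    """Exact block sensitivity of a boolean f at x (brute force, tiny n only).

    Counts the largest family of pairwise-disjoint index blocks that each
    individually flip f(x); Hahn et al. (2021) lift this notion from bits
    to subsets of an input token sequence.
    """
    n = len(x)
    flipping = [s for s in chain.from_iterable(
        combinations(range(n), k) for k in range(1, n + 1)) if flips(f, x, s)]

    def best(remaining, free):
        if not remaining:
            return 0
        head, *tail = remaining
        skip = best(tail, free)               # option 1: ignore this block
        if set(head) <= free:                 # option 2: take it if disjoint
            return max(skip, 1 + best(tail, free - set(head)))
        return skip

    return best(flipping, set(range(n)))

# Parity is maximally sensitive: every single bit already flips the output.
parity = lambda x: sum(x) % 2
print(block_sensitivity(parity, (0, 1, 0)))   # -> 3
```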
This list is automatically generated from the titles and abstracts of the papers on this site.