Improved and Efficient Text Adversarial Attacks using Target Information
- URL: http://arxiv.org/abs/2104.13484v1
- Date: Tue, 27 Apr 2021 21:25:55 GMT
- Title: Improved and Efficient Text Adversarial Attacks using Target Information
- Authors: Mahmoud Hossam, Trung Le, He Zhao, Viet Huynh, Dinh Phung
- Abstract summary: There is growing interest in studying adversarial examples on natural language models in the black-box setting.
A new approach was introduced that addresses this problem through interpretable learning, learning the word ranking instead of performing the previous expensive search.
The main advantage of this approach is that it achieves attack rates comparable to state-of-the-art methods, yet faster and with fewer queries.
- Score: 34.50272230153329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has recently been growing interest in studying adversarial
examples on natural language models in the black-box setting. These methods attack
natural language classifiers by perturbing certain important words until the
classifier label is changed. In order to find these important words, these
methods rank all words by importance by querying the target model word by word
for each input sentence, resulting in high query inefficiency. A new
approach was introduced that addresses this problem through interpretable
learning, which learns the word ranking instead of performing the previous
expensive search. The main advantage of using this approach is that it achieves
comparable attack rates to the state-of-the-art methods, yet faster and with
fewer queries, where fewer queries are desirable to avoid suspicion towards the
attacking agent. Nonetheless, this approach sacrificed the useful information
that could be leveraged from the target classifier for the sake of query
efficiency. In this paper we study the effect of leveraging the target model
outputs and data on both attack rates and average number of queries, and we
show that both can be improved, with a limited overhead of additional queries.
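The word-importance ranking the abstract describes can be illustrated with a minimal leave-one-out sketch: query the target model once per word and rank words by how much deleting each one lowers the true-class probability. This is an illustrative reconstruction of the generic query-expensive step, not the paper's method; `query_model` and `toy_model` are hypothetical stand-ins for a black-box classifier.

```python
from typing import Callable, List, Tuple

def rank_words_by_importance(
    sentence: List[str],
    query_model: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Rank words by the drop in the true-class probability when each
    word is removed; costs one model query per word plus one baseline."""
    base_score = query_model(sentence)
    importances = []
    for i, word in enumerate(sentence):
        perturbed = sentence[:i] + sentence[i + 1:]  # delete word i
        importances.append((word, base_score - query_model(perturbed)))
    # Most important word (largest probability drop) first.
    return sorted(importances, key=lambda t: t[1], reverse=True)

# Toy stand-in classifier: "good" strongly supports the positive class.
def toy_model(words: List[str]) -> float:
    return 0.9 if "good" in words else 0.4

ranking = rank_words_by_importance(["a", "good", "movie"], toy_model)
```

For a sentence of n words this costs n + 1 queries per input, which is the inefficiency the abstract's learned ranking avoids; an attacker would then perturb words in ranked order until the label flips.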
Related papers
- IDEAL: Influence-Driven Selective Annotations Empower In-Context
Learners in Large Language Models [66.32043210237768]
This paper introduces an influence-driven selective annotation method.
It aims to minimize annotation costs while improving the quality of in-context examples.
Experiments confirm the superiority of the proposed method on various benchmarks.
arXiv Detail & Related papers (2023-10-16T22:53:54Z) - Making Retrieval-Augmented Language Models Robust to Irrelevant Context [55.564789967211844]
An important desideratum of RALMs is that retrieved information helps model performance when it is relevant.
Recent work has shown that retrieval augmentation can sometimes have a negative effect on performance.
arXiv Detail & Related papers (2023-10-02T18:52:35Z) - Mitigating Word Bias in Zero-shot Prompt-based Classifiers [55.60306377044225]
We show that matching class priors correlates strongly with the oracle upper bound performance.
We also demonstrate large consistent performance gains for prompt settings over a range of NLP tasks.
arXiv Detail & Related papers (2023-09-10T10:57:41Z) - LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial
Attack [3.410883081705873]
We propose a novel hard-label attack algorithm named LimeAttack.
We show that LimeAttack achieves better attack performance than existing hard-label attacks.
Adversarial examples crafted by LimeAttack are highly transferable and effectively improve model robustness in adversarial training.
arXiv Detail & Related papers (2023-08-01T06:30:37Z) - Automatic Counterfactual Augmentation for Robust Text Classification
Based on Word-Group Search [12.894936637198471]
In general, a keyword is regarded as a shortcut if it creates a superficial association with the label, resulting in a false prediction.
We propose a new Word-Group mining approach, which captures the causal effect of any keyword combination and orders the combinations that most affect the prediction.
Our approach is based on effective post-hoc analysis and beam search, which ensures the mining effect and reduces the complexity.
arXiv Detail & Related papers (2023-07-01T02:26:34Z) - Extending an Event-type Ontology: Adding Verbs and Classes Using
Fine-tuned LLMs Suggestions [0.0]
We have investigated the use of advanced machine learning methods for pre-annotating data for a lexical extension task.
We have examined the correlation of the automatic scores with the human annotation.
While the correlation turned out to be strong, its influence on the annotation proper is modest due to its near linearity.
arXiv Detail & Related papers (2023-06-03T14:57:47Z) - Query Efficient Cross-Dataset Transferable Black-Box Attack on Action
Recognition [99.29804193431823]
Black-box adversarial attacks present a realistic threat to action recognition systems.
We propose a new attack on action recognition that addresses these shortcomings by generating perturbations.
Our method achieves 8% and 12% higher deception rates than state-of-the-art query-based and transfer-based attacks, respectively.
arXiv Detail & Related papers (2022-11-23T17:47:49Z) - A Differentiable Language Model Adversarial Attack on Text Classifiers [10.658675415759697]
We propose a new black-box sentence-level attack for natural language processing.
Our method fine-tunes a pre-trained language model to generate adversarial examples.
We show that the proposed attack outperforms competitors on a diverse set of NLP problems for both computed metrics and human evaluation.
arXiv Detail & Related papers (2021-07-23T14:43:13Z) - The Application of Active Query K-Means in Text Classification [0.0]
Active learning is a state-of-the-art machine learning approach for dealing with an abundance of unlabeled data.
Traditional unsupervised k-means clustering is first modified into a semi-supervised version in this research.
A novel attempt is made to further extend the algorithm into an active learning scenario with Penalized Min-Max-selection.
When tested on a Chinese news dataset, it shows a consistent increase in accuracy while lowering the training cost.
arXiv Detail & Related papers (2021-07-16T03:06:35Z) - Detection of Adversarial Supports in Few-shot Classifiers Using Feature
Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We make use of feature preserving autoencoder filtering and also the concept of self-similarity of a support set to perform this detection.
Our method is attack-agnostic and, to the best of our knowledge, the first to explore detection for few-shot classifiers.
arXiv Detail & Related papers (2020-12-09T14:13:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.