Related papers: Sample Attackability in Natural Language Adversarial Attacks

Sample Attackability in Natural Language Adversarial Attacks

URL: http://arxiv.org/abs/2306.12043v1
Date: Wed, 21 Jun 2023 06:20:51 GMT
Title: Sample Attackability in Natural Language Adversarial Attacks
Authors: Vyas Raina and Mark Gales
Abstract summary: This work formally extends the definition of sample attackability/robustness for NLP attacks. Experiments on two popular NLP datasets, four state of the art models and four different NLP adversarial attack methods.
Score: 1.4213973379473654
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Adversarial attack research in natural language processing (NLP) has made significant progress in designing powerful attack methods and defence approaches. However, few efforts have sought to identify which source samples are the most attackable or robust, i.e. can we determine for an unseen target model, which samples are the most vulnerable to an adversarial attack. This work formally extends the definition of sample attackability/robustness for NLP attacks. Experiments on two popular NLP datasets, four state of the art models and four different NLP adversarial attack methods, demonstrate that sample uncertainty is insufficient for describing characteristics of attackable/robust samples and hence a deep learning based detector can perform much better at identifying the most attackable and robust samples for an unseen target model. Nevertheless, further analysis finds that there is little agreement in which samples are considered the most attackable/robust across different NLP attack methods, explaining a lack of portability of attackability detection methods across attack methods.

Related papers

Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defense mainly focuses on the known attacks, but the adversarial robustness to the unknown attacks is seriously overlooked. We propose an attack-agnostic defense method named Meta Invariance Defense (MID) We show that MID simultaneously achieves robustness to the imperceptible adversarial perturbations in high-level image classification and attack-suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z)
DALA: A Distribution-Aware LoRA-Based Adversarial Attack against Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data. Recent attack methods can achieve a relatively high attack success rate (ASR) We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z)
Confidence-driven Sampling for Backdoor Attacks [49.72680157684523]
Backdoor attacks aim to surreptitiously insert malicious triggers into DNN models, granting unauthorized control during testing scenarios. Existing methods lack robustness against defense strategies and predominantly focus on enhancing trigger stealthiness while randomly selecting poisoned samples. We introduce a straightforward yet highly effective sampling methodology that leverages confidence scores. Specifically, it selects samples with lower confidence scores, significantly increasing the challenge for defenders in identifying and countering these attacks.
arXiv Detail & Related papers (2023-10-08T18:57:36Z)
Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks [72.03945355787776]
We advocate MDP, a lightweight, pluggable, and effective defense for PLMs as few-shot learners. We show analytically that MDP creates an interesting dilemma for the attacker to choose between attack effectiveness and detection evasiveness.
arXiv Detail & Related papers (2023-09-23T04:41:55Z)
Identifying Adversarially Attackable and Robust Samples [1.4213973379473654]
Adrial attacks insert small, imperceptible perturbations to input samples that cause large, undesired changes to the output of deep learning models. This work introduces the notion of sample attackability, where we aim to identify samples that are most susceptible to adversarial attacks. We propose a deep-learning-based detector to identify the adversarially attackable and robust samples in an unseen dataset for an unseen target model.
arXiv Detail & Related papers (2023-01-30T13:58:14Z)
Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard method in adversarial robustness assumes a framework to defend against samples crafted by minimally perturbing a sample. We use metric learning to frame adversarial regularization as an optimal transport problem. Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
Towards Generating Adversarial Examples on Mixed-type Data [32.41305735919529]
We propose a novel attack algorithm M-Attack, which can effectively generate adversarial examples in mixed-type data. Based on M-Attack, attackers can attempt to mislead the targeted classification model's prediction, by only slightly perturbing both the numerical and categorical features in the given data samples. Our generated adversarial examples can evade potential detection models, which makes the attack indeed insidious.
arXiv Detail & Related papers (2022-10-17T20:17:21Z)
Rethinking Textual Adversarial Defense for Pre-trained Language Models [79.18455635071817]
A literature review shows that pre-trained language models (PrLMs) are vulnerable to adversarial attacks. We propose a novel metric (Degree of Anomaly) to enable current adversarial attack approaches to generate more natural and imperceptible adversarial examples. We show that our universal defense framework achieves comparable or even higher after-attack accuracy with other specific defenses.
arXiv Detail & Related papers (2022-07-21T07:51:45Z)
Generating Natural Language Adversarial Examples through An Improved Beam Search Algorithm [0.5735035463793008]
In this paper, a novel attack model is proposed, its attack success rate surpasses the benchmark attack methods. The novel method is empirically evaluated by attacking WordCNN, LSTM, BiLSTM, and BERT on four benchmark datasets. It achieves a 100% attack success rate higher than the state-of-the-art method when attacking BERT and BiLSTM on IMDB.
arXiv Detail & Related papers (2021-10-15T12:09:04Z)
Adversarial example generation with AdaBelief Optimizer and Crop Invariance [8.404340557720436]
Adversarial attacks can be an important method to evaluate and select robust models in safety-critical applications. We propose AdaBelief Iterative Fast Gradient Method (ABI-FGM) and Crop-Invariant attack Method (CIM) to improve the transferability of adversarial examples. Our method has higher success rates than state-of-the-art gradient-based attack methods.
arXiv Detail & Related papers (2021-02-07T06:00:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.