Distinguishing Non-natural from Natural Adversarial Samples for More
Robust Pre-trained Language Model
- URL: http://arxiv.org/abs/2203.11199v1
- Date: Sat, 19 Mar 2022 14:06:46 GMT
- Title: Distinguishing Non-natural from Natural Adversarial Samples for More
Robust Pre-trained Language Model
- Authors: Jiayi Wang, Rongzhou Bao, Zhuosheng Zhang, Hai Zhao
- Abstract summary: We find that the adversarial samples on which PrLMs fail are mostly non-natural and do not appear in reality.
We propose an anomaly detector to evaluate the robustness of PrLMs with more natural adversarial samples.
- Score: 79.18455635071817
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the problem of robustness of pre-trained language models (PrLMs)
has received increasing research interest. Latest studies on adversarial
attacks achieve high attack success rates against PrLMs, claiming that PrLMs
are not robust. However, we find that the adversarial samples on which PrLMs fail
are mostly non-natural and do not appear in reality. We question the validity
of current evaluation of robustness of PrLMs based on these non-natural
adversarial samples and propose an anomaly detector to evaluate the robustness
of PrLMs with more natural adversarial samples. We also investigate two
applications of the anomaly detector: (1) In data augmentation, we employ the
anomaly detector to force the generation of augmented data that are distinguished as
non-natural, which brings larger gains in the accuracy of PrLMs. (2) We apply
the anomaly detector to a defense framework to enhance the robustness of PrLMs.
It can be used to defend against all types of attacks and achieves higher accuracy on
both adversarial samples and compliant samples than other defense frameworks.
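
As a rough, hedged sketch of how such an anomaly detector could be plugged into a robustness evaluation, the snippet below filters candidate adversarial samples and keeps only those judged natural. The `toy_anomaly_score` heuristic and the threshold are illustrative stand-ins of our own, not the detector trained in the paper.

```python
# Hypothetical sketch: filtering non-natural adversarial samples with an
# anomaly score before evaluating PrLM robustness. The scorer below is a
# placeholder heuristic; the paper trains a dedicated detector instead.
from typing import Callable, List


def filter_natural(
    adversarial_samples: List[str],
    anomaly_score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Keep only the adversarial samples the detector judges as natural."""
    return [s for s in adversarial_samples if anomaly_score(s) < threshold]


def toy_anomaly_score(text: str) -> float:
    """Placeholder scorer: fraction of tokens that are not plain lowercase words.
    A real detector would be a trained classifier, not this heuristic."""
    tokens = text.split()
    if not tokens:
        return 1.0
    odd = sum(1 for t in tokens if not t.isalpha() or not t.islower())
    return odd / len(tokens)


if __name__ == "__main__":
    candidates = [
        "the movie was surprisingly good",
        "teh m0vie wsa surprsingly g00d",
    ]
    # Only candidates scored as natural survive the filter.
    print(filter_natural(candidates, toy_anomaly_score, threshold=0.3))
```
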
Related papers
- Extreme Miscalibration and the Illusion of Adversarial Robustness [66.29268991629085]
Adversarial Training is often used to increase model robustness.
We show that this observed gain in robustness is an illusion of robustness (IOR).
We urge the NLP community to incorporate test-time temperature scaling into their robustness evaluations.
arXiv Detail & Related papers (2024-02-27T13:49:12Z) - Sample Attackability in Natural Language Adversarial Attacks [1.4213973379473654]
This work formally extends the definition of sample attackability/robustness for NLP attacks.
Experiments are conducted on two popular NLP datasets, four state-of-the-art models, and four different NLP adversarial attack methods.
arXiv Detail & Related papers (2023-06-21T06:20:51Z) - Identifying Adversarially Attackable and Robust Samples [1.4213973379473654]
Adversarial attacks add small, imperceptible perturbations to input samples that cause large, undesired changes in the output of deep learning models.
This work introduces the notion of sample attackability, where we aim to identify samples that are most susceptible to adversarial attacks.
We propose a deep-learning-based detector to identify the adversarially attackable and robust samples in an unseen dataset for an unseen target model.
arXiv Detail & Related papers (2023-01-30T13:58:14Z) - Improving Adversarial Robustness to Sensitivity and Invariance Attacks
with Deep Metric Learning [80.21709045433096]
A standard approach to adversarial robustness assumes a framework that defends against samples crafted by minimally perturbing a clean sample.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves defense against both invariance and sensitivity attacks.
arXiv Detail & Related papers (2022-11-04T13:54:02Z) - Rethinking Textual Adversarial Defense for Pre-trained Language Models [79.18455635071817]
A literature review shows that pre-trained language models (PrLMs) are vulnerable to adversarial attacks.
We propose a novel metric (Degree of Anomaly) to enable current adversarial attack approaches to generate more natural and imperceptible adversarial examples.
We show that our universal defense framework achieves after-attack accuracy comparable to, or even higher than, that of other attack-specific defenses.
arXiv Detail & Related papers (2022-07-21T07:51:45Z) - Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z) - Closeness and Uncertainty Aware Adversarial Examples Detection in
Adversarial Machine Learning [0.7734726150561088]
We explore and assess the use of two different groups of metrics for detecting adversarial samples.
We introduce a new feature for adversarial detection, and we show that the performance of all these metrics depends heavily on the strength of the attack being used.
arXiv Detail & Related papers (2020-12-11T14:44:59Z) - Reliable evaluation of adversarial robustness with an ensemble of
diverse parameter-free attacks [65.20660287833537]
In this paper we propose two extensions of the PGD attack that overcome failures due to suboptimal step sizes and problems with the objective function.
We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness.
arXiv Detail & Related papers (2020-03-03T18:15:55Z)
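
For context on this last entry, below is a minimal sketch of the vanilla L-inf PGD attack that such step-size-free extensions build on; the model interface and all hyperparameters are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a vanilla L-inf PGD attack (the baseline that parameter-free
# extensions such as the ones above improve on). Epsilon, step size, and the
# number of steps are illustrative assumptions only.
import torch
import torch.nn.functional as F


def pgd_linf(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Return adversarial examples inside an L-inf ball of radius eps around x."""
    # Random start inside the epsilon ball, clipped to the valid input range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # gradient-sign ascent step
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)  # project back into the ball
            x_adv = x_adv.clamp(0, 1)                              # keep inputs in valid range
    return x_adv.detach()
```
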