Distinguishing Non-natural from Natural Adversarial Samples for More
Robust Pre-trained Language Model
- URL: http://arxiv.org/abs/2203.11199v1
- Date: Sat, 19 Mar 2022 14:06:46 GMT
- Title: Distinguishing Non-natural from Natural Adversarial Samples for More
Robust Pre-trained Language Model
- Authors: Jiayi Wang, Rongzhou Bao, Zhuosheng Zhang, Hai Zhao
- Abstract summary: We find that the adversarial samples on which PrLMs fail are mostly non-natural and do not appear in reality.
We propose an anomaly detector to evaluate the robustness of PrLMs with more natural adversarial samples.
- Score: 79.18455635071817
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the problem of robustness of pre-trained language models (PrLMs)
has received increasing research interest. Latest studies on adversarial
attacks achieve high attack success rates against PrLMs, claiming that PrLMs
are not robust. However, we find that the adversarial samples on which PrLMs fail
are mostly non-natural and do not appear in reality. We question the validity
of current evaluation of robustness of PrLMs based on these non-natural
adversarial samples and propose an anomaly detector to evaluate the robustness
of PrLMs with more natural adversarial samples. We also investigate two
applications of the anomaly detector: (1) In data augmentation, we employ the
anomaly detector to force the generation of augmented data that are distinguished as
non-natural, which brings larger gains in the accuracy of PrLMs. (2) We apply
the anomaly detector to a defense framework to enhance the robustness of PrLMs.
It can be used to defend against all types of attacks and achieves higher accuracy on
both adversarial samples and compliant samples than other defense frameworks.
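
As a rough, hedged sketch of how such an anomaly detector could be plugged into a robustness evaluation, the snippet below filters candidate adversarial samples and keeps only those judged natural. The `toy_anomaly_score` heuristic and the threshold are illustrative stand-ins of our own, not the detector trained in the paper.

```python
# Hypothetical sketch: filtering non-natural adversarial samples with an
# anomaly score before evaluating PrLM robustness. The scorer below is a
# placeholder heuristic; the paper trains a dedicated detector instead.
from typing import Callable, List


def filter_natural(
    adversarial_samples: List[str],
    anomaly_score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Keep only the adversarial samples the detector judges as natural."""
    return [s for s in adversarial_samples if anomaly_score(s) < threshold]


def toy_anomaly_score(text: str) -> float:
    """Placeholder scorer: fraction of tokens that are not plain lowercase words.
    A real detector would be a trained classifier, not this heuristic."""
    tokens = text.split()
    if not tokens:
        return 1.0
    odd = sum(1 for t in tokens if not t.isalpha() or not t.islower())
    return odd / len(tokens)


if __name__ == "__main__":
    candidates = [
        "the movie was surprisingly good",
        "teh m0vie wsa surprsingly g00d",
    ]
    # Only candidates scored as natural survive the filter.
    print(filter_natural(candidates, toy_anomaly_score, threshold=0.3))
```
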
Related papers
- Extreme Miscalibration and the Illusion of Adversarial Robustness [66.29268991629085]
Adversarial Training is often used to increase model robustness.
We show that this observed gain in robustness is an illusion of robustness (IOR).
We urge the NLP community to incorporate test-time temperature scaling into their robustness evaluations.
arXiv Detail & Related papers (2024-02-27T13:49:12Z) - Sample Attackability in Natural Language Adversarial Attacks [1.4213973379473654]
This work formally extends the definition of sample attackability/robustness for NLP attacks.
Experiments are conducted on two popular NLP datasets, four state-of-the-art models, and four different NLP adversarial attack methods.
arXiv Detail & Related papers (2023-06-21T06:20:51Z) - Identifying Adversarially Attackable and Robust Samples [1.4213973379473654]
Adversarial attacks add small, imperceptible perturbations to input samples that cause large, undesired changes in the output of deep learning models.
This work introduces the notion of sample attackability, where we aim to identify samples that are most susceptible to adversarial attacks.
We propose a deep-learning-based detector to identify the adversarially attackable and robust samples in an unseen dataset for an unseen target model.
arXiv Detail & Related papers (2023-01-30T13:58:14Z) - Improving Adversarial Robustness to Sensitivity and Invariance Attacks
with Deep Metric Learning [80.21709045433096]
A standard approach to adversarial robustness assumes a framework that defends against samples crafted by minimally perturbing a clean sample.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves defense against both invariance and sensitivity attacks.
arXiv Detail & Related papers (2022-11-04T13:54:02Z) - Rethinking Textual Adversarial Defense for Pre-trained Language Models [79.18455635071817]
A literature review shows that pre-trained language models (PrLMs) are vulnerable to adversarial attacks.
We propose a novel metric (Degree of Anomaly) to enable current adversarial attack approaches to generate more natural and imperceptible adversarial examples.
We show that our universal defense framework achieves after-attack accuracy comparable to, or even higher than, that of other attack-specific defenses.
arXiv Detail & Related papers (2022-07-21T07:51:45Z) - Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z) - Closeness and Uncertainty Aware Adversarial Examples Detection in
Adversarial Machine Learning [0.7734726150561088]
We explore and assess the use of two different groups of metrics for detecting adversarial samples.
We introduce a new feature for adversarial detection, and we show that the performance of all these metrics depends heavily on the strength of the attack being used.
arXiv Detail & Related papers (2020-12-11T14:44:59Z) - Reliable evaluation of adversarial robustness with an ensemble of
diverse parameter-free attacks [65.20660287833537]
In this paper we propose two extensions of the PGD attack that overcome failures due to suboptimal step sizes and problems with the objective function.
We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness.
arXiv Detail & Related papers (2020-03-03T18:15:55Z)
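
For context on this last entry, below is a minimal sketch of the vanilla L-inf PGD attack that such step-size-free extensions build on; the model interface and all hyperparameters are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a vanilla L-inf PGD attack (the baseline that parameter-free
# extensions such as the ones above improve on). Epsilon, step size, and the
# number of steps are illustrative assumptions only.
import torch
import torch.nn.functional as F


def pgd_linf(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Return adversarial examples inside an L-inf ball of radius eps around x."""
    # Random start inside the epsilon ball, clipped to the valid input range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # gradient-sign ascent step
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)  # project back into the ball
            x_adv = x_adv.clamp(0, 1)                              # keep inputs in valid range
    return x_adv.detach()
```
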