Quantifying Robustness to Adversarial Word Substitutions
- URL: http://arxiv.org/abs/2201.03829v1
- Date: Tue, 11 Jan 2022 08:18:39 GMT
- Title: Quantifying Robustness to Adversarial Word Substitutions
- Authors: Yuting Yang, Pei Huang, FeiFei Ma, Juan Cao, Meishan Zhang, Jian Zhang
and Jintao Li
- Abstract summary: Deep-learning-based NLP models are found to be vulnerable to word substitution perturbations.
We propose a formal framework to evaluate word-level robustness.
The metric helps us figure out why state-of-the-art models like BERT can be easily fooled by a few word substitutions.
- Score: 24.164523751390053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep-learning-based NLP models are found to be vulnerable to word
substitution perturbations. Before they are widely adopted, the fundamental
issues of robustness need to be addressed. Along this line, we propose a formal
framework to evaluate word-level robustness. First, to study safe regions for a
model, we introduce the robustness radius, which is the boundary within which the model can
resist any perturbation. As calculating the maximum robustness radius is
computationally hard, we estimate its upper and lower bounds. We repurpose
attack methods as ways of seeking an upper bound and design a pseudo-dynamic
programming algorithm for a tighter upper bound. Then, a verification method is
utilized for a lower bound. Further, for evaluating the robustness of regions
outside a safe radius, we reexamine robustness from another view:
quantification. A robustness metric with a rigorous statistical guarantee is
introduced to measure the quantity of adversarial examples, which
indicates the model's susceptibility to perturbations outside the safe radius.
The metric helps us figure out why state-of-the-art models like BERT can be
easily fooled by a few word substitutions, but generalize well in the presence
of real-world noise.
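To make the quantification view above concrete, below is a minimal, hedged sketch of estimating the fraction of word-substitution perturbations that flip a classifier's prediction, using plain Monte Carlo sampling with a Hoeffding-style sample-size bound as the statistical guarantee. The model interface, the synonym map, and the toy example are illustrative assumptions, not the paper's implementation; the pseudo-dynamic programming bound and the verification procedure are not reproduced here.

```python
import math
import random
from typing import Callable, Dict, List, Sequence


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Samples needed so the empirical adversarial fraction is within
    +/- epsilon of the true fraction with probability >= 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_adversarial_fraction(
    model: Callable[[Sequence[str]], int],   # assumed interface: tokens -> label
    tokens: List[str],
    substitutions: Dict[int, List[str]],     # position -> admissible synonyms
    epsilon: float = 0.05,
    delta: float = 0.01,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of the fraction of texts in the substitution
    space (sampled uniformly, position-wise) that change the prediction."""
    rng = random.Random(seed)
    original_label = model(tokens)
    n = hoeffding_sample_size(epsilon, delta)
    flips = 0
    for _ in range(n):
        perturbed = list(tokens)
        for pos, candidates in substitutions.items():
            # Each listed position keeps its word or takes a random synonym.
            perturbed[pos] = rng.choice([tokens[pos]] + candidates)
        if model(perturbed) != original_label:
            flips += 1
    return flips / n


if __name__ == "__main__":
    # Toy classifier: predicts 1 iff the literal word "good" is present.
    toy_model = lambda toks: int("good" in toks)
    text = "the movie was good".split()
    synonyms = {3: ["great", "fine", "nice"]}  # hypothetical synonym set
    print(estimate_adversarial_fraction(toy_model, text, synonyms))
```

With epsilon = 0.05 and delta = 0.01 the bound asks for about 1,060 samples; the toy run should report roughly 0.75, since three of the four admissible words for position 3 break the keyword-matching classifier.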
Related papers
- Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
arXiv Detail & Related papers (2024-07-10T09:13:11Z) - Towards Precise Observations of Neural Model Robustness in Classification [2.127049691404299]
In deep learning applications, robustness measures the ability of neural models to handle slight changes in input data.
Our approach contributes to a deeper understanding of model robustness in safety-critical applications.
arXiv Detail & Related papers (2024-04-25T09:37:44Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - RobustMQ: Benchmarking Robustness of Quantized Models [54.15661421492865]
Quantization is an essential technique for deploying deep neural networks (DNNs) on devices with limited resources.
We thoroughly evaluated the robustness of quantized models against various noises (adversarial attacks, natural corruptions, and systematic noises) on ImageNet.
Our research contributes to advancing the robust quantization of models and their deployment in real-world scenarios.
arXiv Detail & Related papers (2023-08-04T14:37:12Z) - ADDMU: Detection of Far-Boundary Adversarial Examples with Data and
Model Uncertainty Estimation [125.52743832477404]
Adversarial Examples Detection (AED) is a crucial defense technique against adversarial attacks.
We propose a new technique, ADDMU, which combines two types of uncertainty estimation for both regular and FB adversarial example detection.
Our new method outperforms previous methods by 3.6 and 6.0 AUC points under each scenario.
arXiv Detail & Related papers (2022-10-22T09:11:12Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violations in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Consistent Non-Parametric Methods for Adaptive Robustness [26.016647703500887]
A major drawback of the standard robust learning framework is the imposition of an artificial robustness radius $r$ that applies to all inputs.
We propose a new framework for adaptive robustness, called neighborhood preserving robustness.
arXiv Detail & Related papers (2021-02-18T00:44:07Z) - Attribute-Guided Adversarial Training for Robustness to Natural
Perturbations [64.35805267250682]
We propose an adversarial training approach which learns to generate new samples so as to maximize exposure of the classifier to the attributes-space.
Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
arXiv Detail & Related papers (2020-12-03T10:17:30Z) - Constrained Model-based Reinforcement Learning with Robust Cross-Entropy
Method [30.407700996710023]
This paper studies the constrained/safe reinforcement learning problem with sparse indicator signals for constraint violations.
We employ the neural network ensemble model to estimate the prediction uncertainty and use model predictive control as the basic control framework.
The results show that our approach learns to complete the tasks with a much smaller number of constraint violations than state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-15T18:19:35Z) - Assessing Robustness of Text Classification through Maximal Safe Radius
Computation [21.05890715709053]
We aim to provide guarantees that the model prediction does not change if a word is replaced with a plausible alternative, such as a synonym.
As a measure of robustness, we adopt the notion of the maximal safe radius for a given input text, which is the minimum distance in the embedding space to the decision boundary (formalized briefly after this list).
For the upper bound computation, we employ Monte Carlo Tree Search in conjunction with syntactic filtering to analyse the effect of single and multiple word substitutions.
arXiv Detail & Related papers (2020-10-01T09:46:32Z)
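For reference, the maximal safe radius mentioned in the last entry above (a close relative of the robustness radius in the main paper) can be written as below; $e(\cdot)$ for the embedding map and $f(\cdot)$ for the classifier are notational assumptions, not the cited paper's symbols.

```latex
% Maximal safe radius of input x: the distance in embedding space to the
% nearest input whose prediction differs, i.e. to the decision boundary.
r_{\max}(x) \;=\; \min_{x'\,:\, f(x') \neq f(x)} \big\lVert e(x') - e(x) \big\rVert
```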