Quantifying Robustness to Adversarial Word Substitutions
- URL: http://arxiv.org/abs/2201.03829v1
- Date: Tue, 11 Jan 2022 08:18:39 GMT
- Title: Quantifying Robustness to Adversarial Word Substitutions
- Authors: Yuting Yang, Pei Huang, FeiFei Ma, Juan Cao, Meishan Zhang, Jian Zhang
and Jintao Li
- Abstract summary: Deep-learning-based NLP models are found to be vulnerable to word substitution perturbations.
We propose a formal framework to evaluate word-level robustness.
The metric helps us figure out why state-of-the-art models like BERT can be easily fooled by a few word substitutions.
- Score: 24.164523751390053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep-learning-based NLP models are found to be vulnerable to word
substitution perturbations. Before they are widely adopted, the fundamental
issues of robustness need to be addressed. Along this line, we propose a formal
framework to evaluate word-level robustness. First, to study safe regions for a
model, we introduce the robustness radius, which is the boundary within which the model can
resist any perturbation. As calculating the maximum robustness radius is
computationally hard, we estimate its upper and lower bounds. We repurpose
attack methods as ways of seeking an upper bound and design a pseudo-dynamic
programming algorithm for a tighter upper bound. Then, a verification method is
utilized for a lower bound. Further, for evaluating the robustness of regions
outside a safe radius, we reexamine robustness from another view:
quantification. A robustness metric with a rigorous statistical guarantee is
introduced to measure the quantity of adversarial examples, which
indicates the model's susceptibility to perturbations outside the safe radius.
The metric helps us figure out why state-of-the-art models like BERT can be
easily fooled by a few word substitutions, but generalize well in the presence
of real-world noise.
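To make the quantification view above concrete, below is a minimal, hedged sketch of estimating the fraction of word-substitution perturbations that flip a classifier's prediction, using plain Monte Carlo sampling with a Hoeffding-style sample-size bound as the statistical guarantee. The model interface, the synonym map, and the toy example are illustrative assumptions, not the paper's implementation; the pseudo-dynamic programming bound and the verification procedure are not reproduced here.

```python
import math
import random
from typing import Callable, Dict, List, Sequence


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Samples needed so the empirical adversarial fraction is within
    +/- epsilon of the true fraction with probability >= 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_adversarial_fraction(
    model: Callable[[Sequence[str]], int],   # assumed interface: tokens -> label
    tokens: List[str],
    substitutions: Dict[int, List[str]],     # position -> admissible synonyms
    epsilon: float = 0.05,
    delta: float = 0.01,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of the fraction of texts in the substitution
    space (sampled uniformly, position-wise) that change the prediction."""
    rng = random.Random(seed)
    original_label = model(tokens)
    n = hoeffding_sample_size(epsilon, delta)
    flips = 0
    for _ in range(n):
        perturbed = list(tokens)
        for pos, candidates in substitutions.items():
            # Each listed position keeps its word or takes a random synonym.
            perturbed[pos] = rng.choice([tokens[pos]] + candidates)
        if model(perturbed) != original_label:
            flips += 1
    return flips / n


if __name__ == "__main__":
    # Toy classifier: predicts 1 iff the literal word "good" is present.
    toy_model = lambda toks: int("good" in toks)
    text = "the movie was good".split()
    synonyms = {3: ["great", "fine", "nice"]}  # hypothetical synonym set
    print(estimate_adversarial_fraction(toy_model, text, synonyms))
```

With epsilon = 0.05 and delta = 0.01 the bound asks for about 1,060 samples; the toy run should report roughly 0.75, since three of the four admissible words for position 3 break the keyword-matching classifier.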
Related papers
- Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
arXiv Detail & Related papers (2024-07-10T09:13:11Z) - Towards Precise Observations of Neural Model Robustness in Classification [2.127049691404299]
In deep learning applications, robustness measures the ability of neural models to handle slight changes in input data.
Our approach contributes to a deeper understanding of model robustness in safety-critical applications.
arXiv Detail & Related papers (2024-04-25T09:37:44Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - RobustMQ: Benchmarking Robustness of Quantized Models [54.15661421492865]
Quantization is an essential technique for deploying deep neural networks (DNNs) on devices with limited resources.
We thoroughly evaluated the robustness of quantized models against various noises (adversarial attacks, natural corruptions, and systematic noises) on ImageNet.
Our research contributes to advancing the robust quantization of models and their deployment in real-world scenarios.
arXiv Detail & Related papers (2023-08-04T14:37:12Z) - ADDMU: Detection of Far-Boundary Adversarial Examples with Data and
Model Uncertainty Estimation [125.52743832477404]
Adversarial Examples Detection (AED) is a crucial defense technique against adversarial attacks.
We propose a new technique, ADDMU, which combines two types of uncertainty estimation for both regular and FB adversarial example detection.
Our new method outperforms previous methods by 3.6 and 6.0 AUC points under each scenario.
arXiv Detail & Related papers (2022-10-22T09:11:12Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violations in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Consistent Non-Parametric Methods for Adaptive Robustness [26.016647703500887]
A major drawback of the standard robust learning framework is the imposition of an artificial robustness radius $r$ that applies to all inputs.
We propose a new framework for adaptive robustness, called neighborhood preserving robustness.
arXiv Detail & Related papers (2021-02-18T00:44:07Z) - Attribute-Guided Adversarial Training for Robustness to Natural
Perturbations [64.35805267250682]
We propose an adversarial training approach which learns to generate new samples so as to maximize exposure of the classifier to the attributes-space.
Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
arXiv Detail & Related papers (2020-12-03T10:17:30Z) - Constrained Model-based Reinforcement Learning with Robust Cross-Entropy
Method [30.407700996710023]
This paper studies the constrained/safe reinforcement learning problem with sparse indicator signals for constraint violations.
We employ the neural network ensemble model to estimate the prediction uncertainty and use model predictive control as the basic control framework.
The results show that our approach learns to complete the tasks with a much smaller number of constraint violations than state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-15T18:19:35Z) - Assessing Robustness of Text Classification through Maximal Safe Radius
Computation [21.05890715709053]
We aim to provide guarantees that the model prediction does not change if a word is replaced with a plausible alternative, such as a synonym.
As a measure of robustness, we adopt the notion of the maximal safe radius for a given input text, which is the minimum distance in the embedding space to the decision boundary (formalized briefly after this list).
For the upper bound computation, we employ Monte Carlo Tree Search in conjunction with syntactic filtering to analyse the effect of single and multiple word substitutions.
arXiv Detail & Related papers (2020-10-01T09:46:32Z)
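For reference, the maximal safe radius mentioned in the last entry above (a close relative of the robustness radius in the main paper) can be written as below; $e(\cdot)$ for the embedding map and $f(\cdot)$ for the classifier are notational assumptions, not the cited paper's symbols.

```latex
% Maximal safe radius of input x: the distance in embedding space to the
% nearest input whose prediction differs, i.e. to the decision boundary.
r_{\max}(x) \;=\; \min_{x'\,:\, f(x') \neq f(x)} \big\lVert e(x') - e(x) \big\rVert
```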