SAFER: A Structure-free Approach for Certified Robustness to Adversarial
Word Substitutions
- URL: http://arxiv.org/abs/2005.14424v1
- Date: Fri, 29 May 2020 07:15:19 GMT
- Title: SAFER: A Structure-free Approach for Certified Robustness to Adversarial
Word Substitutions
- Authors: Mao Ye, Chengyue Gong, Qiang Liu
- Abstract summary: State-of-the-art NLP models can often be fooled by transformations imperceptible to humans, such as synonymous word substitution.
We propose a new randomized smoothing technique, which constructs a stochastic ensemble by applying random word substitutions to the input sentences.
Our method significantly outperforms recent state-of-the-art methods for certified robustness on both IMDB and Amazon text classification tasks.
- Score: 36.91111335989236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art NLP models can often be fooled by transformations imperceptible to humans, such as synonymous word substitution. For security reasons, it is of critical importance to develop models with certified robustness that can provably guarantee that the prediction cannot be altered by any possible synonymous word substitution. In this work, we propose a certified robust method based on a new randomized smoothing technique, which constructs a stochastic ensemble by applying random word substitutions to the input sentences and leverages the statistical properties of the ensemble to provably certify the robustness. Our method is simple and structure-free in that it only requires black-box queries of the model outputs, and hence can be applied to any pre-trained model (such as BERT) and any type of model (word-level or subword-level). Our method significantly outperforms recent state-of-the-art methods for certified robustness on both IMDB and Amazon text classification tasks. To the best of our knowledge, ours is the first work to achieve certified robustness on large systems such as BERT with practically meaningful certified accuracy.
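To make the smoothing step concrete, here is a minimal sketch of the procedure the abstract describes (an illustration only, not the authors' released code; `classifier`, the `synonyms` table, the sample count, and the per-word substitution rule are placeholder assumptions):

```python
import random
from collections import Counter

def smoothed_predict(classifier, sentence, synonyms, n_samples=1000):
    """Monte Carlo vote over randomly synonym-perturbed copies of the
    input; only black-box label queries of `classifier` are needed."""
    words = sentence.split()
    votes = Counter()
    for _ in range(n_samples):
        # Independently replace each word that has a synonym set with
        # a uniformly random synonym (placeholder perturbation rule).
        perturbed = [random.choice(synonyms[w]) if w in synonyms else w
                     for w in words]
        votes[classifier(" ".join(perturbed))] += 1
    # The majority vote approximates the smoothed classifier's output.
    return votes.most_common(1)[0][0]
```

Certification itself (not shown) would, roughly, lower-bound the top-class vote probability with a statistical confidence interval and check that the resulting margin survives any allowed synonym substitution; the exact bound is what the paper derives.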
Related papers
- CR-UTP: Certified Robustness against Universal Text Perturbations on Large Language Models [12.386141652094999]
Existing certified robustness based on randomized smoothing has shown considerable promise in certifying input-specific text perturbations.
A naive defense is to simply increase the masking ratio and thus the likelihood of masking attack tokens, but this leads to a significant reduction in both certified accuracy and the certified radius.
We introduce a novel approach designed to identify a superior prompt that maintains higher certified accuracy under extensive masking.
arXiv Detail & Related papers (2024-06-04T01:02:22Z)
- Towards Certified Probabilistic Robustness with High Accuracy [3.957941698534126]
Adversarial examples pose a security threat to many critical systems built on neural networks.
How to build certifiably robust yet accurate neural network models remains an open problem.
We propose a novel approach that aims to achieve both high accuracy and certified probabilistic robustness.
arXiv Detail & Related papers (2023-09-02T09:39:47Z)
- Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks [39.51297217854375]
We propose Text-CRS, a certified robustness framework for natural language processing (NLP) based on randomized smoothing.
We show that Text-CRS can address all four different word-level adversarial operations and achieve a significant accuracy improvement.
Besides outperforming the state-of-the-art certification against synonym substitution attacks, we also provide the first benchmark on the certified accuracy and radius of the four word-level operations.
arXiv Detail & Related papers (2023-07-31T13:08:16Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves the adversarial robustness of pre-trained models like BERT against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples (a minimal sketch of the label-smoothing loss follows this entry).
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
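Label smoothing is a standard technique, so the loss is easy to state concretely; the following PyTorch-style sketch is my illustration (the smoothing factor `epsilon` is a placeholder, not a value from the paper):

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, epsilon=0.1):
    """Cross-entropy against a smoothed target: (1 - epsilon) on the
    true class, epsilon spread uniformly over the other classes."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    smooth = torch.full_like(log_probs, epsilon / (num_classes - 1))
    smooth.scatter_(-1, targets.unsqueeze(-1), 1.0 - epsilon)
    return -(smooth * log_probs).sum(dim=-1).mean()
```

Softening the one-hot targets this way discourages exactly the over-confident predictions that the paper links to adversarial brittleness.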
We use "accuracy under Gaussian noise" as an easy-to-compute proxy of adversarial robustness for an input.
Our experiments show that the proposed method consistently exhibits improved certified robustness over state-of-the-art training methods.
arXiv Detail & Related papers (2022-12-18T03:57:12Z)
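As a rough illustration of that proxy (my sketch, not the paper's procedure; `model`, the noise level `sigma`, and the sample count `n` are placeholder assumptions), the accuracy of a single input under Gaussian noise can be estimated by Monte Carlo sampling:

```python
import torch

@torch.no_grad()
def accuracy_under_gaussian_noise(model, x, label, sigma=0.25, n=100):
    """Fraction of Gaussian-noised copies of x that the model still
    assigns to `label` -- a cheap per-input robustness proxy."""
    noisy = x.unsqueeze(0) + sigma * torch.randn(n, *x.shape)
    preds = model(noisy).argmax(dim=-1)
    return (preds == label).float().mean().item()
```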
- GSmooth: Certified Robustness against Semantic Transformations via Generalized Randomized Smoothing [40.38555458216436]
We propose a unified theoretical framework for certifying robustness against general semantic transformations.
Under the GSmooth framework, we present a scalable algorithm that uses a surrogate image-to-image network to approximate the complex transformation.
arXiv Detail & Related papers (2022-06-09T07:12:17Z)
- Joint Differentiable Optimization and Verification for Certified Reinforcement Learning [91.93635157885055]
In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties.
We propose a framework that jointly conducts reinforcement learning and formal verification.
arXiv Detail & Related papers (2022-01-28T16:53:56Z)
- Quantifying Robustness to Adversarial Word Substitutions [24.164523751390053]
Deep-learning-based NLP models are found to be vulnerable to word substitution perturbations.
We propose a formal framework to evaluate word-level robustness.
The proposed metric helps explain why state-of-the-art models like BERT can be easily fooled by a few word substitutions.
arXiv Detail & Related papers (2022-01-11T08:18:39Z)
- SmoothMix: Training Confidence-calibrated Smoothed Classifiers for Certified Robustness [61.212486108346695]
We propose a training scheme, coined SmoothMix, to control the robustness of smoothed classifiers via self-mixup.
The proposed procedure effectively identifies over-confident, near off-class samples as a cause of limited robustness.
Our experimental results demonstrate that the proposed method can significantly improve the certified $\ell_2$-robustness of smoothed classifiers.
arXiv Detail & Related papers (2021-11-17T18:20:59Z)
- MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct the keywords from the rest of the words and to make low-confidence predictions when there is not enough context (a toy sketch of both regularizers follows this entry).
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
arXiv Detail & Related papers (2020-12-17T04:54:16Z)
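To make MASKER's two regularizers concrete, here is a toy sketch of the objective as summarized above (an illustration only; `model.classify`, `model.reconstruct`, `mask_token_id`, the keyword mask, and the loss weights are hypothetical placeholders, not the authors' API):

```python
import torch.nn.functional as F

def masker_style_loss(model, tokens, keyword_mask, labels,
                      lambda_rec=0.1, lambda_ent=0.1):
    """Classification loss plus (i) reconstructing masked-out keywords
    from context and (ii) pushing keyword-only predictions toward the
    uniform distribution (i.e., low confidence without context)."""
    loss_cls = F.cross_entropy(model.classify(tokens), labels)

    # (i) Mask the keywords and reconstruct them from the context.
    ctx_only = tokens.masked_fill(keyword_mask, model.mask_token_id)
    rec_logits = model.reconstruct(ctx_only)        # (batch, seq, vocab)
    loss_rec = F.cross_entropy(rec_logits[keyword_mask],
                               tokens[keyword_mask])

    # (ii) Keep only the keywords and penalize confident predictions
    # (cross-entropy against the uniform distribution, up to a constant).
    kw_only = tokens.masked_fill(~keyword_mask, model.mask_token_id)
    loss_ent = -F.log_softmax(model.classify(kw_only), dim=-1).mean()

    return loss_cls + lambda_rec * loss_rec + lambda_ent * loss_ent
```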