SAFER: A Structure-free Approach for Certified Robustness to Adversarial
Word Substitutions
- URL: http://arxiv.org/abs/2005.14424v1
- Date: Fri, 29 May 2020 07:15:19 GMT
- Title: SAFER: A Structure-free Approach for Certified Robustness to Adversarial
Word Substitutions
- Authors: Mao Ye, Chengyue Gong, Qiang Liu
- Abstract summary: State-of-the-art NLP models can often be fooled by transformations imperceptible to humans, such as synonymous word substitution.
We propose a new randomized smoothing technique, which constructs a stochastic ensemble by applying random word substitutions to the input sentences.
Our method significantly outperforms recent state-of-the-art methods for certified robustness on both IMDB and Amazon text classification tasks.
- Score: 36.91111335989236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art NLP models can often be fooled by transformations imperceptible to humans, such as synonymous word substitution. For security reasons, it is of critical importance to develop models with certified robustness that can provably guarantee that the prediction cannot be altered by any possible synonymous word substitution. In this work, we propose a certified robust method based on a new randomized smoothing technique, which constructs a stochastic ensemble by applying random word substitutions to the input sentences and leverages the statistical properties of the ensemble to provably certify the robustness. Our method is simple and structure-free in that it only requires black-box queries of the model outputs, and hence can be applied to any pre-trained model (such as BERT) and any type of model (word-level or subword-level). Our method significantly outperforms recent state-of-the-art methods for certified robustness on both IMDB and Amazon text classification tasks. To the best of our knowledge, ours is the first work to achieve certified robustness on large systems such as BERT with practically meaningful certified accuracy.
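To make the smoothing step concrete, here is a minimal sketch of the procedure the abstract describes (an illustration only, not the authors' released code; `classifier`, the `synonyms` table, the sample count, and the per-word substitution rule are placeholder assumptions):

```python
import random
from collections import Counter

def smoothed_predict(classifier, sentence, synonyms, n_samples=1000):
    """Monte Carlo vote over randomly synonym-perturbed copies of the
    input; only black-box label queries of `classifier` are needed."""
    words = sentence.split()
    votes = Counter()
    for _ in range(n_samples):
        # Independently replace each word that has a synonym set with
        # a uniformly random synonym (placeholder perturbation rule).
        perturbed = [random.choice(synonyms[w]) if w in synonyms else w
                     for w in words]
        votes[classifier(" ".join(perturbed))] += 1
    # The majority vote approximates the smoothed classifier's output.
    return votes.most_common(1)[0][0]
```

Certification itself (not shown) would, roughly, lower-bound the top-class vote probability with a statistical confidence interval and check that the resulting margin survives any allowed synonym substitution; the exact bound is what the paper derives.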
Related papers
- CR-UTP: Certified Robustness against Universal Text Perturbations on Large Language Models [12.386141652094999]
Existing certified robustness based on randomized smoothing has shown considerable promise in certifying input-specific text perturbations.
A naive defense is to simply increase the masking ratio and thus the likelihood of masking attack tokens, but this leads to a significant reduction in both certified accuracy and the certified radius.
We introduce a novel approach designed to identify a superior prompt that maintains higher certified accuracy under extensive masking.
arXiv Detail & Related papers (2024-06-04T01:02:22Z)
- Towards Certified Probabilistic Robustness with High Accuracy [3.957941698534126]
Adversarial examples pose a security threat to many critical systems built on neural networks.
How to build certifiably robust yet accurate neural network models remains an open problem.
We propose a novel approach that aims to achieve both high accuracy and certified probabilistic robustness.
arXiv Detail & Related papers (2023-09-02T09:39:47Z)
- Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks [39.51297217854375]
We propose Text-CRS, a certified robustness framework for natural language processing (NLP) based on randomized smoothing.
We show that Text-CRS can address all four different word-level adversarial operations and achieve a significant accuracy improvement.
Besides outperforming the state-of-the-art certification against synonym substitution attacks, we also provide the first benchmark on the certified accuracy and radius of the four word-level operations.
arXiv Detail & Related papers (2023-07-31T13:08:16Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves the adversarial robustness of pre-trained models like BERT against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples (a minimal sketch of the label-smoothing loss follows this entry).
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
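Label smoothing is a standard technique, so the loss is easy to state concretely; the following PyTorch-style sketch is my illustration (the smoothing factor `epsilon` is a placeholder, not a value from the paper):

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, epsilon=0.1):
    """Cross-entropy against a smoothed target: (1 - epsilon) on the
    true class, epsilon spread uniformly over the other classes."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    smooth = torch.full_like(log_probs, epsilon / (num_classes - 1))
    smooth.scatter_(-1, targets.unsqueeze(-1), 1.0 - epsilon)
    return -(smooth * log_probs).sum(dim=-1).mean()
```

Softening the one-hot targets this way discourages exactly the over-confident predictions that the paper links to adversarial brittleness.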
We use "accuracy under Gaussian noise" as an easy-to-compute proxy of adversarial robustness for an input.
Our experiments show that the proposed method consistently exhibits improved certified robustness over state-of-the-art training methods.
arXiv Detail & Related papers (2022-12-18T03:57:12Z)
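As a rough illustration of that proxy (my sketch, not the paper's procedure; `model`, the noise level `sigma`, and the sample count `n` are placeholder assumptions), the accuracy of a single input under Gaussian noise can be estimated by Monte Carlo sampling:

```python
import torch

@torch.no_grad()
def accuracy_under_gaussian_noise(model, x, label, sigma=0.25, n=100):
    """Fraction of Gaussian-noised copies of x that the model still
    assigns to `label` -- a cheap per-input robustness proxy."""
    noisy = x.unsqueeze(0) + sigma * torch.randn(n, *x.shape)
    preds = model(noisy).argmax(dim=-1)
    return (preds == label).float().mean().item()
```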
- GSmooth: Certified Robustness against Semantic Transformations via Generalized Randomized Smoothing [40.38555458216436]
We propose a unified theoretical framework for certifying robustness against general semantic transformations.
Under the GSmooth framework, we present a scalable algorithm that uses a surrogate image-to-image network to approximate the complex transformation.
arXiv Detail & Related papers (2022-06-09T07:12:17Z)
- Joint Differentiable Optimization and Verification for Certified Reinforcement Learning [91.93635157885055]
In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties.
We propose a framework that jointly conducts reinforcement learning and formal verification.
arXiv Detail & Related papers (2022-01-28T16:53:56Z)
- Quantifying Robustness to Adversarial Word Substitutions [24.164523751390053]
Deep-learning-based NLP models are found to be vulnerable to word substitution perturbations.
We propose a formal framework to evaluate word-level robustness.
The proposed metric helps explain why state-of-the-art models like BERT can be easily fooled by a few word substitutions.
arXiv Detail & Related papers (2022-01-11T08:18:39Z)
- SmoothMix: Training Confidence-calibrated Smoothed Classifiers for Certified Robustness [61.212486108346695]
We propose a training scheme, coined SmoothMix, to control the robustness of smoothed classifiers via self-mixup.
The proposed procedure effectively identifies over-confident, near off-class samples as a cause of limited robustness.
Our experimental results demonstrate that the proposed method can significantly improve the certified $\ell_2$-robustness of smoothed classifiers.
arXiv Detail & Related papers (2021-11-17T18:20:59Z)
- MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct the keywords from the rest of the words and to make low-confidence predictions when there is not enough context (a toy sketch of both regularizers follows this entry).
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
arXiv Detail & Related papers (2020-12-17T04:54:16Z)
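To make MASKER's two regularizers concrete, here is a toy sketch of the objective as summarized above (an illustration only; `model.classify`, `model.reconstruct`, `mask_token_id`, the keyword mask, and the loss weights are hypothetical placeholders, not the authors' API):

```python
import torch.nn.functional as F

def masker_style_loss(model, tokens, keyword_mask, labels,
                      lambda_rec=0.1, lambda_ent=0.1):
    """Classification loss plus (i) reconstructing masked-out keywords
    from context and (ii) pushing keyword-only predictions toward the
    uniform distribution (i.e., low confidence without context)."""
    loss_cls = F.cross_entropy(model.classify(tokens), labels)

    # (i) Mask the keywords and reconstruct them from the context.
    ctx_only = tokens.masked_fill(keyword_mask, model.mask_token_id)
    rec_logits = model.reconstruct(ctx_only)        # (batch, seq, vocab)
    loss_rec = F.cross_entropy(rec_logits[keyword_mask],
                               tokens[keyword_mask])

    # (ii) Keep only the keywords and penalize confident predictions
    # (cross-entropy against the uniform distribution, up to a constant).
    kw_only = tokens.masked_fill(~keyword_mask, model.mask_token_id)
    loss_ent = -F.log_softmax(model.classify(kw_only), dim=-1).mean()

    return loss_cls + lambda_rec * loss_rec + lambda_ent * loss_ent
```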