Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks
- URL: http://arxiv.org/abs/2307.16630v2
- Date: Tue, 11 Jun 2024 15:40:43 GMT
- Title: Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks
- Authors: Xinyu Zhang, Hanbin Hong, Yuan Hong, Peng Huang, Binghui Wang, Zhongjie Ba, Kui Ren,
- Abstract summary: We propose Text-CRS, a certified robustness framework for natural language processing (NLP) based on randomized smoothing.
We show that Text-CRS can address all four different word-level adversarial operations and achieve a significant accuracy improvement.
We also provide the first benchmark on certified accuracy and radius of four word-level operations, besides outperforming the state-of-the-art certification against synonym substitution attacks.
- Score: 39.51297217854375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The language models, especially the basic text classification models, have been shown to be susceptible to textual adversarial attacks such as synonym substitution and word insertion attacks. To defend against such attacks, a growing body of research has been devoted to improving the model robustness. However, providing provable robustness guarantees instead of empirical robustness is still widely unexplored. In this paper, we propose Text-CRS, a generalized certified robustness framework for natural language processing (NLP) based on randomized smoothing. To our best knowledge, existing certified schemes for NLP can only certify the robustness against $\ell_0$ perturbations in synonym substitution attacks. Representing each word-level adversarial operation (i.e., synonym substitution, word reordering, insertion, and deletion) as a combination of permutation and embedding transformation, we propose novel smoothing theorems to derive robustness bounds in both permutation and embedding space against such adversarial operations. To further improve certified accuracy and radius, we consider the numerical relationships between discrete words and select proper noise distributions for the randomized smoothing. Finally, we conduct substantial experiments on multiple language models and datasets. Text-CRS can address all four different word-level adversarial operations and achieve a significant accuracy improvement. We also provide the first benchmark on certified accuracy and radius of four word-level operations, besides outperforming the state-of-the-art certification against synonym substitution attacks.
Related papers
- CR-UTP: Certified Robustness against Universal Text Perturbations on Large Language Models [12.386141652094999]
Existing certified robustness based on random smoothing has shown considerable promise in certifying the input-specific text perturbations.
A naive method is to simply increase the masking ratio and the likelihood of masking attack tokens, but it leads to a significant reduction in both certified accuracy and the certified radius.
We introduce a novel approach, designed to identify a superior prompt that maintains higher certified accuracy under extensive masking.
arXiv Detail & Related papers (2024-06-04T01:02:22Z) - Towards preserving word order importance through Forced Invalidation [80.33036864442182]
We show that pre-trained language models are insensitive to word order.
We propose Forced Invalidation to help preserve the importance of word order.
Our experiments demonstrate that Forced Invalidation significantly improves the sensitivity of the models to word order.
arXiv Detail & Related papers (2023-04-11T13:42:10Z) - In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
arXiv Detail & Related papers (2022-12-20T14:06:50Z) - Towards Robustness Against Natural Language Word Substitutions [87.56898475512703]
Robustness against word substitutions has a well-defined and widely acceptable form, using semantically similar words as substitutions.
Previous defense methods capture word substitutions in vector space by using either $l$-ball or hyper-rectangle.
arXiv Detail & Related papers (2021-07-28T17:55:08Z) - Certified Robustness to Text Adversarial Attacks by Randomized [MASK] [39.07743913719665]
We propose a certifiably robust defense method by randomly masking a certain proportion of the words in an input text.
The proposed method can defend against not only word substitution-based attacks, but also character-level perturbations.
We can certify the classifications of over 50% texts to be robust to any perturbation of 5 words on AGNEWS, and 2 words on SST2 dataset.
arXiv Detail & Related papers (2021-05-08T16:59:10Z) - MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct the keywords from the rest of the words and make low-confidence predictions without enough context.
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
arXiv Detail & Related papers (2020-12-17T04:54:16Z) - SAFER: A Structure-free Approach for Certified Robustness to Adversarial
Word Substitutions [36.91111335989236]
State-of-the-art NLP models can often be fooled by human transformations such as synonymous word substitution.
We propose a new randomized smoothing technique, which constructs a ensemble by applying random word substitutions on the input sentences.
Our method significantly outperforms recent state-of-the-art methods for certified robustness on both IMDB and Amazon text classification tasks.
arXiv Detail & Related papers (2020-05-29T07:15:19Z) - Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.