Assessing Robustness of Text Classification through Maximal Safe Radius Computation
- URL: http://arxiv.org/abs/2010.02004v2
- Date: Wed, 7 Oct 2020 08:50:10 GMT
- Title: Assessing Robustness of Text Classification through Maximal Safe Radius Computation
- Authors: Emanuele La Malfa, Min Wu, Luca Laurenti, Benjie Wang, Anthony Hartshorn, Marta Kwiatkowska
- Abstract summary: We aim to provide guarantees that the model prediction does not change if a word is replaced with a plausible alternative, such as a synonym.
As a measure of robustness, we adopt the notion of the maximal safe radius for a given input text, which is the minimum distance in the embedding space to the decision boundary.
For the upper bound computation, we employ Monte Carlo Tree Search in conjunction with syntactic filtering to analyse the effect of single and multiple word substitutions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural network NLP models are vulnerable to small modifications of the input
that maintain the original meaning but result in a different prediction. In
this paper, we focus on robustness of text classification against word
substitutions, aiming to provide guarantees that the model prediction does not
change if a word is replaced with a plausible alternative, such as a synonym.
As a measure of robustness, we adopt the notion of the maximal safe radius for
a given input text, which is the minimum distance in the embedding space to the
decision boundary. Since computing the exact maximal safe radius is not
feasible in practice, we instead approximate it by computing a lower and upper
bound. For the upper bound computation, we employ Monte Carlo Tree Search in
conjunction with syntactic filtering to analyse the effect of single and
multiple word substitutions. The lower bound computation is achieved through an
adaptation of the linear bounding techniques implemented in the tools CNN-Cert
and POPQORN, for convolutional and recurrent network models respectively. We
evaluate the methods on sentiment analysis and news classification models for
four datasets (IMDB, SST, AG News and NEWS) and a range of embeddings, and
provide an analysis of robustness trends. We also apply our framework to
interpretability analysis and compare it with LIME.
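The upper-bound idea above can be illustrated with a deliberately simplified sketch: instead of Monte Carlo Tree Search, a brute-force search over single-word substitutions records the smallest embedding-space move that flips a toy classifier's prediction. The embeddings, candidate lists, and classifier below are hypothetical stand-ins, not the paper's actual models.

```python
import math

# Hypothetical 2-d word embeddings (a real system would use GloVe/BERT vectors).
EMB = {
    "good": (0.9, 0.1), "great": (0.95, 0.15), "fine": (0.7, 0.2),
    "okay": (0.5, 0.2), "poor": (-0.6, 0.2), "movie": (0.0, 0.5),
}
# Hypothetical syntactically filtered substitution candidates per word.
CANDIDATES = {"good": ["great", "fine"], "fine": ["okay", "poor"]}

def sentence_vec(words):
    """Mean of the word embeddings."""
    vecs = [EMB[w] for w in words]
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

def classify(words):
    """Toy sentiment classifier: sign of the first embedding dimension."""
    return 1 if sentence_vec(words)[0] >= 0.0 else 0

def msr_upper_bound(words):
    """Upper bound on the maximal safe radius: the smallest distance moved
    in embedding space by any single-word substitution that changes the
    predicted label (math.inf if no tried substitution flips it)."""
    original = classify(words)
    best = math.inf
    for i, w in enumerate(words):
        for sub in CANDIDATES.get(w, []):
            if classify(words[:i] + [sub] + words[i + 1:]) != original:
                best = min(best, math.dist(EMB[w], EMB[sub]))
    return best
```

Any label-flipping substitution yields a valid upper bound, so additional search only tightens it; the paper's MCTS explores single and multiple substitutions far more efficiently than this exhaustive loop.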
Related papers
- Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks [39.51297217854375]
We propose Text-CRS, a certified robustness framework for natural language processing (NLP) based on randomized smoothing.
We show that Text-CRS can address all four different word-level adversarial operations and achieve a significant accuracy improvement.
We also provide the first benchmark on the certified accuracy and radius of four word-level operations, in addition to outperforming state-of-the-art certification against synonym substitution attacks.
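As an illustration of the randomized-smoothing idea underlying this line of work (not Text-CRS itself), the following sketch classifies many randomly perturbed copies of a sentence and takes a majority vote; the synonym table and base classifier are hypothetical.

```python
import random
from collections import Counter

# Hypothetical synonym table used as the smoothing perturbation space.
SYNONYMS = {"good": ["great", "fine"], "film": ["movie", "picture"]}

def base_classifier(words):
    """Toy base model: positive iff any 'good'-like word appears."""
    return 1 if any(w in {"good", "great", "fine"} for w in words) else 0

def smoothed_classify(words, n_samples=200, p_sub=0.3, seed=0):
    """Smoothed prediction: classify n_samples randomly perturbed copies
    (each word independently replaced by a random synonym with prob p_sub)
    and return the majority-vote label."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        perturbed = [
            rng.choice(SYNONYMS[w]) if w in SYNONYMS and rng.random() < p_sub else w
            for w in words
        ]
        votes[base_classifier(perturbed)] += 1
    return votes.most_common(1)[0][0]
```

In an actual certification framework the vote margin is converted into a certified radius; only the voting step is shown here.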
arXiv Detail & Related papers (2023-07-31T13:08:16Z)
- Semantic Similarity Computing Model Based on Multi Model Fine-Grained Nonlinear Fusion [30.71123144365683]
This paper proposes a novel model based on multi-model nonlinear fusion to grasp the meaning of a text from a global perspective.
The model uses the Jaccard coefficient based on part of speech, Term Frequency-Inverse Document Frequency (TF-IDF) and the word2vec-CNN algorithm to measure the similarity of sentences.
Experimental results show that the accuracy of the sentence-similarity matching method based on multi-model nonlinear fusion is 84%, and the F1 value of the model is 75%.
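Two of the component similarity signals named above, the Jaccard coefficient and TF-IDF cosine similarity, can be sketched in a few lines; the word2vec-CNN component needs trained vectors and is omitted, and the smoothed IDF formula below is one common convention, not necessarily the paper's.

```python
import math
from collections import Counter

def jaccard(a, b):
    """Jaccard coefficient between two token lists, treated as sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def tfidf_cosine(a, b, corpus):
    """Cosine similarity of TF-IDF vectors built from a small corpus,
    using a smoothed IDF: log((1 + n) / (1 + df)) + 1."""
    n = len(corpus)
    df = Counter(w for doc in corpus for w in set(doc))

    def vec(doc):
        tf = Counter(doc)
        return {w: (c / len(doc)) * (math.log((1 + n) / (1 + df[w])) + 1.0)
                for w, c in tf.items()}

    va, vb = vec(a), vec(b)
    dot = sum(x * vb.get(w, 0.0) for w, x in va.items())
    na = math.sqrt(sum(x * x for x in va.values()))
    nb = math.sqrt(sum(x * x for x in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

A fusion model would combine these scores (plus the learned component) nonlinearly rather than use either one alone.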
arXiv Detail & Related papers (2022-02-05T03:12:37Z)
- Quantifying Robustness to Adversarial Word Substitutions [24.164523751390053]
Deep-learning-based NLP models are found to be vulnerable to word substitution perturbations.
We propose a formal framework to evaluate word-level robustness.
This metric helps explain why state-of-the-art models like BERT can be easily fooled by a few word substitutions.
arXiv Detail & Related papers (2022-01-11T08:18:39Z)
- Semantic-Preserving Adversarial Text Attacks [85.32186121859321]
We propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.
Our method achieves the highest attack success and semantic preservation rates while changing the smallest number of words compared with existing methods.
arXiv Detail & Related papers (2021-08-23T09:05:18Z)
- Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks offer improved accuracy and a significant reduction in memory consumption, but they can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z)
- Adaptive Nearest Neighbor Machine Translation [60.97183408140499]
kNN-MT combines pre-trained neural machine translation with token-level k-nearest-neighbor retrieval.
The traditional kNN algorithm retrieves the same number of nearest neighbors for every target token.
We propose Adaptive kNN-MT, which dynamically determines the value of k for each target token.
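A toy sketch of the adaptive idea follows; the real method trains a light meta-network to pick k, whereas here a simple distance-ratio rule stands in, and the datastore of (context vector, target token) pairs is hypothetical.

```python
import math
from collections import Counter

# Hypothetical kNN-MT datastore: (context embedding, target token) pairs.
DATASTORE = [
    ((0.0, 0.0), "hello"), ((0.1, 0.0), "hello"),
    ((0.9, 1.0), "world"), ((1.0, 1.0), "world"), ((1.1, 0.9), "world"),
]

def adaptive_knn_predict(query, k_max=4, ratio=2.0):
    """Retrieve up to k_max nearest neighbors, then adaptively drop any
    neighbor much farther than the closest one (a heuristic stand-in for
    the learned meta-network), and majority-vote on the target token.
    Returns (predicted token, number of neighbors actually used)."""
    ranked = sorted(DATASTORE, key=lambda e: math.dist(query, e[0]))[:k_max]
    d0 = math.dist(query, ranked[0][0])
    kept = [tok for vec, tok in ranked
            if math.dist(query, vec) <= ratio * d0 + 1e-9]
    return Counter(kept).most_common(1)[0][0], len(kept)
```

Note how the effective k shrinks when one neighbor is much closer than the rest, which is the behavior Adaptive kNN-MT learns per token.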
arXiv Detail & Related papers (2021-05-27T09:27:42Z)
- MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct the keywords from the rest of the words and make low-confidence predictions without enough context.
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
arXiv Detail & Related papers (2020-12-17T04:54:16Z)
- A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings [50.524054820564395]
We propose a new unsupervised model for mapping a variable-duration speech segment to a fixed-dimensional representation.
The resulting acoustic word embeddings can form the basis of search, discovery, and indexing systems for low- and zero-resource languages.
arXiv Detail & Related papers (2020-12-03T19:24:42Z)
- Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble [163.3333439344695]
Dirichlet Neighborhood Ensemble (DNE) is a randomized smoothing method for training a robust model to defend against substitution-based attacks.
DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from a convex hull spanned by the word and its synonyms, and it augments them with the training data.
We demonstrate through extensive experimentation that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
arXiv Detail & Related papers (2020-06-20T18:01:16Z)
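The convex-hull sampling step described for DNE can be sketched as follows. The Dirichlet sampler via normalized Gamma draws is standard; the embedding vectors in the test are hypothetical stand-ins for a word and its synonyms.

```python
import random

def sample_dirichlet(k, alpha=1.0, rng=None):
    """Sample a point from the k-dimensional Dirichlet(alpha) distribution
    by normalizing independent Gamma(alpha, 1) draws."""
    rng = rng or random.Random()
    gs = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(gs)
    return [g / total for g in gs]

def dne_virtual_embedding(word_vecs, alpha=1.0, rng=None):
    """DNE-style virtual word embedding: a random convex combination of a
    word's embedding and those of its synonyms, i.e. a point sampled from
    the convex hull they span."""
    weights = sample_dirichlet(len(word_vecs), alpha, rng)
    dim = len(word_vecs[0])
    return [sum(weights[i] * word_vecs[i][d] for i in range(len(word_vecs)))
            for d in range(dim)]
```

Training on such virtual sentences exposes the model to the whole synonym neighborhood of each word rather than to discrete substitutions only.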
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.