Using BERT Encoding to Tackle the Mad-lib Attack in SMS Spam Detection
- URL: http://arxiv.org/abs/2107.06400v1
- Date: Tue, 13 Jul 2021 21:17:57 GMT
- Title: Using BERT Encoding to Tackle the Mad-lib Attack in SMS Spam Detection
- Authors: Sergio Rojas-Galeano
- Abstract summary: We investigate whether language models sensitive to the semantics and context of words, such as Google's BERT, may be useful to overcome the Mad-lib adversarial attack.
Using a dataset of 5572 SMS spam messages, we first established a baseline of detection performance.
Then, we built a thesaurus of the vocabulary contained in these messages, and set up a Mad-lib attack experiment.
We found that the classic models achieved a 94% Balanced Accuracy (BA) in the original dataset, whereas the BERT model obtained 96%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: One of the stratagems used to deceive spam filters is to substitute vocables
with synonyms or similar words that render the message unrecognisable to the
detection algorithms. In this paper we investigate whether the recent
development of language models sensitive to the semantics and context of words,
such as Google's BERT, may be useful to overcome this adversarial attack
(called "Mad-lib" as per the word substitution game). Using a dataset of 5572
SMS spam messages, we first established a baseline of detection performance
using widely known document representation models (BoW and TFIDF) and the novel
BERT model, coupled with a variety of classification algorithms (Decision Tree,
kNN, SVM, Logistic Regression, Naive Bayes, Multilayer Perceptron). Then, we
built a thesaurus of the vocabulary contained in these messages, and set up a
Mad-lib attack experiment in which we modified each message of a held-out
subset of data (not used in the baseline experiment) with different rates of
substitution of original words with synonyms from the thesaurus. Lastly, we
evaluated the detection performance of the three representation models (BoW,
TFIDF and BERT) coupled with the best classifier from the baseline experiment
(SVM). We found that the classic models achieved a 94% Balanced Accuracy (BA)
in the original dataset, whereas the BERT model obtained 96%. On the other
hand, the Mad-lib attack experiment showed that BERT encodings manage to
maintain a similar BA performance of 96% with an average substitution rate of
1.82 words per message, and 95% with 3.34 words substituted per message. In
contrast, the BA performance of the BoW and TFIDF encoders dropped to chance.
These results hint at the potential advantage of BERT models in combating these
types of ingenious attacks, compensating to some extent for the misuse of
semantic relationships in language.
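
A minimal sketch of the baseline experiment, assuming scikit-learn and sentence-transformers: it trains the paper's best classifier (SVM) on each of the three encodings and scores it with Balanced Accuracy, the mean of sensitivity and specificity. This is a reconstruction, not the authors' code: `load_sms_dataset` is a hypothetical loader for the SMS corpus, the split and hyperparameters are assumptions, and the sentence-transformers checkpoint merely stands in for the paper's BERT encoder.

```python
# Minimal sketch of the baseline experiment (assumptions flagged below;
# this is not the authors' code). Needs scikit-learn and sentence-transformers.
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

texts, labels = load_sms_dataset()  # hypothetical loader for the 5572-message corpus

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=0)

# Sparse encodings: Bag-of-Words and TF-IDF.
for name, vec in [("BoW", CountVectorizer()), ("TFIDF", TfidfVectorizer())]:
    clf = SVC().fit(vec.fit_transform(X_train), y_train)
    preds = clf.predict(vec.transform(X_test))
    # Balanced Accuracy = (sensitivity + specificity) / 2
    print(name, balanced_accuracy_score(y_test, preds))

# Dense contextual encoding; this checkpoint is a stand-in for the paper's BERT.
bert = SentenceTransformer("all-MiniLM-L6-v2")
clf = SVC().fit(bert.encode(X_train), y_train)
print("BERT", balanced_accuracy_score(y_test, clf.predict(bert.encode(X_test))))
```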
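The Mad-lib attack itself can be sketched in the same vein. The paper builds its own thesaurus from the messages' vocabulary; below, NLTK's WordNet stands in for that thesaurus, and the random choice of up to `n_subs` substitutable words per message is an illustrative strategy, not the authors' exact procedure.

```python
# Sketch of a Mad-lib substitution attack. WordNet stands in for the paper's
# message-vocabulary thesaurus; the substitution strategy is an assumption.
import random

import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)

def synonyms(word):
    """Single-word WordNet synonyms distinct from the original word."""
    cands = {lem.name() for syn in wordnet.synsets(word) for lem in syn.lemmas()}
    return [w for w in cands if w.lower() != word.lower() and "_" not in w]

def madlib_attack(message, n_subs=2, seed=0):
    """Replace up to n_subs words (cf. the paper's 1.82 / 3.34 averages)."""
    rng = random.Random(seed)
    tokens = message.split()
    substituted = 0
    for i in rng.sample(range(len(tokens)), len(tokens)):  # random word order
        if substituted == n_subs:
            break
        alts = synonyms(tokens[i])
        if alts:
            tokens[i] = rng.choice(alts)
            substituted += 1
    return " ".join(tokens)

print(madlib_attack("Congratulations you have won a free prize", n_subs=3))
```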
Related papers
- Paraphrasing evades detectors of AI-generated text, but retrieval is an
effective defense [56.077252790310176]
We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering.
Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking.
We introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider.
arXiv Detail & Related papers (2023-03-23T16:29:27Z)
- Verifying the Robustness of Automatic Credibility Assessment [79.08422736721764]
Text classification methods have been widely investigated as a way to detect content of low credibility.
In some cases insignificant changes in input text can mislead the models.
We introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
- ArabGlossBERT: Fine-Tuning BERT on Context-Gloss Pairs for WSD [0.0]
This paper presents our work on fine-tuning BERT models for Arabic Word Sense Disambiguation (WSD).
We constructed a dataset of labeled Arabic context-gloss pairs.
Each pair was labeled as True or False and target words in each context were identified and annotated.
arXiv Detail & Related papers (2022-05-19T16:47:18Z)
- Offensive Language Detection with BERT-based models, By Customizing Attention Probabilities [0.0]
We suggest a methodology to enhance the performance of the BERT-based models on the 'Offensive Language Detection' task.
We customize attention probabilities by changing the 'Attention Mask' input to create more efficacious word embeddings.
The largest improvements were 2% and 10% for English and Persian, respectively.
arXiv Detail & Related papers (2021-10-11T10:23:44Z)
- BERT is Robust! A Case Against Synonym-Based Adversarial Examples in Text Classification [8.072745157605777]
We investigate four word substitution-based attacks on BERT.
We show that their success is mainly based on feeding poor data to the model.
An additional post-processing step reduces the success rates of state-of-the-art attacks below 5%.
arXiv Detail & Related papers (2021-09-15T16:15:16Z)
- Semantic-Preserving Adversarial Text Attacks [85.32186121859321]
We propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.
Our method achieves the highest attack success and semantics-preservation rates while changing the smallest number of words compared with existing methods.
arXiv Detail & Related papers (2021-08-23T09:05:18Z)
- MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct the keywords from the rest of the words and to make low-confidence predictions without enough context.
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
arXiv Detail & Related papers (2020-12-17T04:54:16Z)
- FireBERT: Hardening BERT-based classifiers against adversarial attack [0.5156484100374058]
FireBERT is a set of three proof-of-concept NLP classifiers hardened against TextFooler-style word-perturbation attacks.
We present co-tuning with a synthetic data generator as a highly effective method to protect against 95% of pre-manufactured adversarial samples.
We show that it is possible to improve the accuracy of BERT-based models in the face of adversarial attacks without significantly reducing the accuracy for regular benchmark samples.
arXiv Detail & Related papers (2020-05-17T19:22:25Z)
- Wake Word Detection with Alignment-Free Lattice-Free MMI [66.12175350462263]
Always-on spoken language interfaces, e.g. personal digital assistants, rely on a wake word to start processing spoken input.
We present novel methods to train a hybrid DNN/HMM wake word detection system from partially labeled training data.
We evaluate our methods on two real data sets, showing a 50%-90% reduction in false rejection rates at pre-specified false alarm rates over the best previously published figures.
arXiv Detail & Related papers (2020-04-21T13:30:02Z)
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks on discrete data (such as text) are more challenging than those on continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z)