Spam Detection Using BERT
- URL: http://arxiv.org/abs/2206.02443v2
- Date: Tue, 7 Jun 2022 21:11:29 GMT
- Title: Spam Detection Using BERT
- Authors: Thaer Sahmoud, Dr. Mohammad Mikki
- Abstract summary: We build a spam detector using the pre-trained BERT model that classifies emails and messages by understanding their context.
Our spam detector's performance on the respective corpora was 98.62%, 97.83%, 99.13% and 99.28%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Email and SMS are the most popular tools in today's communications, and
as the number of email and SMS users grows, the volume of spam grows with it.
Spam is any kind of unwanted, unsolicited digital communication that is sent
out in bulk; spam emails and SMS messages waste significant resources by
needlessly flooding network links. Although most spam originates with
advertisers looking to push their products, some is far more malicious in
intent, such as phishing emails, which aim to trick victims into giving up
sensitive information like website logins or credit card details. To counter
spam, much research effort has gone into building spam detectors that can
classify messages and emails as spam or ham. In this research we build a spam
detector using the pre-trained BERT model, which classifies emails and messages
by understanding their context. We trained our spam detector on multiple
corpora, such as the SMS collection corpus, Enron corpus, SpamAssassin corpus,
Ling-Spam corpus and SMS spam collection corpus; our spam detector's
performance was 98.62%, 97.83%, 99.13% and 99.28% respectively. Keywords: Spam
Detector, BERT, Machine learning, NLP, Transformer, Enron Corpus, SpamAssassin
Corpus, SMS Spam Detection Corpus, Ling-Spam Corpus.
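The abstract describes fine-tuning a pre-trained BERT model to label messages as spam or ham. A minimal sketch of that general idea follows, using the Hugging Face `transformers` and PyTorch libraries. The checkpoint name `bert-base-uncased`, the learning rate, the maximum sequence length, the single-batch training loop and the helper names (`encode_labels`, `accuracy`, `fine_tune`, `predict`) are all illustrative assumptions. None of them are taken from the paper, and this is not the authors' implementation.

```python
from typing import List

# Assumed label mapping: the abstract only names the two classes, spam and ham.
LABELS = {"ham": 0, "spam": 1}


def encode_labels(names: List[str]) -> List[int]:
    """Map 'ham'/'spam' strings (case-insensitive) to integer class ids."""
    return [LABELS[name.lower()] for name in names]


def accuracy(predicted: List[int], gold: List[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal in length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def fine_tune(texts: List[str], labels: List[int], epochs: int = 1,
              model_name: str = "bert-base-uncased"):
    """Fine-tune a pre-trained BERT checkpoint on one small batch.

    Third-party imports are local so the helpers above work without them.
    Real training would iterate over mini-batches of a full corpus.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # assumed rate

    # Tokenize all texts into one padded batch of tensors.
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")
    targets = torch.tensor(labels)

    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        # Passing labels makes the model return a cross-entropy loss.
        loss = model(**batch, labels=targets).loss
        loss.backward()
        optimizer.step()
    return tokenizer, model


def predict(tokenizer, model, texts: List[str]) -> List[int]:
    """Return the predicted class id (0 = ham, 1 = spam) for each text."""
    import torch

    model.eval()
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return logits.argmax(dim=-1).tolist()
```

A typical call would be `tokenizer, model = fine_tune(texts, encode_labels(names))`, followed by `accuracy(predict(tokenizer, model, test_texts), test_labels)`. The first call downloads the checkpoint, so it needs network access and the two libraries installed.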
Related papers
- Investigating the Effectiveness of Bayesian Spam Filters in Detecting LLM-modified Spam Mails [1.6298172960110866]
Spam and phishing remain critical threats in cybersecurity, responsible for nearly 90% of security incidents.
As these attacks grow in sophistication, the need for robust defensive mechanisms intensifies.
The emergence of large language models (LLMs) such as ChatGPT presents new challenges.
This work aims to evaluate the robustness and effectiveness of SpamAssassin against LLM-modified email content.
arXiv Detail & Related papers (2024-08-26T14:25:30Z)
- ExplainableDetector: Exploring Transformer-based Language Modeling Approach for SMS Spam Detection with Explainability Analysis [2.849988619791745]
The volume of SMS spam has grown significantly in recent years.
The unstructured format of SMS data creates significant challenges for SMS spam detection.
We employ optimized and fine-tuned transformer-based Large Language Models (LLMs) to solve the problem of spam message detection.
arXiv Detail & Related papers (2024-05-12T11:42:05Z)
- SpamDam: Towards Privacy-Preserving and Adversary-Resistant SMS Spam Detection [2.0355793807035094]
SpamDam is an SMS spam detection framework designed to overcome key challenges in detecting and understanding SMS spam.
We have compiled over 76K SMS spam messages from Twitter and Weibo between 2018 and 2023, forming the largest dataset of its kind.
We have rigorously tested the adversarial robustness of SMS spam detection models, introducing the novel reverse backdoor attack.
arXiv Detail & Related papers (2024-04-15T06:07:10Z)
- Prompted Contextual Vectors for Spear-Phishing Detection [45.07804966535239]
Spear-phishing attacks present a significant security challenge.
We propose a detection approach based on a novel document vectorization method.
Our method achieves a 91% F1 score in identifying LLM-generated spear-phishing emails.
arXiv Detail & Related papers (2024-02-13T09:12:55Z)
- Can AI-Generated Text be Reliably Detected? [54.670136179857344]
Unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc.
Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques.
In this paper, we show that these detectors are not reliable in practical scenarios.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)
- Building an Effective Email Spam Classification Model with spaCy [0.0]
The author used the spaCy natural language processing library and three machine learning (ML) algorithms, Naive Bayes (NB), Decision Tree C4.5 and Multilayer Perceptron (MLP), in Python to detect spam emails collected from the Gmail service.
arXiv Detail & Related papers (2023-03-15T17:41:11Z)
- Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text [219.15486286590016]
This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically.
As a result, the model achieved high precision, recall, f1-score and accuracy of 98.38%.
arXiv Detail & Related papers (2021-10-10T17:19:37Z)
- Robust and Verifiable Information Embedding Attacks to Deep Neural Networks via Error-Correcting Codes [81.85509264573948]
In the era of deep learning, a user often leverages a third-party machine learning tool to train a deep neural network (DNN) classifier.
In an information embedding attack, an attacker is the provider of a malicious third-party machine learning tool.
In this work, we aim to design information embedding attacks that are verifiable and robust against popular post-processing methods.
arXiv Detail & Related papers (2020-10-26T17:42:42Z)
- Robust Spammer Detection by Nash Reinforcement Learning [64.80986064630025]
We develop a minimax game where the spammers and spam detectors compete with each other on their practical goals.
We show that an optimization algorithm can reliably find an equilibrial detector that can robustly prevent spammers with any mixed spamming strategies from attaining their practical goal.
arXiv Detail & Related papers (2020-06-10T21:18:07Z)
- Phishing and Spear Phishing: examples in Cyber Espionage and techniques to protect against them [91.3755431537592]
Phishing attacks have become the most widely used technique in online scams, initiating more than 91% of cyberattacks from 2012 onwards.
This study reviews how Phishing and Spear Phishing attacks are carried out by the phishers, through 5 steps which magnify the outcome.
arXiv Detail & Related papers (2020-05-31T18:10:09Z)
- DeepQuarantine for Suspicious Mail [0.0]
DeepQuarantine (DQ) is a cloud technology to detect and quarantine potential spam messages.
Most of the quarantined mail is spam, which allows clients to use email without delay.
arXiv Detail & Related papers (2020-01-13T11:32:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.