Building an Effective Email Spam Classification Model with spaCy
- URL: http://arxiv.org/abs/2303.08792v1
- Date: Wed, 15 Mar 2023 17:41:11 GMT
- Title: Building an Effective Email Spam Classification Model with spaCy
- Authors: Kazem Taghandiki
- Abstract summary: The author uses the spaCy natural language processing library and three machine learning (ML) algorithms, Naive Bayes (NB), Decision Tree C4.5, and Multilayer Perceptron (MLP), in Python to detect spam emails collected from the Gmail service.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Today, people use email services such as Gmail, Outlook, and AOL Mail to
communicate quickly and to send information and official letters. Spam, or junk
mail, is a major challenge to this type of communication. It is usually sent in
bulk by botnets to advertise, cause harm, or steal information. Unwanted spam
received on a daily basis fills up the inbox. Spam detection is therefore a
fundamental challenge, and many works have addressed it using clustering and
text categorisation methods. In this article, the author uses the spaCy natural
language processing library and three machine learning (ML) algorithms, Naive
Bayes (NB), Decision Tree C4.5, and Multilayer Perceptron (MLP), in the Python
programming language to detect spam emails collected from the Gmail service.
The results show that the Multilayer Perceptron (MLP) algorithm reaches an
accuracy of 96% in spam detection.
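The abstract names the tools but not the feature pipeline, so the following is only a minimal sketch of one of the three listed algorithms, Naive Bayes, applied to spaCy-tokenized text. It assumes a bag-of-words representation with Laplace smoothing, uses `spacy.blank("en")` as a tokenizer-only pipeline (with a regex fallback if spaCy is not installed), and trains on a handful of invented example messages. None of these choices are taken from the paper.

```python
import math
import re
from collections import Counter, defaultdict

try:
    import spacy
    _nlp = spacy.blank("en")  # tokenizer-only pipeline; no model download needed
except ImportError:
    _nlp = None  # assumption: fall back to a regex tokenizer without spaCy


def tokenize(text):
    """Return lowercased alphabetic tokens, via spaCy when available."""
    if _nlp is not None:
        return [tok.text.lower() for tok in _nlp(text) if tok.is_alpha]
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, invented for illustration only.
train_texts = [
    "win a free prize now",
    "free money click now",
    "cheap pills free offer",
    "meeting agenda for monday",
    "please review the attached report",
    "lunch on friday with the team",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("claim your free prize"))
print(model.predict("report for the monday meeting"))
```

Words unseen during training are ignored at prediction time, which keeps the smoothing simple. A real evaluation would need a labeled corpus and a held-out test split before any accuracy figure could be compared with the 96% reported for MLP.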
Related papers
- Investigating the Effectiveness of Bayesian Spam Filters in Detecting LLM-modified Spam Mails [1.6298172960110866]
Spam and phishing remain critical threats in cybersecurity, responsible for nearly 90% of security incidents.
As these attacks grow in sophistication, the need for robust defensive mechanisms intensifies.
The emergence of large language models (LLMs) such as ChatGPT presents new challenges.
This work aims to evaluate the robustness and effectiveness of SpamAssassin against LLM-modified email content.
arXiv Detail & Related papers (2024-08-26T14:25:30Z)
- Prompted Contextual Vectors for Spear-Phishing Detection [45.07804966535239]
Spear-phishing attacks present a significant security challenge.
We propose a detection approach based on a novel document vectorization method.
Our method achieves a 91% F1 score in identifying LLM-generated spear-phishing emails.
arXiv Detail & Related papers (2024-02-13T09:12:55Z)
- SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks [99.23352758320945]
We propose SmoothLLM, the first algorithm designed to mitigate jailbreaking attacks on large language models (LLMs).
Based on our finding that adversarially-generated prompts are brittle to character-level changes, our defense first randomly perturbs multiple copies of a given input prompt, and then aggregates the corresponding predictions to detect adversarial inputs.
arXiv Detail & Related papers (2023-10-05T17:01:53Z)
- Spam Detection Using BERT [0.0]
We build a spam detector using the pre-trained BERT model that classifies emails and messages by understanding their context.
Our spam detector performance was 98.62%, 97.83%, 99.13% and 99.28% respectively.
arXiv Detail & Related papers (2022-06-06T09:09:40Z)
- Anomaly Detection in Emails using Machine Learning and Header Information [0.0]
Anomalies in emails such as phishing and spam present major security risks.
Previous studies on email anomaly detection relied on a single type of anomaly and the analysis of the email body and subject content.
This study conducted feature extraction and selection on email header datasets and leveraged both multi and one-class anomaly detection approaches.
arXiv Detail & Related papers (2022-03-19T23:31:23Z)
- Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text [219.15486286590016]
This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically.
As a result, the model achieved high precision, recall, f1-score and accuracy of 98.38%.
arXiv Detail & Related papers (2021-10-10T17:19:37Z)
- Robust and Verifiable Information Embedding Attacks to Deep Neural Networks via Error-Correcting Codes [81.85509264573948]
In the era of deep learning, a user often leverages a third-party machine learning tool to train a deep neural network (DNN) classifier.
In an information embedding attack, an attacker is the provider of a malicious third-party machine learning tool.
In this work, we aim to design information embedding attacks that are verifiable and robust against popular post-processing methods.
arXiv Detail & Related papers (2020-10-26T17:42:42Z)
- Robust Spammer Detection by Nash Reinforcement Learning [64.80986064630025]
We develop a minimax game where the spammers and spam detectors compete with each other on their practical goals.
We show that an optimization algorithm can reliably find an equilibrial detector that can robustly prevent spammers with any mixed spamming strategies from attaining their practical goal.
arXiv Detail & Related papers (2020-06-10T21:18:07Z)
- Learning with Weak Supervision for Email Intent Detection [56.71599262462638]
We propose to leverage user actions as a source of weak supervision to detect intents in emails.
We develop an end-to-end robust deep neural network model for email intent identification.
arXiv Detail & Related papers (2020-05-26T23:41:05Z)
- Classification of Spam Emails through Hierarchical Clustering and Supervised Learning [1.8065361710947976]
We propose to classify spam emails into categories to improve the handling of already detected spam.
For the task of multi-class spam classification, the use of (i) TF-IDF combined with SVM gives the best micro F1 score, 95.39%, and (ii) TF-IDF along with NB gives the fastest spam classification, analyzing an email in 2.13 ms.
arXiv Detail & Related papers (2020-05-18T14:41:22Z)
- DeepQuarantine for Suspicious Mail [0.0]
DeepQuarantine (DQ) is a cloud technology to detect and quarantine potential spam messages.
Most of the quarantined mail is spam, which allows clients to use email without delay.
arXiv Detail & Related papers (2020-01-13T11:32:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.