Effective Email Spam Detection System using Extreme Gradient Boosting
- URL: http://arxiv.org/abs/2012.14430v1
- Date: Sun, 27 Dec 2020 15:23:58 GMT
- Title: Effective Email Spam Detection System using Extreme Gradient Boosting
- Authors: Ismail B. Mustapha, Shafaatunnur Hasan, Sunday O. Olatunji, Siti
Mariyam Shamsuddin, Afolabi Kazeem
- Abstract summary: This research presents an improved spam detection model based on Extreme Gradient Boosting (XGBoost).
Experimental results show that the proposed model outperforms earlier approaches across a wide range of evaluation metrics.
- Score: 1.8899300124593645
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The popularity, cost-effectiveness and ease of information exchange that
electronic mails offer to electronic device users have been plagued by the
rising number of unsolicited or spam emails. Driven by the need to protect
email users from this growing menace, research in spam email
filtering/detection systems has been increasingly active in the last decade.
However, the adaptive nature of spam emails has often rendered most of these
systems ineffective. While several spam detection models have been reported in
the literature, their reported performance on out-of-sample test data shows
room for further improvement. Presented in this research is an improved spam
detection model based on Extreme Gradient Boosting (XGBoost), which to the best
of our knowledge has received little attention in spam email detection problems.
Experimental results show that the proposed model outperforms earlier
approaches across a wide range of evaluation metrics. A thorough analysis of
the model results in comparison to the results of earlier works is also
presented.
Related papers
- Resultant: Incremental Effectiveness on Likelihood for Unsupervised Out-of-Distribution Detection [63.93728560200819]
Unsupervised out-of-distribution (U-OOD) detection is to identify data samples with a detector trained solely on unlabeled in-distribution (ID) data.
Recent studies have developed various detectors based on DGMs to move beyond likelihood.
We apply two techniques for each direction, specifically post-hoc prior and dataset entropy-mutual calibration.
Experimental results demonstrate that the Resultant could be a new state-of-the-art U-OOD detector.
arXiv Detail & Related papers (2024-09-05T02:58:13Z)
- Different Victims, Same Layout: Email Visual Similarity Detection for Enhanced Email Protection [0.3683202928838613]
We propose an email visual similarity detection approach, named Pisco, to improve the detection capabilities of an email threat defense system.
Our results show that email kits are being reused extensively and visually similar emails are sent to our customers at various time intervals.
arXiv Detail & Related papers (2024-08-29T23:51:51Z)
- Investigating the Effectiveness of Bayesian Spam Filters in Detecting LLM-modified Spam Mails [1.6298172960110866]
Spam and phishing remain critical threats in cybersecurity, responsible for nearly 90% of security incidents.
As these attacks grow in sophistication, the need for robust defensive mechanisms intensifies.
The emergence of large language models (LLMs) such as ChatGPT presents new challenges.
This work aims to evaluate the robustness and effectiveness of SpamAssassin against LLM-modified email content.
arXiv Detail & Related papers (2024-08-26T14:25:30Z)
- Prompted Contextual Vectors for Spear-Phishing Detection [45.07804966535239]
Spear-phishing attacks present a significant security challenge.
We propose a detection approach based on a novel document vectorization method.
Our method achieves a 91% F1 score in identifying LLM-generated spear-phishing emails.
arXiv Detail & Related papers (2024-02-13T09:12:55Z)
- MGTBench: Benchmarking Machine-Generated Text Detection [54.81446366272403]
This paper proposes the first benchmark framework for MGT detection against powerful large language models (LLMs).
We show that a larger number of words generally leads to better performance, and that most detection methods can achieve similar performance with far fewer training samples.
Our findings indicate that the model-based detection methods still perform well in the text attribution task.
arXiv Detail & Related papers (2023-03-26T21:12:36Z)
- Profiler: Profile-Based Model to Detect Phishing Emails [15.109679047753355]
We propose a multidimensional risk assessment of emails to reduce the feasibility of an attacker adapting their email and avoiding detection.
We develop a risk assessment framework that includes three models which analyse an email's (1) threat level, (2) cognitive manipulation, and (3) email type.
Our Profiler can be used in conjunction with ML approaches, to reduce their misclassifications or as a labeller for large email data sets in the training stage.
arXiv Detail & Related papers (2022-08-18T10:01:55Z)
- Anomaly Detection in Emails using Machine Learning and Header Information [0.0]
Anomalies in emails such as phishing and spam present major security risks.
Previous studies on email anomaly detection relied on a single type of anomaly and the analysis of the email body and subject content.
This study conducted feature extraction and selection on email header datasets and leveraged both multi-class and one-class anomaly detection approaches.
arXiv Detail & Related papers (2022-03-19T23:31:23Z)
- Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text [219.15486286590016]
This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically.
As a result, the model achieved high precision, recall, f1-score and accuracy of 98.38%.
arXiv Detail & Related papers (2021-10-10T17:19:37Z)
- Robust Spammer Detection by Nash Reinforcement Learning [64.80986064630025]
We develop a minimax game where the spammers and spam detectors compete with each other on their practical goals.
We show that an optimization algorithm can reliably find an equilibrial detector that can robustly prevent spammers with any mixed spamming strategies from attaining their practical goal.
arXiv Detail & Related papers (2020-06-10T21:18:07Z)
- Learning with Weak Supervision for Email Intent Detection [56.71599262462638]
We propose to leverage user actions as a source of weak supervision to detect intents in emails.
We develop an end-to-end robust deep neural network model for email intent identification.
arXiv Detail & Related papers (2020-05-26T23:41:05Z)
- DeepQuarantine for Suspicious Mail [0.0]
DeepQuarantine (DQ) is a cloud technology to detect and quarantine potential spam messages.
Most of the quarantined mail is spam, which allows clients to use email without delay.
arXiv Detail & Related papers (2020-01-13T11:32:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.