Anomaly Detection in Emails using Machine Learning and Header Information
- URL: http://arxiv.org/abs/2203.10408v1
- Date: Sat, 19 Mar 2022 23:31:23 GMT
- Title: Anomaly Detection in Emails using Machine Learning and Header Information
- Authors: Craig Beaman and Haruna Isah
- Abstract summary: Anomalies in emails such as phishing and spam present major security risks.
Previous studies on email anomaly detection relied on a single type of anomaly and the analysis of the email body and subject content.
This study conducted feature extraction and selection on email header datasets and leveraged both multi-class and one-class anomaly detection approaches.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Anomalies in emails such as phishing and spam present major security risks
such as the loss of privacy, money, and brand reputation to both individuals
and organizations. Previous studies on email anomaly detection relied on a
single type of anomaly and the analysis of the email body and subject content.
A drawback of this approach is that it depends on the written language
of the email content. To overcome this limitation, this study conducted feature
extraction and selection on email header datasets and leveraged both multi-class and
one-class anomaly detection approaches. Experimental analysis results obtained
demonstrate that email header information alone is enough to reliably detect
spam and phishing emails. Supervised learning algorithms such as Random Forest,
SVM, MLP, KNN, and their stacked ensembles were found to be very successful,
achieving high accuracy scores of 97% for phishing and 99% for spam emails.
One-class classification with One-Class SVM achieved accuracy scores of 87% and
89% with spam and phishing emails, respectively. Real-world email filtering
applications will benefit from the use of only the header information in terms
of resource utilization and efficiency.
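As a rough illustration of what header-only feature extraction can look like, the sketch below parses a raw message with Python's standard `email` package and derives a few numeric features from the headers alone. The function name `extract_header_features`, the feature names, and the sample message are all hypothetical choices made for this example; they are not the feature set used in the paper. The resulting dictionaries could then be fed to classifiers of the kind the abstract names (for example Random Forest or One-Class SVM), which are not shown here.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def _domain(address: str) -> str:
    """Return the lower-cased domain part of an address, or '' if absent."""
    return address.rsplit("@", 1)[-1].lower() if "@" in address else ""


def extract_header_features(raw_email: str) -> dict:
    """Build a flat numeric feature dict from the headers only (body is ignored).

    The features below are illustrative assumptions, not the paper's features.
    """
    msg = message_from_string(raw_email)
    _, from_addr = parseaddr(msg.get("From", ""))
    _, reply_addr = parseaddr(msg.get("Reply-To", ""))
    _, return_addr = parseaddr(msg.get("Return-Path", ""))
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    from_domain = _domain(from_addr)

    return {
        # Number of relay hops recorded in the message.
        "num_received": len(msg.get_all("Received", [])),
        "has_message_id": int(msg.get("Message-ID") is not None),
        "has_reply_to": int(bool(reply_addr)),
        # Replies routed to a different domain than the visible sender.
        "reply_to_differs": int(bool(reply_addr) and _domain(reply_addr) != from_domain),
        # Bounce address on a different domain than the visible sender.
        "return_path_differs": int(bool(return_addr) and _domain(return_addr) != from_domain),
        "num_recipients": len(recipients),
    }


# Hypothetical sample message, written only for this example.
SAMPLE = (
    "Received: from mx1.example.net by mail.example.org; Sat, 19 Mar 2022 23:00:00 +0000\n"
    "Received: from unknown by mx1.example.net; Sat, 19 Mar 2022 22:59:00 +0000\n"
    'From: "Support" <support@bank.example.com>\n'
    "Reply-To: helpdesk@other.example.net\n"
    "Return-Path: <bounce@mailer.example.net>\n"
    "To: alice@example.org, bob@example.org\n"
    "Subject: Verify your account\n"
    "\n"
    "Body text is ignored.\n"
)

features = extract_header_features(SAMPLE)
print(features)
```

Because every feature comes from header fields, the extraction step does not depend on the language the message body is written in, which is the property the abstract highlights.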
Related papers
- Constructing and Benchmarking: a Labeled Email Dataset for Text-Based Phishing and Spam Detection Framework [0.37687375904925485]
This study presents a comprehensive email dataset containing phishing, spam, and legitimate messages.
Each email is annotated with its category, emotional appeal, authority, and underlying motivation.
Results highlight strong phishing detection capabilities but reveal persistent challenges in distinguishing spam from legitimate emails.
arXiv Detail & Related papers (2025-11-26T14:40:06Z)
- CLUE: Non-parametric Verification from Experience via Hidden-State Clustering [64.50919789875233]
We show that correctness of a solution is encoded as a geometrically separable signature within the trajectory of hidden activations.
CLUE consistently outperforms LLM-as-a-judge baselines and matches or exceeds modern confidence-based methods in reranking candidates.
arXiv Detail & Related papers (2025-10-02T02:14:33Z)
- Characterizing Phishing Pages by JavaScript Capabilities [77.64740286751834]
This paper aims to aid researchers and analysts by automatically differentiating groups of phishing pages based on the underlying kit.
For kit detection, our system has an accuracy of 97% on a ground-truth dataset of 548 kit families deployed across 4,562 phishing URLs.
We find that UI interactivity and basic fingerprinting are universal techniques, present in 90% and 80% of the clusters.
arXiv Detail & Related papers (2025-09-16T15:39:23Z)
- E-PhishGen: Unlocking Novel Research in Phishing Email Detection [17.071710380823003]
This "open problem" paper carries out a critical assessment of scientific works in the context of phishing email detection.
We show that phishing email detection is still an open problem -- and provide the means to tackle such a problem by future research.
arXiv Detail & Related papers (2025-09-01T21:41:34Z)
- Exploring Content Concealment in Email [0.48748194765816943]
Modern email filters, one of our few defence mechanisms against malicious emails, are often circumvented by sophisticated attackers.
This study focuses on how attackers exploit HTML and CSS in emails to conceal arbitrary content.
This concealed content remains undetected by the recipient, presenting a serious security risk.
arXiv Detail & Related papers (2024-10-15T01:12:47Z)
- ChatSpamDetector: Leveraging Large Language Models for Effective Phishing Email Detection [2.3999111269325266]
This study introduces ChatSpamDetector, a system that uses large language models (LLMs) to detect phishing emails.
By converting email data into a prompt suitable for LLM analysis, the system provides a highly accurate determination of whether an email is phishing or not.
We conducted an evaluation using a comprehensive phishing email dataset and compared our system to several LLMs and baseline systems.
arXiv Detail & Related papers (2024-02-28T06:28:15Z)
- Prompted Contextual Vectors for Spear-Phishing Detection [45.07804966535239]
Spear-phishing attacks present a significant security challenge.
We propose a detection approach based on a novel document vectorization method.
Our method achieves a 91% F1 score in identifying LLM-generated spear-phishing emails.
arXiv Detail & Related papers (2024-02-13T09:12:55Z)
- Building an Effective Email Spam Classification Model with spaCy [0.0]
The author used the spaCy natural language processing library and three machine learning (ML) algorithms, Naive Bayes (NB), Decision Tree C4.5, and Multilayer Perceptron (MLP), in Python to detect spam emails collected from Gmail.
arXiv Detail & Related papers (2023-03-15T17:41:11Z)
- Profiler: Profile-Based Model to Detect Phishing Emails [15.109679047753355]
We propose a multidimensional risk assessment of emails to reduce the feasibility of an attacker adapting their email and avoiding detection.
We develop a risk assessment framework that includes three models which analyse an email's (1) threat level, (2) cognitive manipulation, and (3) email type.
Our Profiler can be used in conjunction with ML approaches, to reduce their misclassifications or as a labeller for large email data sets in the training stage.
arXiv Detail & Related papers (2022-08-18T10:01:55Z)
- Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text [219.15486286590016]
This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically.
As a result, the model achieved high precision, recall, f1-score and accuracy of 98.38%.
arXiv Detail & Related papers (2021-10-10T17:19:37Z)
- Phishing Detection through Email Embeddings [2.099922236065961]
The problem of detecting phishing emails through machine learning techniques has been discussed extensively in the literature.
In this paper, we crafted a set of phishing and legitimate emails with similar indicators in order to investigate whether these cues are captured or disregarded by email embeddings.
Our results show that, using these indicators, email embedding techniques are effective for classifying emails as phishing or legitimate.
arXiv Detail & Related papers (2020-12-28T21:16:41Z)
- Detection of Adversarial Supports in Few-shot Classifiers Using Feature Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We make use of feature preserving autoencoder filtering and also the concept of self-similarity of a support set to perform this detection.
Our method is attack-agnostic and also the first to explore detection for few-shot classifiers to the best of our knowledge.
arXiv Detail & Related papers (2020-12-09T14:13:41Z)
- Active Learning from Crowd in Document Screening [76.9545252341746]
We focus on building a set of machine learning classifiers that evaluate documents, and then screen them efficiently.
We propose objective-aware sampling, a screening-specific sampling technique for multi-label active learning.
We demonstrate that objective-aware sampling significantly outperforms state-of-the-art active learning sampling strategies.
arXiv Detail & Related papers (2020-11-11T16:17:28Z)
- Robust and Verifiable Information Embedding Attacks to Deep Neural Networks via Error-Correcting Codes [81.85509264573948]
In the era of deep learning, a user often leverages a third-party machine learning tool to train a deep neural network (DNN) classifier.
In an information embedding attack, an attacker is the provider of a malicious third-party machine learning tool.
In this work, we aim to design information embedding attacks that are verifiable and robust against popular post-processing methods.
arXiv Detail & Related papers (2020-10-26T17:42:42Z)
- Learning with Weak Supervision for Email Intent Detection [56.71599262462638]
We propose to leverage user actions as a source of weak supervision to detect intents in emails.
We develop an end-to-end robust deep neural network model for email intent identification.
arXiv Detail & Related papers (2020-05-26T23:41:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.