DeepCapture: Image Spam Detection Using Deep Learning and Data
Augmentation
- URL: http://arxiv.org/abs/2006.08885v1
- Date: Tue, 16 Jun 2020 02:50:04 GMT
- Title: DeepCapture: Image Spam Detection Using Deep Learning and Data
Augmentation
- Authors: Bedeuro Kim, Sharif Abuadbba, Hyoungshick Kim
- Abstract summary: We propose a new image spam email detection tool called DeepCapture using a convolutional neural network (CNN) model.
DeepCapture achieves an F1-score of 88%, a 6-percentage-point improvement over the best existing spam detection model, CNN-SVM.
- Score: 16.488574089293326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image spam emails are often used to evade text-based spam filters that detect
spam emails by their frequently used keywords. In this paper, we propose a
new image spam email detection tool called DeepCapture using a convolutional
neural network (CNN) model. There have been many efforts to detect image spam
emails, but their performance degrades significantly against entirely new and
unseen image spam emails due to overfitting during the training phase. To
address this issue, we focus on developing a more robust model that is less
prone to overfitting. Our key idea is to build a CNN-XGBoost framework
consisting of only eight layers, trained on a large number of samples produced
with data augmentation techniques tailored to the image spam detection task.
To show the feasibility of DeepCapture, we evaluate its performance with
publicly available datasets consisting of 6,000 spam and 2,313 non-spam image
samples. The experimental results show that DeepCapture achieves an F1-score
of 88%, a 6-percentage-point improvement over the best existing spam detection
model, CNN-SVM, which has an F1-score of 82%. Moreover, DeepCapture
outperformed existing image spam detection solutions on new and unseen image
datasets.
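For reference, the F1-score quoted in the abstract is the harmonic mean of precision and recall. The sketch below is illustrative only: the function name and the confusion counts are hypothetical and are not taken from the paper's experiments.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, computed from confusion counts."""
    if tp == 0:
        # With no true positives, precision or recall is zero (or undefined),
        # so this sketch reports 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)  # fraction of predicted spam that is spam
    recall = tp / (tp + fn)     # fraction of actual spam that was caught
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = f1_score(tp=80, fp=20, fn=20)
```

Since both reported scores are percentages, the gap between 88% and 82% is a difference of 6 percentage points.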
Related papers
- Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning [53.766434746801366]
Multimodal contrastive learning (MCL) has shown remarkable advances in zero-shot classification by learning from millions of image-caption pairs crawled from the Internet.
Hackers may exploit image-text data for model training without authorization, potentially including personal and privacy-sensitive information.
Recent works propose generating unlearnable examples by adding imperceptible perturbations to training images to build shortcuts for protection.
We propose Multi-step Error Minimization (MEM), a novel optimization process for generating multimodal unlearnable examples.
arXiv Detail & Related papers (2024-07-23T09:00:52Z)
- A Late Multi-Modal Fusion Model for Detecting Hybrid Spam E-mail [5.182080825408661]
A few studies have been conducted with the goal of detecting hybrid spam e-mails.
Optical Character Recognition is a very successful technique in processing text-and-image hybrid spam.
We propose new late multi-modal fusion training frameworks for a text-and-image hybrid spam e-mail filtering system.
arXiv Detail & Related papers (2022-10-26T10:47:12Z)
- Convolutional Neural Networks for Image Spam Detection [4.817429789586127]
Spam can be defined as unsolicited bulk email.
In an effort to evade text-based filters, spammers sometimes embed spam text in an image, which is referred to as image spam.
We apply convolutional neural networks (CNNs) to this problem, compare the results obtained using CNNs to other machine learning techniques, and compare our results to previous related work.
arXiv Detail & Related papers (2022-04-02T15:10:44Z)
- Core Risk Minimization using Salient ImageNet [53.616101711801484]
We introduce the Salient Imagenet dataset with more than 1 million soft masks localizing core and spurious features for all 1000 Imagenet classes.
Using this dataset, we first evaluate the reliance of several Imagenet pretrained models (42 total) on spurious features.
Next, we introduce a new learning paradigm called Core Risk Minimization (CoRM) whose objective ensures that the model predicts a class using its core features.
arXiv Detail & Related papers (2022-03-28T01:53:34Z)
- Anomaly Detection in Emails using Machine Learning and Header Information [0.0]
Anomalies in emails such as phishing and spam present major security risks.
Previous studies on email anomaly detection relied on a single type of anomaly and the analysis of the email body and subject content.
This study conducted feature extraction and selection on email header datasets and leveraged both multi and one-class anomaly detection approaches.
arXiv Detail & Related papers (2022-03-19T23:31:23Z)
- Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program.
We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles.
We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z)
- Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text [219.15486286590016]
This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically.
As a result, the model achieved high precision, recall, f1-score and accuracy of 98.38%.
arXiv Detail & Related papers (2021-10-10T17:19:37Z)
- DeepDarts: Modeling Keypoints as Objects for Automatic Scorekeeping in Darts using a Single Camera [75.34178733070547]
Existing multi-camera solutions for automatic scorekeeping in steel-tip darts are very expensive and thus inaccessible to most players.
We present a new approach to keypoint detection and apply it to predict dart scores from a single image taken from any camera angle.
We develop a deep convolutional neural network around this idea and use it to predict dart locations and dartboard calibration points.
arXiv Detail & Related papers (2021-05-20T16:25:57Z)
- Universal Adversarial Perturbations and Image Spam Classifiers [4.111899441919165]
Image spam is spam email that has been embedded in an image.
Modern deep learning-based classifiers perform well in detecting typical image spam.
We propose and analyze a new transformation-based adversarial attack that enables us to create tailored "natural perturbations" in image spam.
arXiv Detail & Related papers (2021-03-07T14:36:02Z)
- Robust Spammer Detection by Nash Reinforcement Learning [64.80986064630025]
We develop a minimax game where the spammers and spam detectors compete with each other on their practical goals.
We show that an optimization algorithm can reliably find an equilibrial detector that can robustly prevent spammers with any mixed spamming strategies from attaining their practical goal.
arXiv Detail & Related papers (2020-06-10T21:18:07Z)
- DeepQuarantine for Suspicious Mail [0.0]
DeepQuarantine (DQ) is a cloud technology to detect and quarantine potential spam messages.
Most of the quarantined mail is spam, which allows clients to use email without delay.
arXiv Detail & Related papers (2020-01-13T11:32:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.