A Late Multi-Modal Fusion Model for Detecting Hybrid Spam E-mail
- URL: http://arxiv.org/abs/2210.14616v4
- Date: Mon, 15 May 2023 05:49:44 GMT
- Title: A Late Multi-Modal Fusion Model for Detecting Hybrid Spam E-mail
- Authors: Zhibo Zhang, Ernesto Damiani, Hussam Al Hamadi, Chan Yeob Yeun, Fatma Taher
- Abstract summary: A few studies have been conducted with the goal of detecting hybrid spam e-mails.
Optical Character Recognition is a very successful technique in processing text-and-image hybrid spam.
We propose new late multi-modal fusion training frameworks for a text-and-image hybrid spam e-mail filtering system.
- Score: 5.182080825408661
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, spammers have tried to obfuscate their intent by
introducing hybrid spam e-mails that combine image and text parts, which are
more challenging to detect than e-mails containing only text or only images.
The motivation behind this research is to design an effective approach for
filtering out hybrid spam e-mails, avoiding situations where traditional
text-only or image-only filters fail to detect them. To
the best of our knowledge, a few studies have been conducted with the goal of
detecting hybrid spam e-mails. Ordinarily, Optical Character Recognition (OCR)
technology is used to eliminate the image parts of spam by transforming images
into text. However, although OCR scanning is a
very successful technique for processing text-and-image hybrid spam, it is not
an effective solution for dealing with huge quantities of e-mail because of the CPU power
required and the execution time it takes to scan e-mail files. Moreover, OCR
techniques are not always reliable in the transformation process. To address
such problems, we propose new late multi-modal fusion training frameworks for a
text-and-image hybrid spam e-mail filtering system, in contrast to the classical
early fusion detection frameworks based on the OCR method. A Convolutional Neural
Network (CNN) and Continuous Bag of Words (CBOW) were implemented to extract features
from the image and text parts of hybrid spam, respectively, and the generated
features were fed to a sigmoid layer and Machine Learning based classifiers,
including Random Forest (RF), Decision Tree (DT), Naive Bayes (NB) and Support
Vector Machine (SVM), to determine whether the e-mail is ham or spam.
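To make the late fusion idea concrete, the following is a minimal sketch of one common form of it: each modality is scored by its own model, and only the resulting spam probabilities are combined. It is not the authors' implementation. `LinearScorer` is a hypothetical stand-in for a trained per-modality model (for example a CBOW-based text classifier or a CNN image classifier), and the weighted average in `late_fusion`, the weights, and the feature values are all illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class LinearScorer:
    """Hypothetical stand-in for a trained per-modality model."""
    weights: Sequence[float]
    bias: float

    def spam_probability(self, features: Sequence[float]) -> float:
        # Linear score followed by a sigmoid output layer.
        z = sum(w * x for w, x in zip(self.weights, features)) + self.bias
        return sigmoid(z)


def late_fusion(p_text: float, p_image: float, w_text: float = 0.5) -> float:
    """Combine per-modality spam probabilities with a weighted average.

    The weighted average is an assumption made for illustration; a trained
    fusion classifier could take its place.
    """
    if not 0.0 <= w_text <= 1.0:
        raise ValueError("w_text must lie in [0, 1]")
    return w_text * p_text + (1.0 - w_text) * p_image


def classify(p_spam: float, threshold: float = 0.5) -> str:
    """Turn a spam probability into a ham/spam label."""
    return "spam" if p_spam >= threshold else "ham"


# Illustrative models and features (made-up numbers, not from the paper).
text_model = LinearScorer(weights=[1.0, -1.0], bias=0.0)
image_model = LinearScorer(weights=[2.0], bias=-1.0)

p_text = text_model.spam_probability([0.2, 0.4])   # text part looks benign
p_image = image_model.spam_probability([1.5])      # image part looks spammy
p_fused = late_fusion(p_text, p_image)

print(classify(p_text), classify(p_image), classify(p_fused))
```

In this toy case the text part alone falls below the threshold while the fused score does not, which mirrors the motivation stated in the abstract: a filter that looks at a single modality can miss a hybrid spam e-mail. Unlike the OCR-based early fusion pipeline, no image-to-text conversion is needed before scoring.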
Related papers
- Prompted Contextual Vectors for Spear-Phishing Detection [45.07804966535239]
Spear-phishing attacks present a significant security challenge.
We propose a detection approach based on a novel document vectorization method.
Our method achieves a 91% F1 score in identifying LLM-generated spear-phishing emails.
(2024-02-13T09:12:55Z)
- DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection [56.513637720967566]
Large language models (LLMs) can generate texts that pose risks of misuse, such as plagiarism, planting fake reviews on e-commerce platforms, or creating inflammatory false tweets.
Existing high-quality detection methods usually require access to the interior of the model to extract the intrinsic characteristics.
We propose to extract deep intrinsic characteristics of the black-box model generated texts.
(2023-05-21T17:26:16Z)
- Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation [63.061871048769596]
We present a novel Unified Multi-Modal Latent Diffusion (UMM-Diffusion) which takes joint texts and images containing specified subjects as input sequences.
To be more specific, both input texts and images are encoded into one unified multi-modal latent space.
Our method is able to generate high-quality images with complex semantics from both aspects of input texts and images.
(2023-03-16T13:50:20Z)
- Building an Effective Email Spam Classification Model with spaCy [0.0]
The author uses the spaCy natural language processing library and three machine learning (ML) algorithms, Naive Bayes (NB), Decision Tree C45 and Multilayer Perceptron (MLP), in Python to detect spam emails collected from the Gmail service.
(2023-03-15T17:41:11Z)
- Convolutional Neural Networks for Image Spam Detection [4.817429789586127]
Spam can be defined as unsolicited bulk email.
In an effort to evade text-based filters, spammers sometimes embed spam text in an image, which is referred to as image spam.
We apply convolutional neural networks (CNNs) to this problem, compare the results with those of other machine learning techniques, and compare our results with previous related work.
(2022-04-02T15:10:44Z)
- Adaptive Shrink-Mask for Text Detection [91.34459257409104]
Existing real-time text detectors reconstruct text contours by shrink-masks directly.
The dependence on predicted shrink-masks leads to unstable detection results.
Super-pixel Window (SPW) is designed to supervise the network.
(2021-11-18T07:38:57Z)
- Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text [219.15486286590016]
This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically.
As a result, the model achieved high precision, recall, f1-score and accuracy of 98.38%.
(2021-10-10T17:19:37Z)
- Universal Adversarial Perturbations and Image Spam Classifiers [4.111899441919165]
Image spam is spam email that has been embedded in an image.
Modern deep learning-based classifiers perform well in detecting typical image spam.
We propose and analyze a new transformation-based adversarial attack that enables us to create tailored "natural perturbations" in image spam.
(2021-03-07T14:36:02Z)
- DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis [80.54273334640285]
We propose a novel one-stage text-to-image backbone that directly synthesizes high-resolution images without entanglements between different generators.
We also propose a novel Target-Aware Discriminator composed of Matching-Aware Gradient Penalty and One-Way Output.
Compared with current state-of-the-art methods, our proposed DF-GAN is simpler but more efficient to synthesize realistic and text-matching images.
(2020-08-13T12:51:17Z)
- DeepCapture: Image Spam Detection Using Deep Learning and Data Augmentation [16.488574089293326]
We propose a new image spam email detection tool called DeepCapture using a convolutional neural network (CNN) model.
DeepCapture achieves an F1-score of 88%, a 6% improvement over the best existing spam detection model, CNN-SVM.
(2020-06-16T02:50:04Z)
- Classification of Spam Emails through Hierarchical Clustering and Supervised Learning [1.8065361710947976]
We propose to classify spam email into categories to improve the handling of already detected spam emails.
For the task of multi-class spam classification, (i) TF-IDF combined with SVM gives the best micro F1 score, 95.39%, and (ii) TF-IDF along with NB gives the fastest spam classification, analyzing an email in 2.13 ms.
(2020-05-18T14:41:22Z)