Related papers: A Late Multi-Modal Fusion Model for Detecting Hybrid Spam E-mail

A Late Multi-Modal Fusion Model for Detecting Hybrid Spam E-mail

URL: http://arxiv.org/abs/2210.14616v4
Date: Mon, 15 May 2023 05:49:44 GMT
Title: A Late Multi-Modal Fusion Model for Detecting Hybrid Spam E-mail
Authors: Zhibo Zhang, Ernesto Damiani, Hussam Al Hamadi, Chan Yeob Yeun, Fatma Taher
Abstract summary: A few studies have been conducted with the goal of detecting hybrid spam e-mails. Optical Character Recognition is a very successful technique in processing text-and-image hybrid spam. We propose new late multi-modal fusion training frameworks for a text-and-image hybrid spam e-mail filtering system.
Score: 5.182080825408661
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In recent years, spammers are now trying to obfuscate their intents by introducing hybrid spam e-mail combining both image and text parts, which is more challenging to detect in comparison to e-mails containing text or image only. The motivation behind this research is to design an effective approach filtering out hybrid spam e-mails to avoid situations where traditional text-based or image-baesd only filters fail to detect hybrid spam e-mails. To the best of our knowledge, a few studies have been conducted with the goal of detecting hybrid spam e-mails. Ordinarily, Optical Character Recognition (OCR) technology is used to eliminate the image parts of spam by transforming images into text. However, the research questions are that although OCR scanning is a very successful technique in processing text-and-image hybrid spam, it is not an effective solution for dealing with huge quantities due to the CPU power required and the execution time it takes to scan e-mail files. And the OCR techniques are not always reliable in the transformation processes. To address such problems, we propose new late multi-modal fusion training frameworks for a text-and-image hybrid spam e-mail filtering system compared to the classical early fusion detection frameworks based on the OCR method. Convolutional Neural Network (CNN) and Continuous Bag of Words were implemented to extract features from image and text parts of hybrid spam respectively, whereas generated features were fed to sigmoid layer and Machine Learning based classifiers including Random Forest (RF), Decision Tree (DT), Naive Bayes (NB) and Support Vector Machine (SVM) to determine the e-mail ham or spam.

Related papers

LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images? [80.4577892387028]
We introduce LogicOCR, a benchmark comprising 1,100 multiple-choice questions designed to evaluate LMMs' logical reasoning abilities on text-rich images.<n>We develop a scalable, automated pipeline to convert a text corpus into multimodal samples.<n>We evaluate a range of representative open-source and proprietary LMMs under both Chain-of-Thought (CoT) and direct-answer settings.
arXiv Detail & Related papers (2025-05-18T08:39:37Z)
Prompted Contextual Vectors for Spear-Phishing Detection [45.07804966535239]
Spear-phishing attacks present a significant security challenge. We propose a detection approach based on a novel document vectorization method. Our method achieves a 91% F1 score in identifying LLM-generated spear-phishing emails.
arXiv Detail & Related papers (2024-02-13T09:12:55Z)
Perceptual Image Compression with Cooperative Cross-Modal Side Information [53.356714177243745]
We propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff. Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features.
arXiv Detail & Related papers (2023-11-23T08:31:11Z)
DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection [56.513637720967566]
Large language models (LLMs) can generate texts that pose risks of misuse, such as plagiarism, planting fake reviews on e-commerce platforms, or creating inflammatory false tweets. Existing high-quality detection methods usually require access to the interior of the model to extract the intrinsic characteristics. We propose to extract deep intrinsic characteristics of the black-box model generated texts.
arXiv Detail & Related papers (2023-05-21T17:26:16Z)
Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation [63.061871048769596]
We present a novel Unified Multi-Modal Latent Diffusion (UMM-Diffusion) which takes joint texts and images containing specified subjects as input sequences. To be more specific, both input texts and images are encoded into one unified multi-modal latent space. Our method is able to generate high-quality images with complex semantics from both aspects of input texts and images.
arXiv Detail & Related papers (2023-03-16T13:50:20Z)
Building an Effective Email Spam Classification Model with spaCy [0.0]
Author has used spaCy natural language processing library and 3 machine learning (ML) algorithms Naive Bayes (NB), Decision Tree C45 and Multilayer Perceptron (MLP) in Python programming language to detect spam emails collected from Gmail service.
arXiv Detail & Related papers (2023-03-15T17:41:11Z)
Convolutional Neural Networks for Image Spam Detection [4.817429789586127]
Spam can be defined as unsolicited bulk email. In an effort to evade text-based filters, spammers sometimes embed spam text in an image, which is referred to as image spam. We apply convolutional neural networks (CNN) to this problem, we compare the results obtained using CNNs to other machine learning techniques, and we compare our results to previous related work.
arXiv Detail & Related papers (2022-04-02T15:10:44Z)
Adaptive Shrink-Mask for Text Detection [91.34459257409104]
Existing real-time text detectors reconstruct text contours by shrink-masks directly. The dependence on predicted shrink-masks leads to unstable detection results. Super-pixel Window (SPW) is designed to supervise the network.
arXiv Detail & Related papers (2021-11-18T07:38:57Z)
Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text [219.15486286590016]
This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically. As a result, the model achieved high precision, recall, f1-score and accuracy of 98.38%.
arXiv Detail & Related papers (2021-10-10T17:19:37Z)
Universal Adversarial Perturbations and Image Spam Classifiers [4.111899441919165]
Image spam is email that has been embedded in an image. Modern deep learning-based classifiers perform well in detecting typical image spam. We propose and analyze a new transformation-based adversarial attack that enables us to create tailored "natural perturbations" in image spam.
arXiv Detail & Related papers (2021-03-07T14:36:02Z)
DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis [80.54273334640285]
We propose a novel one-stage text-to-image backbone that directly synthesizes high-resolution images without entanglements between different generators. We also propose a novel Target-Aware Discriminator composed of Matching-Aware Gradient Penalty and One-Way Output. Compared with current state-of-the-art methods, our proposed DF-GAN is simpler but more efficient to synthesize realistic and text-matching images.
arXiv Detail & Related papers (2020-08-13T12:51:17Z)
DeepCapture: Image Spam Detection Using Deep Learning and Data Augmentation [16.488574089293326]
We propose a new image spam email detection tool called DeepCapture using a convolutional neural network (CNN) model. DeepCapture is capable of achieving an F1-score of 88%, which has a 6% improvement over the best existing spam detection model CNN-SVM.
arXiv Detail & Related papers (2020-06-16T02:50:04Z)
Classification of Spam Emails through Hierarchical Clustering and Supervised Learning [1.8065361710947976]
We propose to classify spam email in categories to improve the handle of already detected spam emails. For the task of multi-class spam classification, the use of TF-IDF combined with SVM for the best micro F1 score performance, $95.39%$, and (ii) TD-IDF along with NB for the fastest spam classification, analyzing an email in $2.13$ms.
arXiv Detail & Related papers (2020-05-18T14:41:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.