An Investigation of Large Language Models and Their Vulnerabilities in Spam Detection
- URL: http://arxiv.org/abs/2504.09776v1
- Date: Mon, 14 Apr 2025 00:30:27 GMT
- Title: An Investigation of Large Language Models and Their Vulnerabilities in Spam Detection
- Authors: Qiyao Tang, Xiangyang Li
- Abstract summary: This project studies new spam detection systems that leverage Large Language Models (LLMs) fine-tuned with spam datasets. The experiments employ two LLMs, GPT2 and BERT, and three spam datasets: Enron, LingSpam, and SMSspamCollection. The results show that, while they can function as effective spam filters, the LLM models are susceptible to adversarial and data poisoning attacks.
- Score: 7.550686419077825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spam messages continue to present significant challenges to digital users, cluttering inboxes and posing security risks. Traditional spam detection methods, including rules-based, collaborative, and machine learning approaches, struggle to keep up with the rapidly evolving tactics employed by spammers. This project studies new spam detection systems that leverage Large Language Models (LLMs) fine-tuned with spam datasets. More importantly, we want to understand how LLM-based spam detection systems perform under adversarial attacks that purposefully modify spam emails, and under data poisoning attacks that exploit differences between the training data and the messages seen at detection time, both of which traditional machine learning models are shown to be vulnerable to. The experiments employ two LLMs, GPT2 and BERT, and three spam datasets (Enron, LingSpam, and SMSspamCollection) for extensive training and testing tasks. The results show that, while they can function as effective spam filters, the LLM models are susceptible to adversarial and data poisoning attacks. This research provides useful insights for future applications of LLMs in information security.
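The adversarial attacks studied in the abstract purposefully modify spam text so that a trained filter no longer recognizes it. As a hedged illustration of the general idea (not the paper's actual attack; the substitution map and sample message below are invented), a character-level evasion perturbation might look like:

```python
# Sketch of a simple character-substitution adversarial perturbation,
# a common evasion tactic against text-based spam filters.
# The substitution map and sample message are illustrative only.

SUBSTITUTIONS = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}

def perturb(text: str) -> str:
    """Replace selected characters to evade keyword-based detection
    while keeping the message readable to a human."""
    return "".join(SUBSTITUTIONS.get(ch, ch) for ch in text.lower())

spam = "free prize inside"
print(perturb(spam))  # fr33 pr1z3 1n$1d3
```

Real attacks against LLM filters are usually subtler (synonym swaps, paraphrasing, inserted benign text), but they share this structure: small edits that preserve the spam's meaning while shifting it away from the patterns the model learned.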
Related papers
- SpaLLM-Guard: Pairing SMS Spam Detection Using Open-source and Commercial LLMs [1.3198171962008958]
We evaluate the potential of large language models (LLMs), both open-source and commercial, for SMS spam detection. We compare their performance across zero-shot, few-shot, fine-tuning, and chain-of-thought prompting approaches. Fine-tuning emerges as the most effective strategy, with Mixtral achieving 98.6% accuracy and a balanced false positive and false negative rate below 2%.
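Few-shot prompting, one of the strategies compared above, amounts to prepending a handful of labeled examples before the message to classify. A minimal sketch of how such a prompt might be assembled (the template and example messages are invented, not taken from the paper):

```python
# Build a few-shot SMS spam classification prompt from (text, label) pairs.
# Template and examples are illustrative assumptions, not the paper's setup.

def build_few_shot_prompt(examples, message):
    """Assemble a few-shot prompt: instruction, labeled examples,
    then the message whose label the LLM should complete."""
    lines = ["Classify each SMS as 'spam' or 'ham'.", ""]
    for text, label in examples:
        lines.append(f"SMS: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"SMS: {message}")
    lines.append("Label:")
    return "\n".join(lines)

shots = [("WIN a free cruise! Reply now", "spam"),
         ("Are we still on for lunch?", "ham")]
prompt = build_few_shot_prompt(shots, "Claim your $500 gift card today")
print(prompt)
```

Zero-shot drops the examples, chain-of-thought asks the model to reason before answering, and fine-tuning updates the model weights instead of the prompt, which is why it typically performs best.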
arXiv Detail & Related papers (2025-01-09T06:00:08Z) - Next-Generation Phishing: How LLM Agents Empower Cyber Attackers [10.067883724547182]
The escalating threat of phishing emails has become increasingly sophisticated with the rise of Large Language Models (LLMs)
As attackers exploit LLMs to craft more convincing and evasive phishing emails, it is crucial to assess the resilience of current phishing defenses.
We conduct a comprehensive evaluation of traditional phishing detectors, such as Gmail Spam Filter, Apache SpamAssassin, and Proofpoint, as well as machine learning models like SVM, Logistic Regression, and Naive Bayes.
Our results reveal notable declines in detection accuracy for rephrased emails across all detectors, highlighting critical weaknesses in current phishing defenses.
arXiv Detail & Related papers (2024-11-21T06:20:29Z) - Attention Tracker: Detecting Prompt Injection Attacks in LLMs [62.247841717696765]
Large Language Models (LLMs) have revolutionized various domains but remain vulnerable to prompt injection attacks.
We introduce the concept of the distraction effect, where specific attention heads shift focus from the original instruction to the injected instruction.
We propose Attention Tracker, a training-free detection method that tracks attention patterns on instruction to detect prompt injection attacks.
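The distraction effect can be summarized as the share of an attention head's mass that falls on the injected instruction rather than the original one. A toy sketch with hand-picked weights (the token spans, values, and scoring function are invented for illustration; the paper's actual method may differ):

```python
# Toy illustration of measuring the "distraction effect": how much of
# one attention row's mass lands on injected-instruction tokens.
# Spans and weights are invented for illustration.

def distraction_score(attn_row, orig_span, inj_span):
    """Fraction of attention mass on injected-instruction tokens,
    relative to the mass on both instruction spans combined."""
    orig = sum(attn_row[i] for i in range(*orig_span))
    inj = sum(attn_row[i] for i in range(*inj_span))
    return inj / (orig + inj)

# One attention row over 8 tokens: indices 0-3 are the original
# instruction, 4-7 the injected one (toy values).
attn = [0.05, 0.05, 0.05, 0.05, 0.20, 0.20, 0.20, 0.20]
score = distraction_score(attn, (0, 4), (4, 8))
print(round(score, 3))  # ~0.8: most attention has shifted to the injection
```

A detector along these lines would flag an input when this score exceeds a calibrated threshold, requiring no training of the model itself.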
arXiv Detail & Related papers (2024-11-01T04:05:59Z) - Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement [51.601916604301685]
Large language models (LLMs) generate content that can undermine trust in online discourse.
Current methods often focus on binary classification, failing to address the complexities of real-world scenarios like human-LLM collaboration.
To move beyond binary classification and address these challenges, we propose a new paradigm for detecting LLM-generated content.
arXiv Detail & Related papers (2024-10-18T08:14:10Z) - Evaluating LLM-based Personal Information Extraction and Countermeasures [63.91918057570824]
Large language model (LLM) based personal information extraction can be benchmarked. LLMs can be misused by attackers to accurately extract various personal information from personal profiles. Prompt injection can defend against strong LLM-based attacks, reducing the attack to less effective traditional ones.
arXiv Detail & Related papers (2024-08-14T04:49:30Z) - SpamDam: Towards Privacy-Preserving and Adversary-Resistant SMS Spam Detection [2.0355793807035094]
SpamDam is an SMS spam detection framework designed to overcome key challenges in detecting and understanding SMS spam.
We have compiled over 76K SMS spam messages from Twitter and Weibo between 2018 and 2023, forming the largest dataset of its kind.
We have rigorously tested the adversarial robustness of SMS spam detection models, introducing the novel reverse backdoor attack.
arXiv Detail & Related papers (2024-04-15T06:07:10Z) - What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection [48.572932773403274]
We investigate the opportunities and risks of large language models in social bot detection.
We propose a mixture-of-heterogeneous-experts framework to divide and conquer diverse user information modalities.
Experiments show that instruction tuning on 1,000 annotated examples produces specialized LLMs that outperform state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-01T06:21:19Z) - LM-Polygraph: Uncertainty Estimation for Language Models [71.21409522341482]
Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of large language models (LLMs)
We introduce LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python.
It introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores.
arXiv Detail & Related papers (2023-11-13T15:08:59Z) - An Improved Transformer-based Model for Detecting Phishing, Spam, and Ham: A Large Language Model Approach [0.0]
We present IPSDM, our model based on fine-tuning the BERT family of models to specifically detect phishing and spam email.
We demonstrate our fine-tuned version, IPSDM, is able to better classify emails in both unbalanced and balanced datasets.
arXiv Detail & Related papers (2023-11-01T18:41:50Z) - A Survey on Detection of LLMs-Generated Content [97.87912800179531]
The ability to detect LLMs-generated content has become of paramount importance.
We aim to provide a detailed overview of existing detection strategies and benchmarks.
We also posit the necessity for a multi-faceted approach to defend against various attacks.
arXiv Detail & Related papers (2023-10-24T09:10:26Z) - Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam Detection [3.3504365823045044]
This paper investigates the effectiveness of large language models (LLMs) in email spam detection.
We compare prominent models from three distinct families: BERT-like, Sentence Transformers, and Seq2Seq.
We assess the performance of these models across four public datasets.
arXiv Detail & Related papers (2023-04-03T10:27:53Z) - Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text [219.15486286590016]
This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically.
As a result, the model achieved high precision, recall, and F1-score, with an accuracy of 98.38%.
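For context, the precision, recall, F1-score, and accuracy figures quoted across these papers all derive from confusion-matrix counts in the standard way; a quick sketch with invented counts:

```python
# Standard classification metrics from confusion-matrix counts.
# The counts below are invented for illustration.

def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1, and accuracy from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Toy counts for a spam classifier: 95 spam caught, 2 false alarms,
# 5 spam missed, 98 legitimate messages passed through.
p, r, f1, acc = classification_metrics(tp=95, fp=2, fn=5, tn=98)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f} accuracy={acc:.3f}")
```

In spam filtering, precision matters especially: a false positive buries a legitimate message, which is usually costlier than letting one spam through.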
arXiv Detail & Related papers (2021-10-10T17:19:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.