An Improved Transformer-based Model for Detecting Phishing, Spam, and Ham: A Large Language Model Approach
- URL: http://arxiv.org/abs/2311.04913v2
- Date: Sun, 12 Nov 2023 16:32:16 GMT
- Title: An Improved Transformer-based Model for Detecting Phishing, Spam, and Ham: A Large Language Model Approach
- Authors: Suhaima Jamal and Hayden Wimmer
- Abstract summary: We present IPSDM, our model based on fine-tuning the BERT family of models to specifically detect phishing and spam email.
We demonstrate that our fine-tuned version, IPSDM, classifies emails more accurately in both unbalanced and balanced datasets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Phishing and spam detection is a long-standing challenge that has been the subject of much academic research. Large Language Models (LLMs) have vast potential to transform society and provide new and innovative approaches to solving well-established challenges. Phishing and spam have cost email users all over the world financial hardship, lost time, and wasted resources, and they frequently serve as an entry point for ransomware threat actors. While detection approaches exist, especially heuristic-based ones, LLMs offer the potential to venture into a new, unexplored area for understanding and solving this challenge. LLMs have rapidly altered the landscape for businesses, consumers, and academia alike, and they demonstrate transformational potential for society. Applying these new and innovative approaches to email detection is therefore a rational next step in academic research. In this work, we present IPSDM, a model based on fine-tuning the BERT family of models to specifically detect phishing and spam email. We demonstrate that our fine-tuned version, IPSDM, classifies emails more accurately in both unbalanced and balanced datasets. This work serves as an important first step toward employing LLMs to improve the security of our information systems.
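
The abstract describes IPSDM as a fine-tune of the BERT family but does not give the training recipe. The following is a minimal sketch of how such a fine-tune is conventionally done with the HuggingFace transformers library; the base checkpoint, label scheme, toy data, and hyperparameters are all illustrative assumptions rather than the authors' configuration.

```python
# Minimal sketch of fine-tuning a BERT-family classifier for a
# phishing/spam/ham task. Dataset, labels, and hyperparameters are
# illustrative assumptions, not the paper's exact setup.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = {"ham": 0, "spam": 1, "phishing": 2}  # assumed label scheme

class EmailDataset(Dataset):
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True,
                             max_length=256, return_tensors="pt")
        self.labels = torch.tensor(labels)
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS))

# Toy examples; a real run would load a labeled email corpus.
texts = ["Meeting at 3pm tomorrow.",
         "You WON a free prize, click now!",
         "Verify your bank password at http://example.test"]
labels = [LABELS["ham"], LABELS["spam"], LABELS["phishing"]]

loader = DataLoader(EmailDataset(texts, labels, tokenizer),
                    batch_size=2, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):
    for batch in loader:
        optimizer.zero_grad()
        loss = model(**batch).loss  # cross-entropy over the 3 labels
        loss.backward()
        optimizer.step()
```

For the unbalanced case the abstract mentions, a common adjustment is to weight the cross-entropy loss by inverse class frequency or to oversample the minority classes before training.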
Related papers
- Combating Phone Scams with LLM-based Detection: Where Do We Stand? [1.8979188847659796]
This research explores the potential of large language models (LLMs) to detect fraudulent phone calls.
LLM-based detectors can identify potential scams as they occur, offering immediate protection to users.
arXiv Detail & Related papers (2024-09-18T02:14:30Z)
- A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends [78.3201480023907]
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks.
The vulnerability of LVLMs is relatively underexplored, posing potential security risks in daily usage.
In this paper, we provide a comprehensive review of the various forms of existing LVLM attacks.
arXiv Detail & Related papers (2024-07-10T06:57:58Z)
- Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study [51.19622266249408]
MultiTrust is the first comprehensive and unified benchmark on the trustworthiness of MLLMs.
Our benchmark employs a rigorous evaluation strategy that addresses both multimodal risks and cross-modal impacts.
Extensive experiments with 21 modern MLLMs reveal some previously unexplored trustworthiness issues and risks.
arXiv Detail & Related papers (2024-06-11T08:38:13Z)
- Evaluating the Efficacy of Large Language Models in Identifying Phishing Attempts [2.6012482282204004]
Phishing, a prevalent cybercrime tactic for decades, remains a significant threat in today's digital world.
This paper aims to analyze the effectiveness of 15 Large Language Models (LLMs) in detecting phishing attempts.
arXiv Detail & Related papers (2024-04-23T19:55:18Z)
- An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach [2.8282906214258805]
Phishing emails are a serious cyber threat that attempt to deceive users with false messages intended to steal confidential information or cause financial harm.
Despite extensive academic research, phishing detection remains an ongoing and formidable challenge in the cybersecurity landscape.
We present an optimized, fine-tuned transformer-based DistilBERT model designed for the detection of phishing emails.
arXiv Detail & Related papers (2024-02-21T15:23:21Z)
- Detecting Scams Using Large Language Models [19.7220607313348]
Large Language Models (LLMs) have gained prominence in various applications, including security.
This paper explores the utility of LLMs in scam detection, a critical aspect of cybersecurity.
We propose a novel use case for LLMs to identify scams, such as phishing, advance fee fraud, and romance scams; a minimal prompting sketch in this spirit appears after this list.
arXiv Detail & Related papers (2024-02-05T16:13:54Z)
- A Survey on Detection of LLMs-Generated Content [97.87912800179531]
The ability to detect LLM-generated content has become of paramount importance.
We aim to provide a detailed overview of existing detection strategies and benchmarks.
We also posit the necessity for a multi-faceted approach to defend against various attacks.
arXiv Detail & Related papers (2023-10-24T09:10:26Z)
- Privacy in Large Language Models: Attacks, Defenses and Future Directions [84.73301039987128]
We analyze the current privacy attacks targeting large language models (LLMs) and categorize them according to the adversary's assumed capabilities.
We present a detailed overview of prominent defense strategies that have been developed to counter these privacy attacks.
arXiv Detail & Related papers (2023-10-16T13:23:54Z)
- Factuality Challenges in the Era of Large Language Models [113.3282633305118]
Large Language Models (LLMs) can generate false, erroneous, or misleading content.
LLMs can be exploited for malicious applications.
This poses a significant challenge to society in terms of the potential deception of users.
arXiv Detail & Related papers (2023-10-08T14:55:02Z)
- Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks.
A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output.
This paper presents a comprehensive review of this emerging class of techniques.
arXiv Detail & Related papers (2023-08-06T18:38:52Z)
- Spear Phishing With Large Language Models [3.2634122554914002]
This study explores how large language models (LLMs) can be used for spear phishing.
I create unique spear phishing messages for over 600 British Members of Parliament using OpenAI's GPT-3.5 and GPT-4 models.
My findings provide some evidence that these messages are not only realistic but also cost-effective, with each email costing only a fraction of a cent to generate.
arXiv Detail & Related papers (2023-05-11T16:55:19Z)
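
Several of the papers above ("Evaluating the Efficacy of Large Language Models in Identifying Phishing Attempts", "Detecting Scams Using Large Language Models") treat detection as a zero-shot prompting task rather than a fine-tuning task. The sketch below shows one minimal way to pose that task against a hosted chat model; the model name, prompt wording, and label set are illustrative assumptions, not the setup of any particular paper.

```python
# Minimal sketch of zero-shot scam/phishing classification with a
# hosted LLM. Model name and prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "You are a security analyst. Classify the following message as exactly "
    "one of: ham, spam, phishing. Reply with the single label only.\n\n"
    "Message:\n{message}"
)

def classify(message: str) -> str:
    """Return the model's label for one message."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed; the surveyed papers test many models
        temperature=0,        # deterministic labeling
        messages=[{"role": "user",
                   "content": PROMPT.format(message=message)}],
    )
    return resp.choices[0].message.content.strip().lower()

print(classify("Your account is locked. Verify your password at http://example.test"))
```

Pinning the temperature to 0 and demanding a single-word label keeps the output easy to score against a labeled corpus, which is how benchmark-style evaluations of this kind are typically run.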