Related papers: Improving Phishing Email Detection Performance of Small Large Language Models

Improving Phishing Email Detection Performance of Small Large Language Models

URL: http://arxiv.org/abs/2505.00034v1
Date: Tue, 29 Apr 2025 14:07:06 GMT
Title: Improving Phishing Email Detection Performance of Small Large Language Models
Authors: Zijie Lin, Zikang Liu, Hanbo Fan,
Abstract summary: Large language models(LLMs) have demonstrated remarkable performance on many natural language processing(NLP) tasks.<n>However, well-performing LLMs typically contain billions or even tens of billions of parameters, requiring enormous computational resources.
Score: 5.209583971923267
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models(LLMs) have demonstrated remarkable performance on many natural language processing(NLP) tasks and have been employed in phishing email detection research. However, in current studies, well-performing LLMs typically contain billions or even tens of billions of parameters, requiring enormous computational resources. To reduce computational costs, we investigated the effectiveness of small-parameter LLMs for phishing email detection. These LLMs have around 3 billion parameters and can run on consumer-grade GPUs. However, small LLMs often perform poorly in phishing email detection task. To address these issues, we designed a set of methods including Prompt Engineering, Explanation Augmented Fine-tuning, and Model Ensemble to improve phishing email detection capabilities of small LLMs. We validated the effectiveness of our approach through experiments, significantly improving accuracy on the SpamAssassin dataset from around 0.5 for baseline models like Qwen2.5-1.5B-Instruct to 0.976.

Related papers

Enhancing Phishing Email Identification with Large Language Models [0.40792653193642503]
We study the efficacy of large language models (LLMs) in detecting phishing emails. Experiments show that the LLM achieves a high accuracy rate at high precision.
arXiv Detail & Related papers (2025-02-07T08:45:50Z)
LLM2: Let Large Language Models Harness System 2 Reasoning [65.89293674479907]
Large language models (LLMs) have exhibited impressive capabilities across a myriad of tasks, yet they occasionally yield undesirable outputs. We introduce LLM2, a novel framework that combines an LLM with a process-based verifier. LLMs2 is responsible for generating plausible candidates, while the verifier provides timely process-based feedback to distinguish desirable and undesirable outputs.
arXiv Detail & Related papers (2024-12-29T06:32:36Z)
A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Method-Level Code Smell Detection [11.9757082688031]
Existing detection methods, relying on Codes or Machine Learning (ML) and Deep Learning (DL) techniques, often face limitations such as unsatisfactory performance.<n>This study evaluates state-of-the-art PEFT methods on both small and large Language Models for detecting two types of method-level code smells: Complex Conditional and Complex Method.<n>Results show that PEFT methods achieve comparable or better performance than full fine-tuning while consuming less GPU memory.
arXiv Detail & Related papers (2024-12-18T12:48:36Z)
Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models [79.46938238953916]
Fine-tuning large language models (LLMs) to diverse applications is crucial to meet complex demands. Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs. In this work, we observe that existing low-rank and low-bit compression methods can significantly harm the model performance for task-specific fine-tuned LLMs.
arXiv Detail & Related papers (2024-06-13T07:57:27Z)
Get my drift? Catching LLM Task Drift with Activation Deltas [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users.<n>We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set.<n>We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z)
Lateral Phishing With Large Language Models: A Large Organization Comparative Study [3.590574657417729]
The emergence of Large Language Models (LLMs) has heightened the threat of phishing emails by enabling the generation of highly targeted, personalized, and automated attacks.<n>There is a lack of large-scale studies comparing the effectiveness of LLM-generated lateral phishing emails to those crafted by humans.<n>This study contributes to the understanding of cyber security threats in educational institutions.
arXiv Detail & Related papers (2024-01-18T05:06:39Z)
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes [53.4856038354195]
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions. FedKSeed employs zeroth-order optimization with a finite set of random seeds. It significantly reduces transmission requirements between the server and clients to just a few random seeds.
arXiv Detail & Related papers (2023-12-11T13:03:21Z)
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models [79.34513906324727]
In this paper, we aim at parameter and efficient transfer learning (PCETL) for vision-language pre-trained models. We propose a novel dynamic architecture skipping (DAS) approach towards effective PCETL.
arXiv Detail & Related papers (2023-09-04T09:34:33Z)
Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models [11.845239346943067]
parameter-efficient fine-tuning (PEFT) is a promising approach to efficiently specialize large language models (LLMs) to task-specific data.<n>Our study highlights the potential for tuning larger LLMs and significant reductions in memory usage by combining PEFT with quantization.
arXiv Detail & Related papers (2023-08-21T04:31:06Z)
Detecting Phishing Sites Using ChatGPT [2.3999111269325266]
We propose a novel system called ChatPhishDetector that utilizes Large Language Models (LLMs) to detect phishing sites. Our system involves leveraging a web crawler to gather information from websites, generating prompts for LLMs based on the crawled data, and then retrieving the detection results from the responses generated by the LLMs. The experimental results using GPT-4V demonstrated outstanding performance, with a precision of 98.7% and a recall of 99.6%, outperforming the detection results of other LLMs and existing systems.
arXiv Detail & Related papers (2023-06-09T11:30:08Z)
Spear Phishing With Large Language Models [3.2634122554914002]
This study explores how large language models (LLMs) can be used for spear phishing. I create unique spear phishing messages for over 600 British Members of Parliament using OpenAI's GPT-3.5 and GPT-4 models. My findings provide some evidence that these messages are not only realistic but also cost-effective, with each email costing only a fraction of a cent to generate.
arXiv Detail & Related papers (2023-05-11T16:55:19Z)
Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam Detection [3.3504365823045044]
This paper investigates the effectiveness of large language models (LLMs) in email spam detection. We compare prominent models from three distinct families: BERT-like, Sentence Transformers, and Seq2Seq. We assess the performance of these models across four public datasets.
arXiv Detail & Related papers (2023-04-03T10:27:53Z)
CPM-2: Large-scale Cost-effective Pre-trained Language Models [71.59893315671997]
We present a suite of cost-effective techniques for the use of PLMs to deal with the efficiency issues of pre-training, fine-tuning, and inference. We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch. We implement a new inference toolkit, namely InfMoE, for using large-scale PLMs with limited computational resources.
arXiv Detail & Related papers (2021-06-20T15:43:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.