Small Language Models for Phishing Website Detection: Cost, Performance, and Privacy Trade-Offs
- URL: http://arxiv.org/abs/2511.15434v1
- Date: Wed, 19 Nov 2025 13:45:07 GMT
- Title: Small Language Models for Phishing Website Detection: Cost, Performance, and Privacy Trade-Offs
- Authors: Georg Goldenits, Philip Koenig, Sebastian Raubitzek, Andreas Ekelhart
- Abstract summary: Phishing websites pose a major cybersecurity threat, exploiting unsuspecting users and causing significant financial and organisational harm. Traditional machine learning approaches for phishing detection often require extensive feature engineering, continuous retraining, and costly infrastructure maintenance. This paper investigates the feasibility of small language models (SLMs) for detecting phishing websites using only their raw HTML code.
- Score: 0.6299766708197881
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Phishing websites pose a major cybersecurity threat, exploiting unsuspecting users and causing significant financial and organisational harm. Traditional machine learning approaches for phishing detection often require extensive feature engineering, continuous retraining, and costly infrastructure maintenance. At the same time, proprietary large language models (LLMs) have demonstrated strong performance in phishing-related classification tasks, but their operational costs and reliance on external providers limit their practical adoption in many business environments. This paper investigates the feasibility of small language models (SLMs) for detecting phishing websites using only their raw HTML code. A key advantage of these models is that they can be deployed on local infrastructure, providing organisations with greater control over data and operations. We systematically evaluate 15 commonly used Small Language Models (SLMs), ranging from 1 billion to 70 billion parameters, benchmarking their classification accuracy, computational requirements, and cost-efficiency. Our results highlight the trade-offs between detection performance and resource consumption, demonstrating that while SLMs underperform compared to state-of-the-art proprietary LLMs, they can still provide a viable and scalable alternative to external LLM services. By presenting a comparative analysis of costs and benefits, this work lays the foundation for future research on the adaptation, fine-tuning, and deployment of SLMs in phishing detection systems, aiming to balance security effectiveness and economic practicality.
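The evaluated setup, classifying a page from its raw HTML with a locally deployed SLM, can be sketched roughly as follows. This is an assumed, minimal reconstruction, not the authors' code: the prompt wording, the truncation limit, and the `generate` callable (standing in for any local inference backend, e.g. an llama.cpp or Ollama wrapper) are all hypothetical.

```python
# Hedged sketch of raw-HTML phishing classification with a local SLM.
# All names and constants below are illustrative assumptions.

MAX_HTML_CHARS = 4000  # assumed truncation to fit a small context window


def build_prompt(html: str) -> str:
    """Build a zero-shot binary-classification prompt from raw HTML."""
    snippet = html[:MAX_HTML_CHARS]
    return (
        "You are a security classifier. Given the raw HTML of a web page, "
        "answer with exactly one word: PHISHING or LEGITIMATE.\n\n"
        f"HTML:\n{snippet}\n\nAnswer:"
    )


def parse_label(reply: str) -> str:
    """Map a free-form model reply to a binary label (defaults to legitimate)."""
    return "phishing" if "PHISHING" in reply.upper() else "legitimate"


def classify(html: str, generate) -> str:
    """Classify one page; `generate` is any prompt -> text callable
    backed by a locally hosted model."""
    return parse_label(generate(build_prompt(html)))


# Usage with a trivial stand-in for a local SLM:
fake_slm = lambda prompt: "PHISHING"
print(classify("<html><form action='http://evil.test/login'></form></html>", fake_slm))
# prints: phishing
```

Keeping the model call behind a plain callable reflects the paper's deployment argument: the same classification logic runs unchanged whether the backend is a 1B or 70B parameter model, and no page content ever leaves local infrastructure.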
Related papers
- Reinforcement Learning of Large Language Models for Interpretable Credit Card Fraud Detection [29.14690532256978]
This paper proposes a novel approach that employs Reinforcement Learning (RL) to post-train lightweight language models for fraud detection tasks. We utilize the Group Sequence Policy Optimization (GSPO) algorithm combined with a rule-based reward system to fine-tune language models of various sizes on a real-life transaction dataset. Our experimental results demonstrate the effectiveness of this approach, with post-trained language models achieving substantial F1-score improvements on held-out test data.
arXiv Detail & Related papers (2026-01-09T06:56:27Z) - On the Effectiveness of Instruction-Tuning Local LLMs for Identifying Software Vulnerabilities [0.7136933021609079]
Large Language Models (LLMs) show significant promise in automating software vulnerability analysis. Current approaches to using LLMs for automated vulnerability analysis mostly rely on online API-based LLM services. This paper addresses these limitations by reformulating the problem as Software Vulnerability Identification (SVI). We show that instruction-tuned local models represent a more effective, secure, and practical approach for leveraging LLMs in real-world vulnerability management.
arXiv Detail & Related papers (2025-12-23T05:30:53Z) - Reliable LLM-Based Edge-Cloud-Expert Cascades for Telecom Knowledge Systems [54.916243942641444]
Large language models (LLMs) are emerging as key enablers of automation in domains such as telecommunications. We study an edge-cloud-expert cascaded LLM-based knowledge system that supports decision-making through a question-and-answer pipeline.
arXiv Detail & Related papers (2025-12-23T03:10:09Z) - Towards Trustworthy Lexical Simplification: Exploring Safety and Efficiency with Small LLMs [3.0708839100887833]
Large language models (LLMs) face challenges in the real-world application of lexical simplification. Vulnerable user groups (e.g., people with disabilities) are one of the key target groups of this technology. We propose an efficient framework for LS systems that utilizes small LLMs deployable in local environments.
arXiv Detail & Related papers (2025-09-29T17:25:56Z) - A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives [65.3369988566853]
Recent studies have demonstrated that adversaries can replicate a target model's functionality. Model Extraction Attacks (MEAs) pose threats to intellectual property, privacy, and system security. We propose a novel taxonomy that classifies MEAs according to attack mechanisms, defense approaches, and computing environments.
arXiv Detail & Related papers (2025-08-20T19:49:59Z) - Phishing Detection in the Gen-AI Era: Quantized LLMs vs Classical Models [1.4999444543328293]
Phishing attacks are becoming increasingly sophisticated, underscoring the need for detection systems that strike a balance between high accuracy and computational efficiency. This paper presents a comparative evaluation of traditional Machine Learning (ML), Deep Learning (DL), and quantized small-parameter Large Language Models (LLMs) for phishing detection. We show that while LLMs currently underperform compared to ML and DL methods in terms of raw accuracy, they exhibit strong potential for identifying subtle, context-based phishing cues.
arXiv Detail & Related papers (2025-07-10T04:01:52Z) - Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs [71.7892165868749]
Commercial Large Language Model (LLM) APIs create a fundamental trust problem. Users pay for specific models but have no guarantee that providers deliver them faithfully. We formalize this model substitution problem and evaluate detection methods under realistic adversarial conditions. We propose and evaluate the use of Trusted Execution Environments (TEEs) as one practical and robust solution.
arXiv Detail & Related papers (2025-04-07T03:57:41Z) - How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities [62.474732677086855]
Large language model (LLM) routing has emerged as a crucial strategy for balancing computational costs with performance. We propose the DSC benchmark: Diverse, Simple, and Categorized, an evaluation framework that categorizes router performance across a broad spectrum of query types.
arXiv Detail & Related papers (2025-03-20T19:52:30Z) - Deep Learning Approaches for Anti-Money Laundering on Mobile Transactions: Review, Framework, and Directions [51.43521977132062]
Money laundering is a financial crime that obscures the origin of illicit funds. The proliferation of mobile payment platforms and smart IoT devices has significantly complicated anti-money laundering (AML) investigations. This paper conducts a comprehensive review of deep learning solutions and the challenges associated with their use in AML.
arXiv Detail & Related papers (2025-03-13T05:19:44Z) - Can LLMs Hack Enterprise Networks? Autonomous Assumed Breach Penetration-Testing Active Directory Networks [1.3124479769761592]
We introduce a novel prototype designed to employ Large Language Model (LLM)-driven autonomous systems. Our system represents the first demonstration of a fully autonomous, LLM-driven framework capable of compromising accounts. We find that the associated costs are competitive with, and often significantly lower than, those incurred by professional human pen-testers.
arXiv Detail & Related papers (2025-02-06T17:12:43Z) - Adversarial Reasoning at Jailbreaking Time [49.70772424278124]
Large language models (LLMs) are becoming more capable and widespread. Recent advances in standardizing, measuring, and scaling test-time compute suggest new methodologies for optimizing models to achieve high performance on hard tasks. In this paper, we apply these advances to the task of model jailbreaking: eliciting harmful responses from aligned LLMs.
arXiv Detail & Related papers (2025-02-03T18:59:01Z) - CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines [23.925385446070717]
We introduce CEBench, an open-source toolkit for benchmarking online large language models. It focuses on the critical trade-offs between expenditure and effectiveness required for LLM deployments. This capability supports crucial decision-making processes aimed at maximizing effectiveness while minimizing cost impacts.
arXiv Detail & Related papers (2024-06-20T21:36:00Z) - Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Models (LLMs) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z) - Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MuST-C dataset.
The results show that LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.