Explainable Transformer-Based Email Phishing Classification with Adversarial Robustness
- URL: http://arxiv.org/abs/2511.12085v1
- Date: Sat, 15 Nov 2025 08:05:47 GMT
- Title: Explainable Transformer-Based Email Phishing Classification with Adversarial Robustness
- Authors: Sajad U P,
- Abstract summary: Phishing and related cyber threats are becoming more varied and technologically advanced.<n>Recent threats, specifically Artificial Intelligence (AI)-generated phishing attacks, are reducing the overall system resilience of phishing detectors.<n>This study presents a hybrid approach that uses DistilBERT, a smaller, faster, and lighter version of the BERT transformer model for email classification.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Phishing and related cyber threats are becoming more varied and technologically advanced. Among these, email-based phishing remains the most dominant and persistent threat. These attacks exploit human vulnerabilities to disseminate malware or gain unauthorized access to sensitive information. Deep learning (DL) models, particularly transformer-based models, have significantly enhanced phishing mitigation through their contextual understanding of language. However, some recent threats, specifically Artificial Intelligence (AI)-generated phishing attacks, are reducing the overall system resilience of phishing detectors. In response, adversarial training has shown promise against AI-generated phishing threats. This study presents a hybrid approach that uses DistilBERT, a smaller, faster, and lighter version of the BERT transformer model for email classification. Robustness against text-based adversarial perturbations is reinforced using Fast Gradient Method (FGM) adversarial training. Furthermore, the framework integrates the LIME Explainable AI (XAI) technique to enhance the transparency of the DistilBERT architecture. The framework also uses the Flan-T5-small language model from Hugging Face to generate plain-language security narrative explanations for end-users. This combined approach ensures precise phishing classification while providing easily understandable justifications for the model's decisions.
Related papers
- Robust ML-based Detection of Conventional, LLM-Generated, and Adversarial Phishing Emails Using Advanced Text Preprocessing [3.3166006294048427]
We propose a robust phishing email detection system featuring an enhanced text preprocessing pipeline.<n>Our approach integrates widely adopted natural language processing (NLP) feature extraction techniques and machine learning algorithms.<n>We evaluate our models on publicly available datasets comprising both phishing and legitimate emails, achieving a detection accuracy of 94.26% and F1-score of 84.39%.
arXiv Detail & Related papers (2025-10-13T20:34:19Z) - SoK: Large Language Model-Generated Textual Phishing Campaigns End-to-End Analysis of Generation, Characteristics, and Detection [3.7549350220109274]
Large language models (LLMs) enable Phishing-as-a-Service'' attacks at scale within minutes.<n>Despite the growing research into LLM-facilitated phishing attacks, consolidated systematic research on the phishing attack life cycle remains scarce.<n>We present the first systematization of knowledge (SoK) on LLM-generated phishing, offering an end-to-end analysis that spans generation techniques, attack features, and mitigation strategies.
arXiv Detail & Related papers (2025-08-29T09:39:46Z) - EXPLICATE: Enhancing Phishing Detection through Explainable AI and LLM-Powered Interpretability [44.2907457629342]
EXPLICATE is a framework that enhances phishing detection through a three-component architecture.<n>It is on par with existing deep learning techniques but has better explainability.<n>It addresses the critical divide between automated AI and user trust in phishing detection systems.
arXiv Detail & Related papers (2025-03-22T23:37:35Z) - Reformulation is All You Need: Addressing Malicious Text Features in DNNs [53.45564571192014]
We propose a unified and adaptive defense framework that is effective against both adversarial and backdoor attacks.<n>Our framework outperforms existing sample-oriented defense baselines across a diverse range of malicious textual features.
arXiv Detail & Related papers (2025-02-02T03:39:43Z) - PEEK: Phishing Evolution Framework for Phishing Generation and Evolving Pattern Analysis using Large Language Models [10.455333111937598]
Phishing remains a pervasive cyber threat, as attackers craft deceptive emails to lure victims into revealing sensitive information.<n>Deep learning has become a key component in defending against phishing attacks, but these approaches face critical limitations.<n>We propose the first Phishing Evolution FramEworK (PEEK) for augmenting phishing email datasets with respect to quality and diversity.
arXiv Detail & Related papers (2024-11-18T09:03:51Z) - CTINexus: Automatic Cyber Threat Intelligence Knowledge Graph Construction Using Large Language Models [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.<n>Current CTI knowledge extraction methods lack flexibility and generalizability.<n>We propose CTINexus, a novel framework for data-efficient CTI knowledge extraction and high-quality cybersecurity knowledge graph (CSKG) construction.
arXiv Detail & Related papers (2024-10-28T14:18:32Z) - Deep Learning-Based Speech and Vision Synthesis to Improve Phishing
Attack Detection through a Multi-layer Adaptive Framework [1.3353802999735709]
Current anti-phishing methods remain vulnerable to complex phishing because of the increasingly sophistication tactics adopted by attacker.
In this research, we proposed a framework that combines Deep learning and Randon Forest to read images, synthesize speech from deep-fake videos, and natural language processing.
arXiv Detail & Related papers (2024-02-27T06:47:52Z) - Defending Large Language Models against Jailbreak Attacks via Semantic
Smoothing [107.97160023681184]
Aligned large language models (LLMs) are vulnerable to jailbreaking attacks.
We propose SEMANTICSMOOTH, a smoothing-based defense that aggregates predictions of semantically transformed copies of a given input prompt.
arXiv Detail & Related papers (2024-02-25T20:36:03Z) - An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach [2.1301560294088318]
Phishing email is a serious cyber threat that tries to deceive users by sending false emails with the intention of stealing confidential information or causing financial harm.<n>Despite extensive academic research, phishing detection remains an ongoing and formidable challenge in the cybersecurity landscape.<n>We present an optimized, fine-tuned transformer-based DistilBERT model designed for the detection of phishing emails.
arXiv Detail & Related papers (2024-02-21T15:23:21Z) - Instruct2Attack: Language-Guided Semantic Adversarial Attacks [76.83548867066561]
Instruct2Attack (I2A) is a language-guided semantic attack that generates meaningful perturbations according to free-form language instructions.
We make use of state-of-the-art latent diffusion models, where we adversarially guide the reverse diffusion process to search for an adversarial latent code conditioned on the input image and text instruction.
We show that I2A can successfully break state-of-the-art deep neural networks even under strong adversarial defenses.
arXiv Detail & Related papers (2023-11-27T05:35:49Z) - Deep convolutional forest: a dynamic deep ensemble approach for spam
detection in text [219.15486286590016]
This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically.
As a result, the model achieved high precision, recall, f1-score and accuracy of 98.38%.
arXiv Detail & Related papers (2021-10-10T17:19:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.