X-MAP: eXplainable Misclassification Analysis and Profiling for Spam and Phishing Detection
- URL: http://arxiv.org/abs/2602.15298v1
- Date: Tue, 17 Feb 2026 01:46:08 GMT
- Title: X-MAP: eXplainable Misclassification Analysis and Profiling for Spam and Phishing Detection
- Authors: Qi Zhang, Dian Chen, Lance M. Kaplan, Audun Jøsang, Dong Hyun Jeong, Feng Chen, Jin-Hee Cho
- Abstract summary: This paper presents X-MAP, an eXplainable Misclassification Analysis and Profiling framework. X-MAP builds interpretable topic profiles for reliably classified spam/phishing and legitimate messages. As a detector, X-MAP achieves up to 0.98 AUROC and lowers the false-rejection rate at 95% TRR to 0.089 on positive predictions.
- Score: 16.604623864453043
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Misclassifications in spam and phishing detection are very harmful, as false negatives expose users to attacks while false positives degrade trust. Existing uncertainty-based detectors can flag potential errors, but they can be deceived and offer limited interpretability. This paper presents X-MAP, an eXplainable Misclassification Analysis and Profiling framework that reveals topic-level semantic patterns behind model failures. X-MAP combines SHAP-based feature attributions with non-negative matrix factorization to build interpretable topic profiles for reliably classified spam/phishing and legitimate messages, and measures each message's deviation from these profiles using Jensen-Shannon divergence. Experiments on SMS and phishing datasets show that misclassified messages exhibit at least two times larger divergence than correctly classified ones. As a detector, X-MAP achieves up to 0.98 AUROC and lowers the false-rejection rate at 95% TRR to 0.089 on positive predictions. When used as a repair layer on base detectors, it recovers up to 97% of falsely rejected correct predictions with moderate leakage. These results demonstrate X-MAP's effectiveness and interpretability for improving spam and phishing detection.
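The profiling pipeline described in the abstract (per-class topic profiles from attributions via NMF, then Jensen-Shannon deviation per message) can be sketched as follows. This is a minimal illustration, not the paper's implementation: TF-IDF weights stand in for SHAP attributions, the tiny in-line corpus and two-topic factorization are assumptions for demonstration only.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from scipy.spatial.distance import jensenshannon

# Toy corpus standing in for "reliably classified" messages of each class.
spam = ["win a free prize now", "claim your free cash reward today",
        "urgent prize claim free money"]
ham = ["see you at the meeting tomorrow", "lunch at noon works for me",
       "the report is attached for review"]

vec = TfidfVectorizer()
X = vec.fit_transform(spam + ham)  # stand-in for SHAP attribution vectors

# Factorize attributions into topics (2 topics for this toy example).
nmf = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = nmf.fit_transform(X)  # per-message topic weights

def topic_dist(w, eps=1e-12):
    """Normalize topic weights into a probability distribution."""
    w = np.asarray(w, dtype=float) + eps
    return w / w.sum()

# Class profile = mean topic distribution over its reliable messages.
spam_profile = topic_dist(W[:3].mean(axis=0))
ham_profile = topic_dist(W[3:].mean(axis=0))

# Deviation of a new message from each profile via Jensen-Shannon distance.
x_new = vec.transform(["free prize waiting claim now"])
w_new = topic_dist(nmf.transform(x_new)[0])
d_spam = jensenshannon(w_new, spam_profile)
d_ham = jensenshannon(w_new, ham_profile)
print(f"JS to spam profile: {d_spam:.3f}, to ham profile: {d_ham:.3f}")
```

In the paper's use, a large divergence from both class profiles signals a likely misclassification; the sketch only shows how the divergence itself is computed.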
Related papers
- Robust ML-based Detection of Conventional, LLM-Generated, and Adversarial Phishing Emails Using Advanced Text Preprocessing [3.3166006294048427]
We propose a robust phishing email detection system featuring an enhanced text preprocessing pipeline. Our approach integrates widely adopted natural language processing (NLP) feature extraction techniques and machine learning algorithms. We evaluate our models on publicly available datasets comprising both phishing and legitimate emails, achieving a detection accuracy of 94.26% and F1-score of 84.39%.
arXiv Detail & Related papers (2025-10-13T20:34:19Z) - BURN: Backdoor Unlearning via Adversarial Boundary Analysis [73.14147934175604]
Backdoor unlearning aims to remove backdoor-related information while preserving the model's original functionality. We propose Backdoor Unlearning via adversaRial bouNdary analysis (BURN), a novel defense framework that integrates false correlation decoupling, progressive data refinement, and model purification.
arXiv Detail & Related papers (2025-07-14T17:13:06Z) - DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks [87.66245688589977]
LLM-integrated applications and agents are vulnerable to prompt injection attacks. A detection method aims to determine whether a given input is contaminated by an injected prompt. We propose DataSentinel, a game-theoretic method to detect prompt injection attacks.
arXiv Detail & Related papers (2025-04-15T16:26:21Z) - Debate-Driven Multi-Agent LLMs for Phishing Email Detection [0.0]
We propose a multi-agent large language model (LLM) prompting technique that simulates deceptive debates among agents to detect phishing emails. Our approach uses two LLM agents to present arguments for or against the classification task, with a judge agent adjudicating the final verdict. Results show that the debate structure itself is sufficient to yield accurate decisions without extra prompting strategies.
arXiv Detail & Related papers (2025-03-27T23:18:14Z) - Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks are known to be vulnerable to data poisoning attacks.
Detecting poisoned samples within a mixed dataset is both beneficial and challenging.
We propose an Iterative Filtering approach for identifying unlearnable examples (UEs).
arXiv Detail & Related papers (2024-08-15T13:26:13Z) - Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models [79.76293901420146]
Large Language Models (LLMs) are employed across various high-stakes domains, where the reliability of their outputs is crucial.
Our research investigates the fragility of uncertainty estimation and explores potential attacks.
We demonstrate that an attacker can embed a backdoor in LLMs, which, when activated by a specific trigger in the input, manipulates the model's uncertainty without affecting the final output.
arXiv Detail & Related papers (2024-07-15T23:41:11Z) - DALA: A Distribution-Aware LoRA-Based Adversarial Attack against Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR).
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z) - DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature [143.5381108333212]
We show that text sampled from a large language model tends to occupy negative curvature regions of the model's log probability function.
We then define a new curvature-based criterion for judging if a passage is generated from a given LLM.
We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection.
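The curvature criterion above (machine-generated text sits near a local maximum of log-probability, so small perturbations reduce its likelihood) can be illustrated with a toy scorer. Everything here is a hypothetical stand-in: `log_prob` replaces the LLM's log-likelihood and `perturb` replaces DetectGPT's mask-filling perturbation model.

```python
import random

REFERENCE = "the cat sat on the mat"

def log_prob(text):
    # Hypothetical scorer: penalizes character mismatches against a fixed
    # reference, so the reference itself is a local likelihood maximum.
    return -sum(a != b for a, b in zip(text, REFERENCE)) - abs(len(text) - len(REFERENCE))

def perturb(text, rng):
    # Crude perturbation: swap two randomly chosen characters.
    chars = list(text)
    i, j = rng.sample(range(len(chars)), 2)
    chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)

def detectgpt_score(text, n=50, seed=0):
    # DetectGPT-style statistic: log-prob of the text minus the mean
    # log-prob of its perturbations. Large positive values suggest the
    # text sits near a local maximum (i.e., looks machine-generated).
    rng = random.Random(seed)
    base = log_prob(text)
    perturbed = [log_prob(perturb(text, rng)) for _ in range(n)]
    return base - sum(perturbed) / n

score = detectgpt_score(REFERENCE)
print(f"curvature score for 'machine' text: {score:.3f}")
```

With a real LLM, the same statistic is computed from the model's token log-likelihoods and compared against a threshold to flag machine-generated passages.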
arXiv Detail & Related papers (2023-01-26T18:44:06Z) - Multi-SpacePhish: Extending the Evasion-space of Adversarial Attacks against Phishing Website Detectors using Machine Learning [22.304132275659924]
This paper formalizes the "evasion-space" in which an adversarial perturbation can be introduced to fool an ML-PWD.
We then propose a realistic threat model describing evasion attacks against ML-PWD that are cheap to stage, and hence intrinsically more attractive for real phishers.
arXiv Detail & Related papers (2022-10-24T23:45:09Z) - Profiler: Profile-Based Model to Detect Phishing Emails [15.109679047753355]
We propose a multidimensional risk assessment of emails to reduce the feasibility of an attacker adapting their email and avoiding detection.
We develop a risk assessment framework that includes three models which analyse an email's (1) threat level, (2) cognitive manipulation, and (3) email type.
Our Profiler can be used in conjunction with ML approaches, to reduce their misclassifications or as a labeller for large email data sets in the training stage.
arXiv Detail & Related papers (2022-08-18T10:01:55Z) - RAIDER: Reinforcement-aided Spear Phishing Detector [13.341666826984554]
Spear Phishing is a harmful cyber-attack facing businesses and individuals worldwide.
ML-based solutions may suffer from zero-day attacks: unseen attacks unaccounted for in the training data.
We propose RAIDER: Reinforcement AIded Spear Phishing DEtectoR.
arXiv Detail & Related papers (2021-05-17T02:42:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.