Empirical Analysis of Adversarial Robustness and Explainability Drift in Cybersecurity Classifiers
- URL: http://arxiv.org/abs/2602.06395v1
- Date: Fri, 06 Feb 2026 05:30:37 GMT
- Title: Empirical Analysis of Adversarial Robustness and Explainability Drift in Cybersecurity Classifiers
- Authors: Mona Rajhans, Vishal Khawarey
- Abstract summary: This paper presents an empirical study of adversarial robustness and explainability drift across two cybersecurity domains. We introduce a quantitative metric, the Robustness Index (RI), defined as the area under the accuracy-perturbation curve. Experiments on the Phishing Websites and UNSW-NB15 datasets show consistent robustness trends, with adversarial training improving RI by up to 9 percent while maintaining clean-data accuracy.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) models are increasingly deployed in cybersecurity applications such as phishing detection and network intrusion prevention. However, these models remain vulnerable to adversarial perturbations: small, deliberate input modifications that can degrade detection accuracy and compromise interpretability. This paper presents an empirical study of adversarial robustness and explainability drift across two cybersecurity domains: phishing URL classification and network intrusion detection. We evaluate the impact of L∞-bounded Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) perturbations on model accuracy and introduce a quantitative metric, the Robustness Index (RI), defined as the area under the accuracy-perturbation curve. Gradient-based feature sensitivity and SHAP-based attribution drift analyses reveal which input features are most susceptible to adversarial manipulation. Experiments on the Phishing Websites and UNSW-NB15 datasets show consistent robustness trends, with adversarial training improving RI by up to 9 percent while maintaining clean-data accuracy. These findings highlight the coupling between robustness and interpretability degradation and underscore the importance of quantitative evaluation in the design of trustworthy, AI-driven cybersecurity systems.
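The abstract defines RI only in words, so the following is an added, self-contained illustration of the measurement it describes; it is not the authors' code and does not use their datasets. It trains a small logistic-regression classifier on a constructed phishing-style feature set (six features assumed to be scaled to [0, 1]), attacks it with L∞-bounded FGSM and PGD, computes RI as the trapezoid area under the accuracy-versus-epsilon curve, and ranks features by mean absolute input gradient. Dividing the area by the epsilon range so that RI lies in [0, 1] is this example's choice, as are the epsilon grid, the PGD step size and the use of plain FGSM examples for adversarial training.

```python
import math
import random

N_FEATURES = 6   # constructed features, assumed already scaled to [0, 1]
EPSILONS = [0.0, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30]   # L-infinity budgets (assumed grid)

def clip01(v):
    return min(1.0, max(0.0, v))

def make_dataset(n=200, seed=7):
    # Constructed data, not a real benchmark: class 1 ("phishing") is high on features
    # 0-2 and low on 3-5, class 0 the reverse, with Gaussian noise, clipped to the box.
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        y = i % 2
        xs.append([clip01(rng.gauss(0.7 if (j < 3) == (y == 1) else 0.3, 0.15))
                   for j in range(N_FEATURES)])
        ys.append(y)
    return xs, ys

def predict_proba(w, b, x):
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

def input_gradient(w, b, x, y):
    # d(logistic loss)/dx for one sample is (p - y) * w.
    err = predict_proba(w, b, x) - y
    return [err * wj for wj in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    # One signed-gradient step of size eps, kept inside the [0, 1] feature box.
    return [clip01(xj + eps * sign(gj)) for xj, gj in zip(x, input_gradient(w, b, x, y))]

def pgd(w, b, x, y, eps, steps=10):
    # Iterated steps of eps/4, each projected onto the eps-ball around x and the box.
    adv = list(x)
    for _ in range(steps):
        g = input_gradient(w, b, adv, y)
        adv = [clip01(min(xj + eps, max(xj - eps, aj + eps / 4.0 * sign(gj))))
               for xj, aj, gj in zip(x, adv, g)]
    return adv

def train(xs, ys, epochs=200, lr=0.5, l2=0.01, adv_eps=None):
    # Batch gradient descent on logistic loss. The small L2 term keeps logits away from
    # saturation, where (p - y) rounds to 0 and the input gradient vanishes: gradient
    # masking that would make the model look spuriously robust. With adv_eps set, each
    # epoch trains on FGSM copies of the data instead (simple adversarial training).
    w, b, n = [0.0] * N_FEATURES, 0.0, len(xs)
    for _ in range(epochs):
        gw, gb = [0.0] * N_FEATURES, 0.0
        for x, y in zip(xs, ys):
            if adv_eps:
                x = fgsm(w, b, x, y, adv_eps)
            err = predict_proba(w, b, x) - y
            gw = [gj + err * xj for gj, xj in zip(gw, x)]
            gb += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def accuracy(w, b, xs, ys, attack=None, eps=0.0):
    hits = 0
    for x, y in zip(xs, ys):
        x_eval = attack(w, b, x, y, eps) if attack is not None and eps > 0 else x
        hits += (predict_proba(w, b, x_eval) >= 0.5) == (y == 1)
    return hits / len(xs)

def accuracy_curve(w, b, xs, ys, attack):
    return [accuracy(w, b, xs, ys, attack, eps) for eps in EPSILONS]

def robustness_index(curve, epsilons=EPSILONS):
    # Trapezoid area under the accuracy-perturbation curve, divided by the eps range so
    # RI is 1.0 for a model unaffected by any budget and 0.0 for one wrong everywhere.
    # The normalisation is this example's choice, not taken from the paper.
    area = sum((e1 - e0) * (a0 + a1) / 2.0
               for e0, e1, a0, a1 in zip(epsilons, epsilons[1:], curve, curve[1:]))
    return area / (epsilons[-1] - epsilons[0])

def feature_sensitivity(w, b, xs, ys):
    # Gradient-based sensitivity: mean |dL/dx_j| per feature, most sensitive first.
    total = [0.0] * N_FEATURES
    for x, y in zip(xs, ys):
        total = [t + abs(g) for t, g in zip(total, input_gradient(w, b, x, y))]
    return sorted(range(N_FEATURES), key=lambda j: -total[j])

if __name__ == "__main__":
    xs, ys = make_dataset()
    for label, (w, b) in (("standard", train(xs, ys)), ("adv-trained", train(xs, ys, adv_eps=0.2))):
        for name, attack in (("FGSM", fgsm), ("PGD", pgd)):
            curve = accuracy_curve(w, b, xs, ys, attack)
            print(label, name, [round(a, 3) for a in curve], "RI=%.3f" % robustness_index(curve))
        print(label, "most sensitive features:", feature_sensitivity(w, b, xs, ys))
```

Two caveats a practitioner should carry over to a real evaluation. First, for a linear model the loss gradient has a fixed sign per feature, so PGD walks to the same corner of the epsilon-ball that FGSM jumps to and the two curves coincide; the FGSM/PGD gap only appears for nonlinear models. Second, the box constraint here is a stand-in for feature semantics: many phishing and flow features (flags, counts, protocol fields) cannot be nudged by an arbitrary real-valued epsilon, so an L∞ curve over normalised features is an upper bound on what an attacker who must produce a valid URL or packet can achieve.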
Related papers
- ThreatFormer-IDS: Robust Transformer Intrusion Detection with Zero-Day Generalization and Explainable Attribution [0.0]
Intrusion detection in IoT and industrial networks requires models that can detect rare attacks at low false-positive rates while remaining reliable under evolving traffic and limited labels.
We propose ThreatFormer-IDS, a Transformer-based sequence modeling framework that converts flow records into time-ordered windows and learns contextual representations for robust intrusion screening.
On the ToN IoT benchmark with chronological evaluation, ThreatFormer-IDS achieves AUC-ROC 0.994, AUC-PR 0.956, and Recall@1%FPR 0.910, outperforming strong tree-based and sequence baselines.
arXiv Detail & Related papers (2026-02-26T23:20:42Z)
- Explainability-Guided Defense: Attribution-Aware Model Refinement Against Adversarial Data Attacks [6.573058520271728]
We identify a connection between interpretability and robustness that can be directly leveraged during training.
We introduce an attribution-guided refinement framework that transforms Local Interpretable Model-Agnostic Explanations into an active training signal.
arXiv Detail & Related papers (2026-01-02T19:36:03Z)
- GradID: Adversarial Detection via Intrinsic Dimensionality of Gradients [0.1019561860229868]
In this paper, we investigate the geometric properties of a model's input loss landscape.
We reveal a distinct and consistent difference in the ID for natural and adversarial data, which forms the basis of our proposed detection method.
Our detector significantly surpasses existing methods against a wide array of attacks, including CW and AutoAttack, achieving detection rates consistently above 92% on CIFAR-10.
arXiv Detail & Related papers (2025-12-14T20:16:03Z)
- Towards Trustworthy Wi-Fi Sensing: Systematic Evaluation of Deep Learning Model Robustness to Adversarial Attacks [4.5835414225547195]
We evaluate the robustness of CSI deep learning models under diverse threat models and varying degrees of attack realism.
Our experiments show that smaller models, while efficient and equally performant on clean data, are markedly less robust.
We confirm that physically realizable signal-space perturbations, designed to be feasible in real wireless channels, significantly reduce attack success.
arXiv Detail & Related papers (2025-11-25T16:24:29Z)
- Deep Learning Models for Robust Facial Liveness Detection [56.08694048252482]
This study introduces a robust solution through novel deep learning models addressing the deficiencies in contemporary anti-spoofing techniques.
By innovatively integrating texture analysis and reflective properties associated with genuine human traits, our models distinguish authentic presence from replicas with remarkable precision.
arXiv Detail & Related papers (2025-08-12T17:19:20Z)
- A Gradient-Optimized TSK Fuzzy Framework for Explainable Phishing Detection [0.0]
Existing phishing detection methods struggle to simultaneously achieve high accuracy and explainability.
We propose a novel phishing URL detection system based on a first-order Takagi-Sugeno-Kang fuzzy inference model optimized through gradient-based techniques.
arXiv Detail & Related papers (2025-04-25T18:31:05Z)
- Fragility-aware Classification for Understanding Risk and Improving Generalization [6.926253982569273]
We introduce the Fragility Index (FI), a novel metric that evaluates classification performance from a risk-averse perspective.
We derive exact reformulations for cross-entropy loss, hinge-type loss, and Lipschitz loss, and extend the approach to deep learning models.
arXiv Detail & Related papers (2025-02-18T16:44:03Z)
- Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
arXiv Detail & Related papers (2024-02-07T21:58:40Z)
- Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
- Residual Error: a New Performance Measure for Adversarial Robustness [85.0371352689919]
A major challenge that limits the wide-spread adoption of deep learning has been their fragility to adversarial attacks.
This study presents the concept of residual error, a new performance measure for assessing the adversarial robustness of a deep neural network.
Experimental results using the case of image classification demonstrate the effectiveness and efficacy of the proposed residual error metric.
arXiv Detail & Related papers (2021-06-18T16:34:23Z)
- Non-Singular Adversarial Robustness of Neural Networks [58.731070632586594]
Adversarial robustness has become an emerging challenge for neural networks owing to their over-sensitivity to small input perturbations.
We formalize the notion of non-singular adversarial robustness for neural networks through the lens of joint perturbations to data inputs as well as model weights.
arXiv Detail & Related papers (2021-02-23T20:59:30Z)
- Uncertainty-Aware Deep Calibrated Salient Object Detection [74.58153220370527]
Existing deep neural network based salient object detection (SOD) methods mainly focus on pursuing high network accuracy.
These methods overlook the gap between network accuracy and prediction confidence, known as the confidence uncalibration problem.
We introduce an uncertainty-aware deep SOD network, and propose two strategies to prevent deep SOD networks from being overconfident.
arXiv Detail & Related papers (2020-12-10T23:28:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.