GradID: Adversarial Detection via Intrinsic Dimensionality of Gradients
- URL: http://arxiv.org/abs/2512.12827v1
- Date: Sun, 14 Dec 2025 20:16:03 GMT
- Title: GradID: Adversarial Detection via Intrinsic Dimensionality of Gradients
- Authors: Mohammad Mahdi Razmjoo, Mohammad Mahdi Sharifian, Saeed Bagheri Shouraki,
- Abstract summary: In this paper, we investigate the geometric properties of a model's input loss landscape. We reveal a distinct and consistent difference in the ID for natural and adversarial data, which forms the basis of our proposed detection method. Our detector significantly surpasses existing methods against a wide array of attacks, including CW and AutoAttack, achieving detection rates consistently above 92% on CIFAR-10.
- Score: 0.1019561860229868
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their remarkable performance, deep neural networks exhibit a critical vulnerability: small, often imperceptible, adversarial perturbations can lead to drastically altered model predictions. Given the stringent reliability demands of applications such as medical diagnosis and autonomous driving, robust detection of such adversarial attacks is paramount. In this paper, we investigate the geometric properties of a model's input loss landscape. We analyze the Intrinsic Dimensionality (ID) of the model's gradient parameters, which quantifies the minimal number of coordinates required to describe the data points on their underlying manifold. We reveal a distinct and consistent difference in the ID for natural and adversarial data, which forms the basis of our proposed detection method. We validate our approach across two distinct operational scenarios. First, in a batch-wise context for identifying malicious data groups, our method demonstrates high efficacy on datasets like MNIST and SVHN. Second, in the critical individual-sample setting, we establish new state-of-the-art results on challenging benchmarks such as CIFAR-10 and MS COCO. Our detector significantly surpasses existing methods against a wide array of attacks, including CW and AutoAttack, achieving detection rates consistently above 92% on CIFAR-10. The results underscore the robustness of our geometric approach, highlighting that intrinsic dimensionality is a powerful fingerprint for adversarial detection across diverse datasets and attack strategies.
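To make the described pipeline concrete, below is a minimal PyTorch sketch of a gradient-ID detector. It is an illustration rather than the authors' code: using per-sample parameter gradients as features, the TwoNN estimator of intrinsic dimensionality, and a batch-level threshold are all assumptions about one way such a detector could be instantiated, and the helper names are invented here.

```python
# Hedged sketch: batch-wise adversarial detection from the intrinsic
# dimensionality (ID) of per-sample loss gradients. TwoNN (Facco et al., 2017)
# is one concrete ID estimator; the paper's exact pipeline may differ.
import torch
import torch.nn.functional as F

def per_sample_grad_vectors(model, x, y):
    """One flattened loss-gradient vector (w.r.t. model parameters) per sample."""
    params = [p for p in model.parameters() if p.requires_grad]
    vecs = []
    for xi, yi in zip(x, y):
        loss = F.cross_entropy(model(xi.unsqueeze(0)), yi.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        vecs.append(torch.cat([g.flatten() for g in grads]))
    return torch.stack(vecs)

def twonn_id(points):
    """TwoNN: ID from ratios of second- to first-nearest-neighbor distances."""
    dists = torch.cdist(points, points)
    dists.fill_diagonal_(float("inf"))
    r, _ = torch.topk(dists, k=2, dim=1, largest=False)  # r[:,0]=1st NN, r[:,1]=2nd NN
    mu = r[:, 1] / r[:, 0]
    return len(points) / torch.log(mu).sum().item()  # MLE for the Pareto shape

def batch_looks_adversarial(model, x, id_threshold):
    """Compare the gradient cloud's ID to a threshold calibrated on natural data.
    True labels are unknown at test time, so the model's predictions are used."""
    y_hat = model(x).argmax(dim=1)
    batch_id = twonn_id(per_sample_grad_vectors(model, x, y_hat))
    return batch_id, batch_id > id_threshold  # inequality direction set by calibration
```

This matches only the batch-wise scenario from the abstract; the individual-sample setting would presumably require per-sample gradient statistics and a separately calibrated threshold, which the sketch does not attempt.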
Related papers
- On the Adversarial Robustness of Learning-based Conformal Novelty Detection [10.58528988397402]
We study the adversarial robustness of conformal novelty detection using AdaDetect. Our results show that adversarial perturbations can significantly increase the false discovery rate (FDR) while maintaining high detection power.
arXiv Detail & Related papers (2025-10-01T03:29:11Z)
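For context on the machinery under attack, here is a small NumPy sketch of the conformal novelty-detection layer that AdaDetect-style methods build on: conformal p-values from a null calibration set, then Benjamini-Hochberg for FDR control. The learned score function, which is AdaDetect's actual contribution, is abstracted into the score arrays.

```python
# Hedged sketch of conformal novelty detection with FDR control.
# Assumes higher score = more novel; scores come from some learned detector.
import numpy as np

def conformal_pvalues(cal_scores, test_scores):
    """p_j = (1 + #{calibration scores >= s_j}) / (n_cal + 1)."""
    cal = np.sort(np.asarray(cal_scores))
    n_ge = len(cal) - np.searchsorted(cal, test_scores, side="left")
    return (1.0 + n_ge) / (len(cal) + 1.0)

def benjamini_hochberg(pvals, alpha=0.1):
    """Declare novelties with the false discovery rate controlled at alpha."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    passed = pvals[order] <= alpha * np.arange(1, m + 1) / m
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected
```

The cited result is that perturbing test inputs can inflate the realized FDR of exactly this kind of pipeline while detection power stays high.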
- Assessing Representation Stability for Transformer Models [2.41710192205034]
Adversarial text attacks remain a persistent threat to transformer models. We introduce Representation Stability (RS), a model-agnostic detection framework. RS measures how embedding representations change when important words are masked.
arXiv Detail & Related papers (2025-08-06T21:07:49Z)
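A toy sketch of the representation-stability idea: mask candidate-important words and measure embedding drift. The hashing embedder and the length-based importance ranking below are placeholder assumptions standing in for the transformer encoder and word-attribution method the paper uses.

```python
# Toy representation-stability score; all components are illustrative stand-ins.
import numpy as np

def toy_embed(text, dim=256):
    """Toy stand-in for a sentence encoder: hashed bag-of-words, L2-normalized."""
    v = np.zeros(dim)
    for w in text.split():
        v[hash(w) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def representation_stability(text, top_k=3, mask_token="[MASK]"):
    """Mean cosine drift of the embedding when each of the top_k longest words
    (a crude importance proxy) is masked in turn."""
    base = toy_embed(text)
    words = text.split()
    important = sorted(range(len(words)), key=lambda i: -len(words[i]))[:top_k]
    drifts = []
    for i in important:
        masked = " ".join(mask_token if j == i else w for j, w in enumerate(words))
        drifts.append(1.0 - float(toy_embed(masked) @ base))
    return float(np.mean(drifts)) if drifts else 0.0
```

Larger drift under masking is the kind of signal a detector like RS would threshold to flag adversarially perturbed inputs.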
- DATABench: Evaluating Dataset Auditing in Deep Learning from an Adversarial Perspective [70.77570343385928]
We introduce a novel taxonomy, classifying existing methods based on their reliance on internal features (IF), inherent to the data, versus external features (EF), artificially introduced for auditing. We formulate two primary attack types: evasion attacks, designed to conceal the use of a dataset, and forgery attacks, intended to falsely implicate an unused dataset. Building on the understanding of existing methods and attack objectives, we further propose systematic attack strategies: decoupling, removal, and detection for evasion; adversarial example-based methods for forgery. Our benchmark, DATABench, comprises 17 evasion attacks, 5 forgery attacks, and 9…
arXiv Detail & Related papers (2025-07-08T03:07:15Z)
- A Few Large Shifts: Layer-Inconsistency Based Minimal Overhead Adversarial Example Detection [13.109309606764754]
We introduce a plug-in detection framework that leverages internal layer-wise inconsistencies within the target model itself. Our method achieves state-of-the-art detection performance with negligible computational overhead.
arXiv Detail & Related papers (2025-05-19T00:48:53Z)
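One illustrative way to read a layer-wise inconsistency signal out of a model is sketched below; it is not the paper's exact statistic. It hooks selected layers, compares each sample's cosine-similarity profile against the rest of the batch at consecutive depths, and flags samples whose neighborhood shifts sharply between layers.

```python
# Hedged cross-layer inconsistency probe (illustrative statistic, batch size > 1).
import torch
import torch.nn.functional as F

@torch.no_grad()
def cross_layer_inconsistency(model, layers, x):
    """layers: nn.Module list in depth order. Returns one score per sample."""
    feats = []
    hooks = [m.register_forward_hook(
        lambda _m, _i, out, store=feats: store.append(out.detach().flatten(1)))
        for m in layers]
    try:
        model(x)  # hooks fire in execution order, filling `feats` per layer
    finally:
        for h in hooks:
            h.remove()
    # Per-layer cosine-similarity profile of each sample against the batch.
    sims = [F.normalize(f, dim=1) @ F.normalize(f, dim=1).T for f in feats]
    shifts = [(sims[l + 1] - sims[l]).norm(dim=1) for l in range(len(sims) - 1)]
    return torch.stack(shifts).max(dim=0).values  # large shift = suspicious
```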
- Addressing Key Challenges of Adversarial Attacks and Defenses in the Tabular Domain: A Methodological Framework for Coherence and Consistency [25.830427564563422]
Class-Specific Anomaly Detection (CSAD) is a novel and effective anomaly detection approach. CSAD evaluates adversarial samples relative to their predicted class distribution, rather than a broad benign distribution. Our evaluation combines anomaly detection rates with SHAP-based assessments to provide a more comprehensive measure of adversarial sample quality.
arXiv Detail & Related papers (2024-12-10T09:17:09Z)
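The class-specific idea can be sketched with a per-class Mahalanobis detector on feature vectors; the Mahalanobis model is an illustrative choice here, not necessarily CSAD's own anomaly model, and the class and method names are invented for the sketch.

```python
# Hedged sketch: score a sample against the feature distribution of its
# *predicted* class, rather than one global benign distribution.
import numpy as np

class ClassSpecificDetector:
    def fit(self, feats, labels, reg=1e-3):
        """Fit per-class mean and (regularized) precision on benign features."""
        self.stats = {}
        for c in np.unique(labels):
            fc = feats[labels == c]
            mu = fc.mean(axis=0)
            cov = np.cov(fc, rowvar=False) + reg * np.eye(fc.shape[1])
            self.stats[c] = (mu, np.linalg.inv(cov))
        return self

    def score(self, feat, predicted_class):
        """Squared Mahalanobis distance; large = anomalous for that class."""
        mu, prec = self.stats[predicted_class]
        d = feat - mu
        return float(d @ prec @ d)
```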
- Model X-ray: Detecting Backdoored Models via Decision Boundary [62.675297418960355]
Backdoor attacks pose a significant security vulnerability for deep neural networks (DNNs).
We propose Model X-ray, a novel backdoor detection approach based on the analysis of illustrated two-dimensional (2D) decision boundaries.
Our approach includes two strategies focused on the decision areas dominated by clean samples and the concentration of the label distribution.
arXiv Detail & Related papers (2024-02-27T12:42:07Z)
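Below is a sketch of the 2D decision-boundary rendering that this style of analysis starts from; the two statistics Model X-ray computes over the rendered map are not reproduced, and the random plane construction is an assumption.

```python
# Hedged sketch: render the predicted class on a 2D slice through input space.
import torch

@torch.no_grad()
def decision_boundary_slice(model, anchor, span=5.0, steps=51):
    """Predicted class on a plane through `anchor`, spanned by two random
    orthonormal directions; returns a (steps, steps) integer class map."""
    d1 = torch.randn_like(anchor)
    d2 = torch.randn_like(anchor)
    d2 = d2 - (d1 * d2).sum() / (d1 * d1).sum() * d1  # Gram-Schmidt
    d1, d2 = d1 / d1.norm(), d2 / d2.norm()
    ts = torch.linspace(-span, span, steps)
    rows = []
    for a in ts:
        batch = torch.stack([anchor + a * d1 + b * d2 for b in ts])
        rows.append(model(batch).argmax(dim=1))
    return torch.stack(rows)
```

Backdoored models tend to show abnormal geometry in such maps, which is the signal the detection statistics then quantify.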
- A Discrepancy Aware Framework for Robust Anomaly Detection [51.710249807397695]
We present a Discrepancy Aware Framework (DAF), which demonstrates consistently robust performance with simple and cheap strategies.
Our method leverages an appearance-agnostic cue to guide the decoder in identifying defects, thereby alleviating its reliance on synthetic appearance.
Under simple synthesis strategies, it outperforms existing methods by a large margin and also achieves state-of-the-art localization performance.
arXiv Detail & Related papers (2023-10-11T15:21:40Z)
- A Geometrical Approach to Evaluate the Adversarial Robustness of Deep Neural Networks [52.09243852066406]
Adversarial Converging Time Score (ACTS) measures the converging time as an adversarial robustness metric.
We validate the effectiveness and generalization of the proposed ACTS metric against different adversarial attacks on the large-scale ImageNet dataset.
arXiv Detail & Related papers (2023-10-10T09:39:38Z)
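An illustrative proxy for a "converging time" style of metric is sketched below: count the PGD iterations needed to flip the model's prediction, with fewer steps indicating lower local robustness. This conveys the flavor only; ACTS itself is defined by the paper in its own geometric terms, and the attack settings here are assumptions.

```python
# Hedged sketch: iterations-to-misclassification under L-inf PGD.
import torch
import torch.nn.functional as F

def converging_time(model, x, y, eps=8 / 255, step=2 / 255, max_iters=100):
    x_adv = x.clone()
    for t in range(1, max_iters + 1):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        (g,) = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step * g.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project to the eps-ball
            x_adv = x_adv.clamp(0, 1)
            if (model(x_adv).argmax(dim=1) != y).all():
                return t  # fewer steps = locally less robust
    return max_iters
```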
- Wasserstein distributional robustness of neural networks [9.79503506460041]
Deep neural networks are known to be vulnerable to adversarial attacks (AA).
For an image recognition task, this means that a small perturbation of the original image can cause it to be misclassified.
We re-cast the problem using techniques of Wasserstein distributionally robust optimization (DRO) and obtain novel contributions.
arXiv Detail & Related papers (2023-06-16T13:41:24Z)
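For reference, a hedged sketch of the generic worst-case objective such a DRO recast centers on; this is the standard textbook form, and the paper's precise results and sensitivity analysis go beyond it:

```latex
% Worst-case expected loss over a Wasserstein ball of radius \delta around the
% data distribution P, with W_p the order-p Wasserstein distance:
V(\delta) \;=\; \sup_{Q \,:\, W_p(Q,\,P) \le \delta} \; \mathbb{E}_{Q}\!\left[\ell(\theta; Z)\right]
% Natural data corresponds to \delta = 0; the attack budget enters through \delta.
```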
- Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via the application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
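A generic adversarial-training step of the kind described is sketched below, using an FGSM perturbation of the input features and a mixed clean/adversarial loss; the attack, budget, and loss weighting actually used for flavor tagging are assumptions here.

```python
# Hedged sketch of one adversarial-training update (FGSM, equal loss mixing).
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=0.01):
    # Craft the FGSM example from the current model state.
    x_req = x.detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_req), y)
    (g,) = torch.autograd.grad(loss, x_req)
    x_adv = (x + eps * g.sign()).detach()
    # Update on a mix of clean and adversarial batches.
    optimizer.zero_grad()
    total = 0.5 * (F.cross_entropy(model(x), y) +
                   F.cross_entropy(model(x_adv), y))
    total.backward()
    optimizer.step()
    return total.item()
```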
- Residual Error: a New Performance Measure for Adversarial Robustness [85.0371352689919]
A major challenge limiting the widespread adoption of deep learning has been its fragility to adversarial attacks.
This study presents the concept of residual error, a new performance measure for assessing the adversarial robustness of a deep neural network.
Experimental results on image classification demonstrate the effectiveness of the proposed residual error metric.
arXiv Detail & Related papers (2021-06-18T16:34:23Z)