Deep learning models are vulnerable, but adversarial examples are even more vulnerable
- URL: http://arxiv.org/abs/2511.05073v1
- Date: Fri, 07 Nov 2025 08:43:08 GMT
- Title: Deep learning models are vulnerable, but adversarial examples are even more vulnerable
- Authors: Jun Li, Yanwei Xu, Keran Li, Xiaoli Zhang
- Abstract summary: This study first empirically finds that image-based adversarial examples are notably sensitive to occlusion. We propose Sliding Window Mask-based Adversarial Example Detection (SWM-AED), which avoids the catastrophic overfitting of conventional adversarial training. Evaluations across classifiers and attacks on CIFAR-10 demonstrate robust performance, with accuracy over 62% in most cases and up to 96.5%.
- Score: 7.097468024050319
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding intrinsic differences between adversarial examples and clean samples is key to enhancing DNN robustness and detection against adversarial attacks. This study first empirically finds that image-based adversarial examples are notably sensitive to occlusion. Controlled experiments on CIFAR-10 used nine canonical attacks (e.g., FGSM, PGD) to generate adversarial examples, paired with original samples for evaluation. We introduce Sliding Mask Confidence Entropy (SMCE) to quantify model confidence fluctuation under occlusion. Using 1800+ test images, SMCE calculations supported by Mask Entropy Field Maps and statistical distributions show adversarial examples have significantly higher confidence volatility under occlusion than originals. Based on this, we propose Sliding Window Mask-based Adversarial Example Detection (SWM-AED), which avoids catastrophic overfitting of conventional adversarial training. Evaluations across classifiers and attacks on CIFAR-10 demonstrate robust performance, with accuracy over 62% in most cases and up to 96.5%.
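The abstract does not give SMCE's exact formula. A minimal sketch of one plausible reading, in PyTorch: slide a square zero-mask over the image, record the entropy of the model's softmax output at each mask position (yielding something like the Mask Entropy Field Maps mentioned above), and average into a scalar score. The mask size, stride, and zero fill value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def mask_entropy_field(model, image, mask_size=8, stride=4):
    """Prediction entropy at each sliding-mask position (a "field map").

    image: (C, H, W) tensor in the model's expected input range.
    Mask size, stride, and zero fill are illustrative guesses, not the
    paper's specification.
    """
    model.eval()
    _, H, W = image.shape
    ys = list(range(0, H - mask_size + 1, stride))
    xs = list(range(0, W - mask_size + 1, stride))
    field = torch.zeros(len(ys), len(xs))
    with torch.no_grad():
        for i, y in enumerate(ys):
            for j, x in enumerate(xs):
                occluded = image.clone()
                occluded[:, y:y + mask_size, x:x + mask_size] = 0.0  # occlude one patch
                probs = F.softmax(model(occluded.unsqueeze(0)), dim=1)[0]
                field[i, j] = -(probs * probs.clamp_min(1e-12).log()).sum()
    return field

def smce_score(model, image, **kw):
    """Scalar summary of confidence fluctuation under occlusion."""
    return mask_entropy_field(model, image, **kw).mean().item()
```

Per the paper's finding, adversarial examples should produce markedly more volatile entropy fields than clean images, which is the kind of signal a detector like SWM-AED can threshold on.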
Related papers
- How Worst-Case Are Adversarial Attacks? Linking Adversarial and Perturbation Robustness [4.60092781176058]
Adversarial attacks are widely used to identify model vulnerabilities, but their validity as proxies for robustness to random perturbations remains debated. We ask whether an adversarial example provides a representative estimate of misprediction risk under perturbations of the same magnitude. We study the limits of this connection by proposing an attack strategy designed to probe vulnerabilities in regimes that are statistically closer to uniform noise.
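The question posed here can be made concrete with a Monte-Carlo estimate of misprediction risk under random perturbations of a given budget. A minimal sketch, assuming an L-infinity budget and inputs normalized to [0, 1] (both assumptions):

```python
import torch

def random_misprediction_rate(model, x, y, eps, n_samples=200):
    """Estimate misprediction risk under uniform random L-inf noise of
    magnitude eps -- the quantity an adversarial example may or may not
    represent well. x: (C, H, W) clean input; y: its true label (int).
    """
    model.eval()
    with torch.no_grad():
        noise = (torch.rand(n_samples, *x.shape) * 2 - 1) * eps  # uniform in [-eps, eps]
        preds = model((x.unsqueeze(0) + noise).clamp(0.0, 1.0)).argmax(dim=1)
    return (preds != y).float().mean().item()
```

Comparing this rate against the existence of an adversarial example at the same eps is one way to probe how representative worst-case attacks are of average-case risk.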
arXiv Detail & Related papers (2026-01-20T22:24:47Z)
- Prediction Inconsistency Helps Achieve Generalizable Detection of Adversarial Examples [31.535244194865236]
Prediction Inconsistency Detector (PID) is a lightweight and generalizable detection framework. PID is compatible with both naturally and adversarially trained primal models. It outperforms four detection methods across 3 white-box, 3 black-box, and 1 mixed adversarial attack.
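The summary does not specify PID's exact test. A minimal sketch of one way to turn prediction inconsistency between two reference models into a detection flag (the model pair and the simple disagreement rule are assumptions, not the paper's method):

```python
import torch

def inconsistency_flags(nat_model, adv_model, x):
    """Flag inputs whose top-1 predictions differ between a naturally
    trained and an adversarially trained classifier.

    x: (B, C, H, W) batch. Returns a boolean tensor, True = suspicious.
    """
    nat_model.eval()
    adv_model.eval()
    with torch.no_grad():
        pred_nat = nat_model(x).argmax(dim=1)
        pred_adv = adv_model(x).argmax(dim=1)
    return pred_nat != pred_adv  # inconsistent predictions suggest an attack
```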
arXiv Detail & Related papers (2025-06-04T09:29:11Z)
- CopyrightShield: Enhancing Diffusion Model Security against Copyright Infringement Attacks [61.06621533874629]
Diffusion models are vulnerable to copyright infringement attacks, where attackers inject strategically modified non-infringing images into the training set. We propose CopyrightShield, a defense framework against this attack. Experimental results demonstrate that CopyrightShield significantly improves poisoned-sample detection performance across two attack scenarios.
arXiv Detail & Related papers (2024-12-02T14:19:44Z)
- Confidence Aware Learning for Reliable Face Anti-spoofing [52.23271636362843]
We propose a Confidence Aware Face Anti-spoofing (CA-FAS) model that is aware of its own capability boundary, estimating its confidence during the prediction of each sample. Experiments show that CA-FAS effectively recognizes samples with low prediction confidence.
arXiv Detail & Related papers (2024-11-02T14:29:02Z)
- Imperceptible Face Forgery Attack via Adversarial Semantic Mask [59.23247545399068]
We propose an Adversarial Semantic Mask Attack framework (ASMA) which can generate adversarial examples with good transferability and invisibility.
Specifically, we propose a novel adversarial semantic mask generative model that constrains generated perturbations to local semantic regions for stealthiness.
arXiv Detail & Related papers (2024-06-16T10:38:11Z)
- Extreme Miscalibration and the Illusion of Adversarial Robustness [66.29268991629085]
Adversarial training is often used to increase model robustness.
We show that the observed gain in robustness can be an illusion of robustness (IOR): extreme miscalibration can suppress the gradient signal that attacks rely on, so the model merely appears robust.
We urge the NLP community to incorporate test-time temperature scaling into their robustness evaluations.
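Test-time temperature scaling itself is standard; a minimal sketch (the temperature value here is arbitrary):

```python
import torch.nn.functional as F

def scaled_probs(logits, temperature=2.0):
    """Temperature-scaled softmax. T > 1 softens extremely confident
    logits, restoring a usable gradient signal for attack-based
    robustness evaluation on miscalibrated models.
    """
    return F.softmax(logits / temperature, dim=-1)
```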
arXiv Detail & Related papers (2024-02-27T13:49:12Z)
- Adversarial defense based on distribution transfer [22.14684430074648]
The presence of adversarial examples poses a significant threat to deep learning models and their applications.
Existing defense methods provide certain resilience against adversarial examples, but often suffer from decreased accuracy and generalization performance.
This paper proposes a defense method based on distribution shift, leveraging a diffusion model's distribution transfer capability to move adversarial inputs back toward the clean data distribution.
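The summary gives no implementation detail; the sketch below only shows the general noise-then-denoise purification pattern common to diffusion-based defenses (e.g., DiffPure-style methods), with a hypothetical `diffusion` wrapper whose `q_sample`/`denoise` methods are assumed. The paper's actual transfer procedure may differ.

```python
def purify(x, diffusion, t=100):
    """Sketch of diffusion-based purification.

    `diffusion` is a hypothetical wrapper: q_sample(x, t) runs the forward
    noising process to step t; denoise(x_t, t) runs the learned reverse
    process back to a clean-looking sample.
    """
    x_t = diffusion.q_sample(x, t)    # push the input toward the noise distribution
    return diffusion.denoise(x_t, t)  # pull it back toward the clean data manifold
```

Adding moderate noise and denoising tends to wash out small adversarial perturbations while preserving semantic content, which is the intuition behind this family of defenses.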
arXiv Detail & Related papers (2023-11-23T08:01:18Z)
- Latent Feature Relation Consistency for Adversarial Robustness [80.24334635105829]
Misclassification occurs when deep neural networks predict adversarial examples, which add human-imperceptible adversarial noise to natural examples.
We propose Latent Feature Relation Consistency (LFRC).
LFRC constrains the relations among adversarial examples in latent space to be consistent with those of the natural examples.
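One illustrative implementation of such a relation constraint (not necessarily the paper's exact formulation) compares cosine-similarity matrices of natural and adversarial feature batches:

```python
import torch
import torch.nn.functional as F

def relation_consistency_loss(feat_nat, feat_adv):
    """Penalize mismatch between pairwise feature relations of natural and
    adversarial batches. feat_*: (B, D) latent features for the same samples.
    """
    nat = F.normalize(feat_nat, dim=1)
    adv = F.normalize(feat_adv, dim=1)
    rel_nat = nat @ nat.T  # (B, B) cosine-similarity relations, natural
    rel_adv = adv @ adv.T  # (B, B) cosine-similarity relations, adversarial
    return F.mse_loss(rel_adv, rel_nat.detach())
```

Detaching the natural relations treats them as the target, so the gradient pushes adversarial features toward the natural relational structure; this is a design choice of the sketch.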
arXiv Detail & Related papers (2023-03-29T13:50:01Z)
- Identifying Adversarially Attackable and Robust Samples [1.4213973379473654]
Adversarial attacks insert small, imperceptible perturbations into input samples that cause large, undesired changes in the output of deep learning models.
This work introduces the notion of sample attackability, where we aim to identify samples that are most susceptible to adversarial attacks.
We propose a deep-learning-based detector to identify the adversarially attackable and robust samples in an unseen dataset for an unseen target model.
arXiv Detail & Related papers (2023-01-30T13:58:14Z)
- Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard approach to adversarial robustness defends against samples crafted by minimally perturbing a natural sample.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves defenses against both invariance-based and sensitivity-based attacks.
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
- Attack-Agnostic Adversarial Detection [13.268960384729088]
We quantify the statistical deviation caused by adversarial perturbations in two aspects.
We show that our method can achieve an overall ROC AUC of 94.9%, 89.7%, and 94.6% on CIFAR10, CIFAR100, and SVHN, respectively, and has comparable performance to adversarial detectors trained with adversarial examples on most of the attacks.
arXiv Detail & Related papers (2022-06-01T13:41:40Z)
- Exploiting epistemic uncertainty of the deep learning models to generate adversarial samples [0.7734726150561088]
"Adversarial Machine Learning" aims to devise new adversarial attacks and to defend against these attacks with more robust architectures.
This study explores the usage of quantified epistemic uncertainty obtained from Monte-Carlo Dropout Sampling for adversarial attack purposes.
Our results show that our proposed hybrid attack approach increases the attack success rates from 82.59% to 85.40%, from 82.86% to 89.92%, and from 88.06% to 90.03% on the three evaluated datasets.
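Monte-Carlo Dropout itself is standard; a minimal sketch of the epistemic-uncertainty estimate such a hybrid attack could build on (the number of passes and the variance summary are choices of this sketch):

```python
import torch

def mc_dropout_uncertainty(model, x, n_passes=20):
    """Epistemic uncertainty via Monte-Carlo Dropout: keep dropout active
    at inference and measure prediction variance across stochastic passes.

    x: (B, C, H, W) batch. Note: model.train() also affects BatchNorm;
    a real implementation would enable only the dropout layers.
    """
    model.train()  # enables dropout at inference time
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=1) for _ in range(n_passes)]
        )
    return probs.var(dim=0).sum(dim=1)  # higher variance = higher uncertainty
```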
arXiv Detail & Related papers (2021-02-08T11:59:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.