Rectifying Adversarial Examples Using Their Vulnerabilities
- URL: http://arxiv.org/abs/2601.00270v1
- Date: Thu, 01 Jan 2026 09:22:11 GMT
- Title: Rectifying Adversarial Examples Using Their Vulnerabilities
- Authors: Fumiya Morimoto, Ryuto Morita, Satoshi Ono
- Abstract summary: Adversarial examples (AEs) are minimally perturbed input data, undetectable to humans, that pose significant risks to security-dependent applications. This study proposes a method for rectifying AEs to estimate the correct labels of their original inputs. Results demonstrate that the proposed method exhibits consistent performance in rectifying AEs generated via various attack methods.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural network-based classifiers are prone to errors when processing adversarial examples (AEs). AEs are minimally perturbed input data, undetectable to humans, that pose significant risks to security-dependent applications. Hence, extensive research has been undertaken to develop defense mechanisms that mitigate their threats. Most existing methods primarily focus on discriminating AEs based on input sample features, emphasizing AE detection without recovering the correct categorization of the sample before the attack. While some tasks may only require the rejection of detected AEs, others necessitate identifying the correct original input category, such as traffic sign recognition in autonomous driving. The objective of this study is to propose a method for rectifying AEs to estimate the correct labels of their original inputs. Our method is based on re-attacking AEs to move them beyond the decision boundary for accurate label prediction, effectively addressing the issue of rectifying minimally perceptible AEs created using white-box attack methods. However, challenges remain in effectively rectifying AEs produced by black-box attacks at a distance from the boundary, or those misclassified into low-confidence categories by targeted attacks. By adopting a straightforward approach that considers only AEs as inputs, the proposed method can address diverse attacks while avoiding the need for parameter adjustments or preliminary training. Results demonstrate that the proposed method exhibits consistent performance in rectifying AEs generated via various attack methods, including targeted and black-box attacks. Moreover, it outperforms conventional rectification and input transformation methods in terms of stability against various attacks.
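The core idea, re-attacking an AE so that it crosses the decision boundary back toward its original class, can be illustrated with a short sketch. This is a minimal illustration assuming a PyTorch classifier; the attack step, step size, and stopping rule are simplified placeholders rather than the authors' exact rectification procedure.

```python
import torch
import torch.nn.functional as F

def rectify_by_reattack(model, x_adv, step_size=1/255, max_steps=50):
    """Re-attack a suspected adversarial example and return the label it
    flips to, used as an estimate of the original (pre-attack) class.
    Minimal sketch: a single untargeted gradient-sign attack with a fixed
    step size, not the paper's exact procedure."""
    model.eval()
    y_adv = model(x_adv).argmax(dim=1)           # label currently assigned to the AE
    x = x_adv.clone().detach()
    for _ in range(max_steps):
        x.requires_grad_(True)
        logits = model(x)
        # Untargeted step: increase the loss w.r.t. the AE's current label,
        # pushing x across the nearest decision boundary.
        loss = F.cross_entropy(logits, y_adv)
        grad, = torch.autograd.grad(loss, x)
        x = (x + step_size * grad.sign()).clamp(0, 1).detach()
        new_label = model(x).argmax(dim=1)
        if (new_label != y_adv).all():
            return new_label                     # rectified label estimate
    return model(x).argmax(dim=1)
```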
Related papers
- Preliminary Investigation into Uncertainty-Aware Attack Stage Classification [81.28215542218724]
This work addresses the problem of attack stage inference under uncertainty. We propose a classification approach based on Evidential Deep Learning (EDL), which models predictive uncertainty by outputting the parameters of a Dirichlet distribution over possible stages. Preliminary experiments in a simulated environment demonstrate that the proposed model can accurately infer the stage of an attack with confidence.
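As a rough illustration of the EDL idea referenced above, the network outputs non-negative evidence per class, which parameterizes a Dirichlet distribution; expected class probabilities and a scalar uncertainty follow directly. This is a generic EDL sketch, not the cited paper's implementation.

```python
import torch

def edl_outputs(logits):
    """Generic Evidential Deep Learning head: map raw network outputs to
    Dirichlet parameters, expected class probabilities, and uncertainty.
    Sketch only; the cited paper's exact formulation may differ."""
    evidence = torch.relu(logits)               # non-negative evidence per class
    alpha = evidence + 1.0                      # Dirichlet concentration parameters
    strength = alpha.sum(dim=-1, keepdim=True)  # total evidence + number of classes
    probs = alpha / strength                    # expected class probabilities
    uncertainty = logits.shape[-1] / strength   # K / sum(alpha): high when evidence is low
    return probs, uncertainty
```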
arXiv Detail & Related papers (2025-08-01T06:58:00Z) - TopicAttack: An Indirect Prompt Injection Attack via Topic Transition [92.26240528996443]
Large language models (LLMs) are vulnerable to indirect prompt injection attacks. We propose TopicAttack, which prompts the LLM to generate a fabricated transition prompt that gradually shifts the topic toward the injected instruction. We find that a higher injected-to-original attention ratio leads to a greater success probability, and our method achieves a much higher ratio than the baseline methods.
arXiv Detail & Related papers (2025-07-18T06:23:31Z) - DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks [87.66245688589977]
LLM-integrated applications and agents are vulnerable to prompt injection attacks. A detection method aims to determine whether a given input is contaminated by an injected prompt. We propose DataSentinel, a game-theoretic method to detect prompt injection attacks.
arXiv Detail & Related papers (2025-04-15T16:26:21Z) - Evolution-based Region Adversarial Prompt Learning for Robustness Enhancement in Vision-Language Models [52.8949080772873]
We propose an evolution-based region adversarial prompt tuning method called ER-APT. In each training iteration, we first generate AEs using traditional gradient-based methods. Subsequently, a genetic evolution mechanism incorporating selection, mutation, and crossover is applied to optimize the AEs. The final evolved AEs are used for prompt tuning, achieving region-based adversarial optimization instead of conventional single-point adversarial prompt tuning.
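The selection-mutation-crossover loop described above can be sketched as follows; the fitness function, mutation scale, and population size are illustrative assumptions, not the ER-APT settings.

```python
import numpy as np

def evolve_perturbations(fitness, pop, n_gens=10, mutate_std=0.01, eps=8/255):
    """Toy genetic loop over a population of adversarial perturbations
    (list of numpy arrays). `fitness` scores each perturbation, e.g. the
    classifier loss it induces. Illustrative only; ER-APT's operators and
    schedule may differ."""
    for _ in range(n_gens):
        scores = np.array([fitness(p) for p in pop])
        top = [pop[i] for i in np.argsort(scores)[-len(pop) // 2:]]      # selection
        children = []
        while len(top) + len(children) < len(pop):
            a, b = (top[i] for i in np.random.choice(len(top), 2, replace=False))
            mask = np.random.rand(*a.shape) < 0.5                        # crossover
            child = np.where(mask, a, b)
            child += np.random.normal(0, mutate_std, size=child.shape)   # mutation
            children.append(np.clip(child, -eps, eps))
        pop = top + children
    return max(pop, key=fitness)
```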
arXiv Detail & Related papers (2025-03-17T07:08:47Z) - One Stone, Two Birds: Enhancing Adversarial Defense Through the Lens of Distributional Discrepancy [30.502354813427523]
Statistical adversarial data detection (SADD) detects whether an upcoming batch contains adversarial examples (AEs). We propose a two-pronged adversarial defense method, named Distributional-Discrepancy-based Adversarial Defense (DAD).
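A common way to operationalize such a distributional-discrepancy test is a kernel two-sample statistic, such as MMD, between a clean reference batch and the incoming batch. The sketch below is a generic (biased) MMD estimate with an RBF kernel, not DAD's actual statistic or threshold.

```python
import numpy as np

def mmd_rbf(x, y, gamma=1.0):
    """Biased MMD^2 estimate between two batches of flattened inputs
    (shape [n, d]). Shown only to illustrate a distributional discrepancy
    test; DAD's statistic and calibration may differ."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# A batch would be flagged as suspicious if its discrepancy from clean data
# exceeds a threshold calibrated on held-out clean batches (assumption).
```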
arXiv Detail & Related papers (2025-03-04T01:16:21Z) - Indiscriminate Disruption of Conditional Inference on Multivariate Gaussians [60.22542847840578]
Despite advances in adversarial machine learning, inference for Gaussian models in the presence of an adversary is notably understudied.
We consider a self-interested attacker who wishes to disrupt a decision-maker's conditional inference and subsequent actions by corrupting a set of evidentiary variables.
To avoid detection, the attacker also desires the attack to appear plausible, where plausibility is determined by the density of the corrupted evidence.
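The setting above amounts to perturbing the evidence fed into standard Gaussian conditioning: the conditional mean of the unobserved variables given evidence x2 is mu1 + Sigma12 Sigma22^{-1} (x2 - mu2), so corrupting x2 shifts the decision-maker's inference. The snippet below only illustrates that mechanism, not the paper's attack formulation.

```python
import numpy as np

def conditional_mean(mu, Sigma, idx_obs, x_obs):
    """Mean of the unobserved coordinates of a Gaussian given observed
    evidence x_obs. Corrupting x_obs (the attacker's move) shifts this
    mean and hence the decision-maker's downstream actions. Illustration only."""
    idx_hid = [i for i in range(len(mu)) if i not in idx_obs]
    S12 = Sigma[np.ix_(idx_hid, idx_obs)]
    S22 = Sigma[np.ix_(idx_obs, idx_obs)]
    return mu[idx_hid] + S12 @ np.linalg.solve(S22, x_obs - mu[idx_obs])
```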
arXiv Detail & Related papers (2024-11-21T17:46:55Z) - EvolBA: Evolutionary Boundary Attack under Hard-label Black Box condition [0.0]
Research has shown that deep neural networks (DNNs) have vulnerabilities that can lead to the misrecognition of Adversarial Examples (AEs).
This study proposes an adversarial attack method named EvolBA to generate AEs using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) under the hard-label black-box (HL-BB) condition.
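In the hard-label black-box setting only the predicted label is observable, so gradient-free optimizers such as CMA-ES are used to search for a small perturbation that changes the label. The sketch below uses the `cma` package's ask/tell interface with a hypothetical fitness function; it is not the EvolBA algorithm itself.

```python
import numpy as np
import cma  # pycma; assumed available

def hard_label_fitness(model_predict, x_orig, true_label, delta):
    """Fitness for a hard-label black-box search: prefer small perturbations
    that flip the predicted label. The penalty scheme is an illustrative choice."""
    x = np.clip(x_orig + delta.reshape(x_orig.shape), 0, 1)
    flipped = model_predict(x) != true_label   # only the label is observable
    return np.linalg.norm(delta) + (0.0 if flipped else 1e3)

def cma_attack(model_predict, x_orig, true_label, sigma0=0.05, iters=100):
    es = cma.CMAEvolutionStrategy(np.zeros(x_orig.size), sigma0)
    for _ in range(iters):
        deltas = es.ask()
        es.tell(deltas, [hard_label_fitness(model_predict, x_orig, true_label,
                                            np.asarray(d)) for d in deltas])
    best = np.asarray(es.result.xbest).reshape(x_orig.shape)
    return np.clip(x_orig + best, 0, 1)
```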
arXiv Detail & Related papers (2024-07-02T13:12:52Z) - Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions.
Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-12-19T15:34:52Z) - What You See is Not What the Network Infers: Detecting Adversarial Examples Based on Semantic Contradiction [14.313178290347293]
Adversarial examples (AEs) pose severe threats to the applications of deep neural networks (DNNs) to safety-critical domains.
We propose a novel AE detection framework based on the very nature of AEs.
We show that ContraNet outperforms existing solutions by a large margin, especially under adaptive attacks.
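The detection idea is to check whether the input is semantically consistent with the label the network infers, for example by synthesizing an image from the predicted label and comparing it with the input. The components below (generator, similarity metric, threshold) are placeholders, not ContraNet's actual modules.

```python
import torch

def detect_by_contradiction(classifier, generator, similarity, x, threshold):
    """Flag x as adversarial if the image synthesized from its predicted
    label contradicts x itself. `generator` and `similarity` are assumed
    components (e.g. a class-conditional generator and a perceptual metric)."""
    with torch.no_grad():
        y_pred = classifier(x).argmax(dim=1)
        x_synth = generator(y_pred)            # image consistent with y_pred
        score = similarity(x, x_synth)         # high = semantically consistent
    return score < threshold                   # contradiction => likely an AE
```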
arXiv Detail & Related papers (2022-01-24T13:15:31Z) - MixDefense: A Defense-in-Depth Framework for Adversarial Example Detection Based on Statistical and Semantic Analysis [14.313178290347293]
We propose a multilayer defense-in-depth framework for AE detection, namely MixDefense.
We leverage the 'noise' features extracted from the inputs to discover the statistical difference between natural images and tampered ones for AE detection.
We show that the proposed MixDefense solution outperforms the existing AE detection techniques by a considerable margin.
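One simple way to obtain such 'noise' features is to subtract a denoised version of the input and compute summary statistics of the residual; the denoiser and statistics below are illustrative stand-ins for MixDefense's actual feature extractor.

```python
import numpy as np
from scipy.ndimage import median_filter  # simple denoiser stand-in

def noise_features(x):
    """Residual-based 'noise' statistics for a single image (H, W, C) in [0, 1].
    Illustrative only: MixDefense's extractor and detector differ in detail."""
    residual = x - median_filter(x, size=3)    # high-frequency content left after denoising
    return np.array([np.abs(residual).mean(),  # average perturbation energy
                     residual.std(),
                     np.abs(residual).max()])

# A detector would then compare these statistics against those of natural
# images, e.g. with a threshold or a small classifier (assumption).
```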
arXiv Detail & Related papers (2021-04-20T15:57:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.