Rectifying Adversarial Sample with Low Entropy Prior for Test-Time Defense
- URL: http://arxiv.org/abs/2507.03427v1
- Date: Fri, 04 Jul 2025 09:35:01 GMT
- Title: Rectifying Adversarial Sample with Low Entropy Prior for Test-Time Defense
- Authors: Lina Ma, Xiaowei Fu, Fuxiang Huang, Xinbo Gao, Lei Zhang
- Abstract summary: Existing defense methods fail to defend against unknown attacks. We reveal the commonly overlooked low entropy prior implied in various adversarial samples. We propose a two-stage REAL approach: Rectify Adversarial sample based on LE prior for test-time adversarial rectification.
- Score: 44.263763516566996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing defense methods fail to defend against unknown attacks, raising the generalization issue of adversarial robustness. To remedy this problem, we delve into underlying characteristics common to various attacks. In this work, we reveal the commonly overlooked low entropy (LE) prior implied in various adversarial samples, and shed light on universal robustness against unseen attacks in the inference phase. The LE prior comprises two properties observed across various attacks, as shown in Fig. 1 and Fig. 2: 1) adversarial samples are misclassified with low entropy, and 2) higher attack intensity yields lower entropy predictions. This phenomenon stands in stark contrast to naturally distributed samples. Since the LE prior can instruct existing test-time defense methods, we propose a two-stage REAL approach: Rectify Adversarial samples based on the LE prior for test-time adversarial rectification. Specifically, to align adversarial samples more closely with clean samples, we first rectify adversarial samples misclassified with low entropy by reverse maximizing prediction entropy, thereby eliminating their adversarial nature. To ensure the rectified samples can be correctly classified with low entropy, we then carry out a secondary rectification by forward minimizing prediction entropy, creating a Max-Min entropy optimization scheme. Further, based on the second property, we propose an attack-aware weighting mechanism that adaptively adjusts the strengths of the Max-Min entropy objectives. Experiments on several datasets show that REAL greatly improves the performance of existing sample rectification models.
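To make the two-stage scheme concrete, here is a minimal PyTorch sketch of Max-Min entropy rectification with an attack-aware weight. The step count, step size, sign-gradient update, and the exponential weighting exp(-H/tau) are illustrative assumptions, not the paper's exact REAL procedure.

```python
import torch
import torch.nn.functional as F

def prediction_entropy(logits):
    """Shannon entropy of the softmax prediction, per sample."""
    p = F.softmax(logits, dim=1)
    return -(p * p.clamp_min(1e-12).log()).sum(dim=1)

def real_rectify(model, x, steps=10, lr=0.01, tau=1.0):
    """Two-stage Max-Min entropy rectification (illustrative sketch).

    Stage 1 (Max): raise prediction entropy to erase the low-entropy
    adversarial bias. Stage 2 (Min): lower prediction entropy so the
    rectified sample is classified confidently again.
    """
    model.eval()
    with torch.no_grad():
        h0 = prediction_entropy(model(x))
    # Attack-aware weighting (assumed form): lower initial entropy is
    # read as higher attack intensity, so rectify more aggressively.
    w = torch.exp(-h0 / tau).view(-1, 1, 1, 1)

    x_rect = x.clone().detach().requires_grad_(True)
    for _ in range(steps):  # Stage 1: maximize entropy (reverse rectification)
        ent = prediction_entropy(model(x_rect)).sum()
        grad, = torch.autograd.grad(ent, x_rect)
        x_rect = (x_rect + lr * w * grad.sign()).clamp(0, 1)
        x_rect = x_rect.detach().requires_grad_(True)

    for _ in range(steps):  # Stage 2: minimize entropy (forward rectification)
        ent = prediction_entropy(model(x_rect)).sum()
        grad, = torch.autograd.grad(ent, x_rect)
        x_rect = (x_rect - lr * w * grad.sign()).clamp(0, 1)
        x_rect = x_rect.detach().requires_grad_(True)

    return x_rect.detach()
```

At test time a suspect input would be rectified via `x_rect = real_rectify(model, x)` before the final prediction.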
Related papers
- Minimax rates of convergence for nonparametric regression under adversarial attacks [3.244945627960733]
We theoretically analyse the limits of robustness against adversarial attacks in a nonparametric regression setting. Our work reveals that the minimax rate under adversarial attacks in the input is the same as the sum of two terms.
arXiv Detail & Related papers (2024-10-12T07:11:38Z)
- Enhancing the Antidote: Improved Pointwise Certifications against Poisoning Attacks [30.42301446202426]
Poisoning attacks can disproportionately influence model behaviour by making small changes to the training corpus.
We provide guarantees of a sample's robustness against adversarial attacks that modify a finite number of training samples.
arXiv Detail & Related papers (2023-08-15T03:46:41Z)
- Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard framework in adversarial robustness defends against adversarial samples crafted by minimally perturbing a clean input.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves defense against both invariance and sensitivity attacks.
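The summary does not spell out the optimal-transport objective; as a generic stand-in, a deep-metric-learning regularizer in the same spirit can be sketched with a triplet loss (the function name and margin value are illustrative assumptions):

```python
import torch.nn.functional as F

def metric_adv_regularizer(f_clean, f_adv, f_other, margin=1.0):
    """Triplet-style regularizer: pull adversarial features toward their
    clean counterparts and push them away from other-class features.
    A generic stand-in, not the paper's optimal-transport formulation.
    """
    d_pos = F.pairwise_distance(f_adv, f_clean)   # adv vs. its clean version
    d_neg = F.pairwise_distance(f_adv, f_other)   # adv vs. other-class sample
    return F.relu(d_pos - d_neg + margin).mean()
```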
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
- ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation [125.52743832477404]
Adversarial Examples Detection (AED) is a crucial defense technique against adversarial attacks.
We propose a new technique, ADDMU, which combines two types of uncertainty estimation for detecting both regular and far-boundary (FB) adversarial examples.
Our new method outperforms previous methods by 3.6 and 6.0 AUC points under each scenario.
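As a rough illustration of combining model and data uncertainty for detection (not ADDMU's actual estimators; the MC-dropout and input-noise choices below are assumptions, and the noise term presumes continuous inputs):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def combined_uncertainty(model, x, n_mc=8, n_noise=8, sigma=0.05):
    """Score a batch by model uncertainty plus data uncertainty.

    Model uncertainty: variance of softmax outputs across MC-dropout
    passes (requires dropout layers in the model). Data uncertainty:
    variance of outputs under small input noise. A higher combined
    score suggests an adversarial example.
    """
    model.train()   # keep dropout active for MC sampling
                    # (a real detector would enable only the dropout layers)
    mc = torch.stack([F.softmax(model(x), dim=1) for _ in range(n_mc)])
    model_unc = mc.var(dim=0).sum(dim=1)

    model.eval()    # data uncertainty: sensitivity to small input noise
    noisy = torch.stack([F.softmax(model(x + sigma * torch.randn_like(x)), dim=1)
                         for _ in range(n_noise)])
    data_unc = noisy.var(dim=0).sum(dim=1)
    return model_unc + data_unc
```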
arXiv Detail & Related papers (2022-10-22T09:11:12Z)
- Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage.
Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack.
We present two cases of the general formulation with different malicious purposes, i.e., the single sample attack (SSA) and the triggered samples attack (TSA).
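The core primitive behind such attacks is flipping a single bit of a stored weight; a minimal sketch follows, assuming weights kept as unsigned 8-bit codes (choosing which index and bit to flip is the attack's actual optimization problem):

```python
import torch

def flip_weight_bit(q_weights: torch.Tensor, index: int, bit: int) -> torch.Tensor:
    """Flip a single bit of an 8-bit quantized weight buffer in place.

    Only the flip primitive is shown; the attack itself is the search
    for the (index, bit) pair that does the most damage while staying
    stealthy. Assumes weights stored as unsigned 8-bit codes (uint8).
    """
    flat = q_weights.view(-1)
    flat[index] ^= (1 << bit)   # XOR toggles exactly the chosen bit
    return q_weights
```

For example, `flip_weight_bit(w_q, index=0, bit=7)` toggles the most significant bit of the first stored weight.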
arXiv Detail & Related papers (2022-07-25T03:24:58Z)
- Rethinking Textual Adversarial Defense for Pre-trained Language Models [79.18455635071817]
A literature review shows that pre-trained language models (PrLMs) are vulnerable to adversarial attacks.
We propose a novel metric (Degree of Anomaly) to enable current adversarial attack approaches to generate more natural and imperceptible adversarial examples.
We show that our universal defense framework achieves after-attack accuracy comparable to or even higher than attack-specific defenses.
arXiv Detail & Related papers (2022-07-21T07:51:45Z)
- On the Limitations of Stochastic Pre-processing Defenses [42.80542472276451]
Defending against adversarial examples remains an open problem.
A common belief is that randomness at inference increases the cost of finding adversarial inputs.
In this paper, we investigate such pre-processing defenses and demonstrate that they are flawed.
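The standard adaptive technique behind such demonstrations is expectation over transformation (EOT): averaging the attack gradient over the defense's randomness. A minimal sketch, assuming the pre-processing is differentiable (the function and argument names are illustrative):

```python
import torch

def eot_gradient(model, preprocess, x, y, loss_fn, n_samples=16):
    """Expectation-over-transformation gradient against a stochastic
    pre-processing defense: average the attack gradient over the
    defense's randomness to recover a usable descent direction.
    """
    grad = torch.zeros_like(x)
    for _ in range(n_samples):
        x_in = x.clone().detach().requires_grad_(True)
        loss = loss_fn(model(preprocess(x_in)), y)   # fresh randomness per pass
        g, = torch.autograd.grad(loss, x_in)
        grad = grad + g
    return grad / n_samples
```

A PGD attacker would then step along `eot_gradient(...).sign()` exactly as it would against a deterministic model.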
arXiv Detail & Related papers (2022-06-19T21:54:42Z)
- Adversarial Attacks and Defense for Non-Parametric Two-Sample Tests [73.32304304788838]
This paper systematically uncovers the failure modes of non-parametric two-sample tests (TSTs) under adversarial attacks.
To enable TST-agnostic attacks, we propose an ensemble attack framework that jointly minimizes the different types of test criteria.
To robustify TSTs, we propose a max-min optimization that iteratively generates adversarial pairs to train the deep kernels.
arXiv Detail & Related papers (2022-02-07T11:18:04Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA), which is trained to automatically align features across arbitrary attacking strengths.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- Adversarial robustness via stochastic regularization of neural activation sensitivity [24.02105949163359]
We suggest a novel defense mechanism that simultaneously addresses both defense goals.
We flatten the gradients of the loss surface, making adversarial examples harder to find.
In addition, we push the decision boundary away from correctly classified inputs by leveraging Jacobian regularization.
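Jacobian regularization is a known penalty on the input-output Jacobian; below is a minimal sketch using a random-projection estimate of its squared Frobenius norm (the specific estimator is an assumption, not necessarily this paper's):

```python
import torch

def jacobian_penalty(model, x, n_proj=1):
    """Random-projection estimate of ||d model(x) / d x||_F^2.

    Adding this penalty to the training loss flattens the model around
    the inputs, which makes adversarial examples harder to find.
    """
    x = x.clone().detach().requires_grad_(True)
    out = model(x)
    num_classes = out.size(1)
    penalty = x.new_zeros(())
    for _ in range(n_proj):
        v = torch.randn_like(out)
        v = v / v.norm(dim=1, keepdim=True)   # random unit direction in output space
        Jv, = torch.autograd.grad(out, x, grad_outputs=v,
                                  create_graph=True, retain_graph=True)
        penalty = penalty + Jv.pow(2).sum()
    # E[||J^T v||^2] = ||J||_F^2 / num_classes for v uniform on the sphere
    return num_classes * penalty / (n_proj * x.size(0))
```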
arXiv Detail & Related papers (2020-09-23T19:31:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.