Releasing Inequality Phenomenon in $\ell_{\infty}$-norm Adversarial Training via Input Gradient Distillation
- URL: http://arxiv.org/abs/2305.09305v3
- Date: Fri, 27 Jun 2025 06:05:44 GMT
- Title: Releasing Inequality Phenomenon in $\ell_{\infty}$-norm Adversarial Training via Input Gradient Distillation
- Authors: Junxi Chen, Junhao Dong, Xiaohua Xie, Jianhuang Lai,
- Abstract summary: A recent study revealed that $\ell_{\infty}$-norm adversarial training ($\ell_{\infty}$-AT) will induce unevenly distributed input gradients. This phenomenon makes the $\ell_{\infty}$-norm adversarially trained model more vulnerable than the standard-trained model. We propose a simple yet effective method called Input Gradient Distillation (IGD) to release the inequality phenomenon in $\ell_{\infty}$-AT.
- Score: 66.5912840038179
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial training (AT) is considered the most effective defense against adversarial attacks. However, a recent study revealed that \(\ell_{\infty}\)-norm adversarial training (\(\ell_{\infty}\)-AT) will also induce unevenly distributed input gradients, which is called the inequality phenomenon. This phenomenon makes the \(\ell_{\infty}\)-norm adversarially trained model more vulnerable than the standard-trained model when high-attribution or randomly selected pixels are perturbed, enabling robust and practical black-box attacks against \(\ell_{\infty}\)-adversarially trained models. In this paper, we propose a simple yet effective method called Input Gradient Distillation (IGD) to release the inequality phenomenon in $\ell_{\infty}$-AT. IGD distills the standard-trained teacher model's equal decision pattern into the $\ell_{\infty}$-adversarially trained student model by aligning input gradients of the student model and the standard-trained model with the Cosine Similarity. Experiments show that IGD can mitigate the inequality phenomenon and its threats while preserving adversarial robustness. Compared to vanilla $\ell_{\infty}$-AT, IGD reduces error rates against inductive noise, inductive occlusion, random noise, and noisy images in ImageNet-C by up to 60\%, 16\%, 50\%, and 21\%, respectively. Other than empirical experiments, we also conduct a theoretical analysis to explain why releasing the inequality phenomenon can improve such robustness and discuss why the severity of the inequality phenomenon varies according to the dataset's image resolution. Our code is available at https://github.com/fhdnskfbeuv/Inuput-Gradient-Distillation
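For intuition, the distillation term described in the abstract can be sketched as a cosine-similarity alignment between the input gradients of the $\ell_{\infty}$-adversarially trained student and a frozen standard-trained teacher. The snippet below is a minimal PyTorch sketch, not the paper's reference implementation: the names `student`, `teacher`, and `igd_loss`, the use of cross-entropy as each model's per-sample loss, whether the gradients are taken on clean or adversarial inputs, and how the term is weighted against the adversarial-training loss are all assumptions.

```python
import torch
import torch.nn.functional as F


def igd_loss(student: torch.nn.Module,
             teacher: torch.nn.Module,
             x: torch.Tensor,
             y: torch.Tensor) -> torch.Tensor:
    """Sketch of an IGD-style term: align input gradients via cosine similarity."""
    x = x.clone().detach().requires_grad_(True)

    # Input gradient of the adversarially trained student; create_graph=True
    # keeps the alignment term differentiable w.r.t. the student's weights.
    s_grad = torch.autograd.grad(
        F.cross_entropy(student(x), y), x, create_graph=True
    )[0]

    # Input gradient of the frozen standard-trained teacher, used only as a target.
    t_grad = torch.autograd.grad(
        F.cross_entropy(teacher(x), y), x
    )[0].detach()

    # 1 - cosine similarity between flattened per-sample input gradients.
    cos = F.cosine_similarity(s_grad.flatten(1), t_grad.flatten(1), dim=1)
    return (1.0 - cos).mean()
```

In a training loop this term would presumably be added to the vanilla $\ell_{\infty}$-AT objective, e.g. `loss = at_loss + igd_weight * igd_loss(student, teacher, x, y)`, where `igd_weight` is a hypothetical hyperparameter not specified here.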
Related papers
- Towards Model Resistant to Transferable Adversarial Examples via Trigger Activation [95.3977252782181]
Adversarial examples, characterized by imperceptible perturbations, pose significant threats to deep neural networks by misleading their predictions.
We introduce a novel training paradigm aimed at enhancing robustness against transferable adversarial examples (TAEs) in a more efficient and effective way.
arXiv Detail & Related papers (2025-04-20T09:07:10Z) - Robust Representation Consistency Model via Contrastive Denoising [83.47584074390842]
Randomized smoothing provides theoretical guarantees for certifying robustness against adversarial perturbations.
Diffusion models have been successfully employed for randomized smoothing to purify noise-perturbed samples.
We reformulate the generative modeling task along the diffusion trajectories in pixel space as a discriminative task in the latent space.
arXiv Detail & Related papers (2025-01-22T18:52:06Z) - Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis [55.561961365113554]
3D Gaussian Splatting (3DGS) has demonstrated remarkable effectiveness in novel view synthesis (NVS). In this paper, we introduce Self-Ensembling Gaussian Splatting (SE-GS). We achieve self-ensembling by incorporating an uncertainty-aware perturbation strategy during training. Experimental results on the LLFF, Mip-NeRF360, DTU, and MVImgNet datasets demonstrate that our approach enhances NVS quality under few-shot training conditions.
arXiv Detail & Related papers (2024-10-31T18:43:48Z) - Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
We propose an algorithm that enables fast and high-quality generation under arbitrary constraints. During inference, we can interchange between gradient updates computed on the noisy image and updates computed on the final, clean image. Our approach produces results that rival or surpass the state-of-the-art training-free inference approaches.
arXiv Detail & Related papers (2024-10-24T14:52:38Z) - Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think [72.48325960659822]
One main bottleneck in training large-scale diffusion models for generation lies in effectively learning these representations.
We study this by introducing a straightforward regularization called REPresentation Alignment (REPA), which aligns the projections of noisy input hidden states in denoising networks with clean image representations obtained from external, pretrained visual encoders.
The results are striking: our simple strategy yields significant improvements in both training efficiency and generation quality when applied to popular diffusion and flow-based transformers, such as DiTs and SiTs.
arXiv Detail & Related papers (2024-10-09T14:34:53Z) - The Bad Batches: Enhancing Self-Supervised Learning in Image Classification Through Representative Batch Curation [1.519321208145928]
The pursuit of learning robust representations without human supervision is a longstanding challenge.
This paper attempts to alleviate the influence of false positive and false negative pairs by employing pairwise similarity calculations through the Fréchet ResNet Distance (FRD).
The effectiveness of the proposed method is substantiated by empirical results, where a linear classifier trained on self-supervised contrastive representations achieved an impressive 87.74% top-1 accuracy on STL10 and 99.31% on the Flower102 dataset.
arXiv Detail & Related papers (2024-03-28T17:04:07Z) - Stable Unlearnable Example: Enhancing the Robustness of Unlearnable
Examples via Stable Error-Minimizing Noise [31.586389548657205]
Unlearnable examples are proposed to significantly degrade the generalization performance of models by adding a kind of imperceptible noise to the data.
We introduce stable error-minimizing noise (SEM), which trains the defensive noise against random perturbation instead of the time-consuming adversarial perturbation.
SEM achieves a new state-of-the-art performance on CIFAR-10, CIFAR-100, and ImageNet Subset.
arXiv Detail & Related papers (2023-11-22T01:43:57Z) - Reducing Spatial Fitting Error in Distillation of Denoising Diffusion
Models [13.364271265023953]
Knowledge distillation for diffusion models is an effective method to address this limitation with a shortened sampling process.
We attribute the degradation to the spatial fitting error occurring in the training of both the teacher and student model.
SFERD utilizes attention guidance from the teacher model and a designed semantic gradient predictor to reduce the student's fitting error.
We achieve an FID of 5.31 on CIFAR-10 and 9.39 on ImageNet 64$\times$64 with only one step, outperforming existing diffusion methods.
arXiv Detail & Related papers (2023-11-07T09:19:28Z) - Reducing Adversarial Training Cost with Gradient Approximation [0.3916094706589679]
We propose a new and efficient adversarial training method, adversarial training with gradient approximation (GAAT), to reduce the cost of building up robust models.
Our proposed method saves up to 60% of the training time with comparable model test accuracy on datasets.
arXiv Detail & Related papers (2023-09-18T03:55:41Z) - On the Vulnerability of Fairness Constrained Learning to Malicious Noise [28.176039923404883]
We consider the vulnerability of fairness-constrained learning to small amounts of malicious noise in the training data.
For example, for Demographic Parity we show we can incur only a $\Theta(\alpha)$ loss in accuracy, where $\alpha$ is the malicious noise rate.
For Equal Opportunity, we show we can incur an $O(\sqrt{\alpha})$ loss, and give a matching $\Omega(\sqrt{\alpha})$ lower bound.
arXiv Detail & Related papers (2023-07-21T20:26:54Z) - Evaluating Similitude and Robustness of Deep Image Denoising Models via
Adversarial Attack [60.40356882897116]
Deep neural networks (DNNs) have shown superior performance compared to traditional image denoising algorithms.
In this paper, we propose an adversarial attack method named denoising-PGD which can successfully attack all the current deep denoising models.
arXiv Detail & Related papers (2023-06-28T09:30:59Z) - Robust Classification via a Single Diffusion Model [37.46217654590878]
The Robust Diffusion Classifier (RDC) is a generative classifier constructed from a pre-trained diffusion model to be adversarially robust.
RDC achieves $75.67\%$ robust accuracy against various $\ell_{\infty}$ norm-bounded adaptive attacks with $\epsilon_{\infty}=8/255$ on CIFAR-10.
arXiv Detail & Related papers (2023-05-24T15:25:19Z) - Beyond Pretrained Features: Noisy Image Modeling Provides Adversarial
Defense [52.66971714830943]
Masked image modeling (MIM) has become a prevailing framework for self-supervised visual representation learning.
In this paper, we investigate how this powerful self-supervised learning paradigm can provide adversarial robustness to downstream classifiers.
We propose an adversarial defense method, referred to as De3, by exploiting the pretrained decoder for denoising.
arXiv Detail & Related papers (2023-02-02T12:37:24Z) - Guided Diffusion Model for Adversarial Purification [103.4596751105955]
Adversarial attacks disturb deep neural networks (DNNs) in various algorithms and frameworks.
We propose a novel purification approach, referred to as the guided diffusion model for purification (GDMP).
On our comprehensive experiments across various datasets, the proposed GDMP is shown to reduce the perturbations raised by adversarial attacks to a shallow range.
arXiv Detail & Related papers (2022-05-30T10:11:15Z) - Diffusion Models for Adversarial Purification [69.1882221038846]
Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
We propose DiffPure that uses diffusion models for adversarial purification.
Our method achieves the state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
arXiv Detail & Related papers (2022-05-16T06:03:00Z) - Fast Gradient Non-sign Methods [67.56549792690706]
Fast Gradient Non-sign Method (FGNM) is a general routine, which can seamlessly replace the conventional $\mathrm{sign}$ operation in gradient-based attacks.
Our methods outperform them by 27.5% at most and 9.5% on average.
arXiv Detail & Related papers (2021-10-25T08:46:00Z) - Towards Adversarial Patch Analysis and Certified Defense against Crowd
Counting [61.99564267735242]
Crowd counting has drawn much attention due to its importance in safety-critical surveillance systems.
Recent studies have demonstrated that deep neural network (DNN) methods are vulnerable to adversarial attacks.
We propose a robust attack strategy called Adversarial Patch Attack with Momentum to evaluate the robustness of crowd counting models.
arXiv Detail & Related papers (2021-04-22T05:10:55Z) - Rethinking the Role of Gradient-Based Attribution Methods for Model
Interpretability [8.122270502556374]
Current methods for interpretability of discriminative deep neural networks rely on the model's input-gradients.
We show that these input-gradients can be arbitrarily manipulated without changing the discriminative function.
arXiv Detail & Related papers (2020-06-16T13:17:32Z) - Robust Face Verification via Disentangled Representations [20.393894616979402]
We introduce a robust algorithm for face verification, deciding whether two images are of the same person or not.
We use the generative model during training as an online augmentation method instead of a test-time purifier that removes adversarial noise.
We experimentally show that, when coupled with adversarial training, the proposed scheme converges with a weak inner solver and has higher clean and robust accuracy than state-of-the-art methods when evaluated against white-box physical attacks.
arXiv Detail & Related papers (2020-06-05T19:17:02Z)