Related papers: Robust Diffusion Models for Adversarial Purification

Robust Diffusion Models for Adversarial Purification

URL: http://arxiv.org/abs/2403.16067v3
Date: Fri, 23 Aug 2024 10:55:11 GMT
Title: Robust Diffusion Models for Adversarial Purification
Authors: Guang Lin, Zerui Tao, Jianhai Zhang, Toshihisa Tanaka, Qibin Zhao,
Abstract summary: Diffusion models (DMs) based adversarial purification (AP) has shown to be the most powerful alternative to adversarial training (AT) We propose a novel robust reverse process with adversarial guidance, which is independent of given pre-trained DMs. This robust guidance can not only ensure to generate purified examples retaining more semantic content but also mitigate the accuracy-robustness trade-off of DMs.
Score: 28.313494459818497
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion models (DMs) based adversarial purification (AP) has shown to be the most powerful alternative to adversarial training (AT). However, these methods neglect the fact that pre-trained diffusion models themselves are not robust to adversarial attacks as well. Additionally, the diffusion process can easily destroy semantic information and generate a high quality image but totally different from the original input image after the reverse process, leading to degraded standard accuracy. To overcome these issues, a natural idea is to harness adversarial training strategy to retrain or fine-tune the pre-trained diffusion model, which is computationally prohibitive. We propose a novel robust reverse process with adversarial guidance, which is independent of given pre-trained DMs and avoids retraining or fine-tuning the DMs. This robust guidance can not only ensure to generate purified examples retaining more semantic content but also mitigate the accuracy-robustness trade-off of DMs for the first time, which also provides DM-based AP an efficient adaptive ability to new attacks. Extensive experiments are conducted on CIFAR-10, CIFAR-100 and ImageNet to demonstrate that our method achieves the state-of-the-art results and exhibits generalization against different attacks.

Related papers

What is Adversarial Training for Diffusion Models? [4.71482540145286]
We show that adversarial training (AT) for diffusion models (DMs) fundamentally differs from classifiers.<n>AT is a way to enforce smoothness in the diffusion flow, improving to outliers and corrupted data.<n>We rigorously evaluate our approach with proof-of-concept datasets with known distributions in low- and high-dimensional space.
arXiv Detail & Related papers (2025-05-27T20:32:28Z)
Transferable Adversarial Attacks on SAM and Its Downstream Models [87.23908485521439]
This paper explores the feasibility of adversarial attacking various downstream models fine-tuned from the segment anything model (SAM) To enhance the effectiveness of the adversarial attack towards models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm.
arXiv Detail & Related papers (2024-10-26T15:04:04Z)
Classifier Guidance Enhances Diffusion-based Adversarial Purification by Preserving Predictive Information [75.36597470578724]
Adversarial purification is one of the promising approaches to defend neural networks against adversarial attacks. We propose gUided Purification (COUP) algorithm, which purifies while keeping away from the classifier decision boundary. Experimental results show that COUP can achieve better adversarial robustness under strong attack methods.
arXiv Detail & Related papers (2024-08-12T02:48:00Z)
Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders [101.42201747763178]
Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to training examples that are correctly labeled. Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method.
arXiv Detail & Related papers (2024-05-02T16:49:25Z)
Struggle with Adversarial Defense? Try Diffusion [8.274506117450628]
Adrial attacks induce misclassification by introducing subtle perturbations. diffusion-based adversarial training often encounters convergence challenges and high computational expenses. We propose the Truth Maximization Diffusion (TMDC) to overcome these issues.
arXiv Detail & Related papers (2024-04-12T06:52:40Z)
How Robust Are Energy-Based Models Trained With Equilibrium Propagation? [4.374837991804085]
Adrial training is the current state-of-the-art defense against adversarial attacks. It lowers the model's accuracy on clean inputs, is computationally expensive, and offers less robustness to natural noise. In contrast, energy-based models (EBMs) incorporate feedback connections from each layer to the previous layer, yielding a recurrent, deep-attractor architecture.
arXiv Detail & Related papers (2024-01-21T16:55:40Z)
Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness [52.9493817508055]
We propose Pre-trained Model Guided Adversarial Fine-Tuning (PMG-AFT) to enhance the model's zero-shot adversarial robustness. Our approach consistently improves clean accuracy by an average of 8.72%.
arXiv Detail & Related papers (2024-01-09T04:33:03Z)
AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models [7.406040859734522]
Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques. Previous attack methods often directly inject Projected Gradient Descent (PGD) gradients into the sampling of generative models. We propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models.
arXiv Detail & Related papers (2023-07-24T03:10:02Z)
Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability [62.105715985563656]
We propose a novel framework dubbed Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic adversarial samples. Our framework can be easily customized for specific tasks such as digital attacks, physical-world attacks, and style-based attacks.
arXiv Detail & Related papers (2023-05-25T21:51:23Z)
Carefully Blending Adversarial Training and Purification Improves Adversarial Robustness [1.2289361708127877]
CARSO is able to defend itself against adaptive end-to-end white-box attacks devised for defences. Our method improves by a significant margin the state-of-the-art for CIFAR-10, CIFAR-100, and TinyImageNet-200.
arXiv Detail & Related papers (2023-05-25T09:04:31Z)
Robust Classification via a Single Diffusion Model [37.46217654590878]
Robust Diffusion (RDC) is a generative classifier constructed from a pre-trained diffusion model to be adversarially robust. RDC achieves $75.67%$ robust accuracy against various $ell_infty$ norm-bounded adaptive attacks with $epsilon_infty/255$ on CIFAR-10.
arXiv Detail & Related papers (2023-05-24T15:25:19Z)
Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification. To defend against such attacks, an effective approach, known as adversarial training (AT), has been shown to mitigate robust training. We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
Guided Diffusion Model for Adversarial Purification [103.4596751105955]
Adversarial attacks disturb deep neural networks (DNNs) in various algorithms and frameworks. We propose a novel purification approach, referred to as guided diffusion model for purification (GDMP) On our comprehensive experiments across various datasets, the proposed GDMP is shown to reduce the perturbations raised by adversarial attacks to a shallow range.
arXiv Detail & Related papers (2022-05-30T10:11:15Z)
Diffusion Models for Adversarial Purification [69.1882221038846]
Adrial purification refers to a class of defense methods that remove adversarial perturbations using a generative model. We propose DiffPure that uses diffusion models for adversarial purification. Our method achieves the state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
arXiv Detail & Related papers (2022-05-16T06:03:00Z)
Improving White-box Robustness of Pre-processing Defenses via Joint Adversarial Training [106.34722726264522]
A range of adversarial defense techniques have been proposed to mitigate the interference of adversarial noise. Pre-processing methods may suffer from the robustness degradation effect. A potential cause of this negative effect is that adversarial training examples are static and independent to the pre-processing model. We propose a method called Joint Adversarial Training based Pre-processing (JATP) defense.
arXiv Detail & Related papers (2021-06-10T01:45:32Z)
Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications. We propose the adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths. Our method is trained to automatically align features of arbitrary attacking strength.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
DVERGE: Diversifying Vulnerabilities for Enhanced Robust Generation of Ensembles [20.46399318111058]
Adversarial attacks can mislead CNN models with small perturbations, which can effectively transfer between different models trained on the same dataset. We propose DVERGE, which isolates the adversarial vulnerability in each sub-model by distilling non-robust features. The novel diversity metric and training procedure enables DVERGE to achieve higher robustness against transfer attacks.
arXiv Detail & Related papers (2020-09-30T14:57:35Z)
Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples. Most existing AT methods adopt a specific attack to craft adversarial examples, leading to the unreliable robustness against other unseen attacks. In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.