Robust Evaluation of Diffusion-Based Adversarial Purification
- URL: http://arxiv.org/abs/2303.09051v3
- Date: Sun, 3 Dec 2023 19:26:11 GMT
- Title: Robust Evaluation of Diffusion-Based Adversarial Purification
- Authors: Minjong Lee, Dongwoo Kim
- Abstract summary: Diffusion-based purification methods aim to remove adversarial effects from an input data point at test time.
White-box attacks are often employed to measure the robustness of the purification.
We propose a new purification strategy improving robustness compared to the current diffusion-based purification methods.
- Score: 3.634387981995277
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We question the current evaluation practice on diffusion-based purification
methods. Diffusion-based purification methods aim to remove adversarial effects
from an input data point at test time. The approach gains increasing attention
as an alternative to adversarial training due to the disentangling between
training and testing. Well-known white-box attacks are often employed to
measure the robustness of the purification. However, it is unknown whether
these attacks are the most effective for the diffusion-based purification since
the attacks are often tailored for adversarial training. We analyze the current
practices and provide a new guideline for measuring the robustness of
purification methods against adversarial attacks. Based on our analysis, we
further propose a new purification strategy improving robustness compared to
the current diffusion-based purification methods.
Related papers
- Diffusion-based Adversarial Purification for Intrusion Detection [0.6990493129893112]
crafted perturbations mislead ML models, enabling attackers to evade detection or trigger false alerts.
adversarial purification has emerged as a compelling solution, particularly with diffusion models showing promising results.
This paper demonstrates the effectiveness of diffusion models in purifying adversarial examples in network intrusion detection.
arXiv Detail & Related papers (2024-06-25T14:48:28Z) - SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders [101.42201747763178]
Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to training examples that are correctly labeled.
Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method.
arXiv Detail & Related papers (2024-05-02T16:49:25Z) - Diffusion Denoising as a Certified Defense against Clean-label Poisoning [56.04951180983087]
We show how an off-the-shelf diffusion model can sanitize the tampered training data.
We extensively test our defense against seven clean-label poisoning attacks and reduce their attack success to 0-16% with only a negligible drop in the test time accuracy.
arXiv Detail & Related papers (2024-03-18T17:17:07Z) - Scalable Ensemble-based Detection Method against Adversarial Attacks for
speaker verification [73.30974350776636]
This paper comprehensively compares mainstream purification techniques in a unified framework.
We propose an easy-to-follow ensemble approach that integrates advanced purification modules for detection.
arXiv Detail & Related papers (2023-12-14T03:04:05Z) - FreqFed: A Frequency Analysis-Based Approach for Mitigating Poisoning
Attacks in Federated Learning [98.43475653490219]
Federated learning (FL) is susceptible to poisoning attacks.
FreqFed is a novel aggregation mechanism that transforms the model updates into the frequency domain.
We demonstrate that FreqFed can mitigate poisoning attacks effectively with a negligible impact on the utility of the aggregated model.
arXiv Detail & Related papers (2023-12-07T16:56:24Z) - Adversarial Purification for Data-Driven Power System Event Classifiers
with Diffusion Models [0.8848340429852071]
Global deployment of phasor measurement units (PMUs) enables real-time monitoring of the power system.
Recent studies reveal that machine learning-based methods are vulnerable to adversarial attacks.
This paper proposes an effective adversarial purification method based on the diffusion model to counter adversarial attacks.
arXiv Detail & Related papers (2023-11-13T06:52:56Z) - Purify++: Improving Diffusion-Purification with Advanced Diffusion
Models and Control of Randomness [22.87882885963586]
Defense against adversarial attacks is important for AI safety.
Adversarial purification is a family of approaches that defend adversarial attacks with suitable pre-processing.
We propose Purify++, a new diffusion purification algorithm that is now the state-of-the-art purification method against several adversarial attacks.
arXiv Detail & Related papers (2023-10-28T17:18:38Z) - Language Guided Adversarial Purification [3.9931474959554496]
Adversarial purification using generative models demonstrates strong adversarial defense performance.
New framework, Language Guided Adversarial Purification (LGAP), utilizing pre-trained diffusion models and caption generators.
arXiv Detail & Related papers (2023-09-19T06:17:18Z) - Unsupervised Adversarial Detection without Extra Model: Training Loss
Should Change [24.76524262635603]
Traditional approaches to adversarial training and supervised detection rely on prior knowledge of attack types and access to labeled training data.
We propose new training losses to reduce useless features and the corresponding detection method without prior knowledge of adversarial attacks.
The proposed method works well in all tested attack types and the false positive rates are even better than the methods good at certain types.
arXiv Detail & Related papers (2023-08-07T01:41:21Z) - Diffusion Models for Adversarial Purification [69.1882221038846]
Adrial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
We propose DiffPure that uses diffusion models for adversarial purification.
Our method achieves the state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
arXiv Detail & Related papers (2022-05-16T06:03:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.