DiffDefense: Defending against Adversarial Attacks via Diffusion Models
- URL: http://arxiv.org/abs/2309.03702v1
- Date: Thu, 7 Sep 2023 13:28:36 GMT
- Title: DiffDefense: Defending against Adversarial Attacks via Diffusion Models
- Authors: Hondamunige Prasanna Silva, Lorenzo Seidenari, and Alberto Del Bimbo
- Abstract summary: The susceptibility of machine learning models to perturbations renders them vulnerable to adversarial attacks.
Our proposed method offers robustness against adversarial threats while preserving clean accuracy, speed, and plug-and-play compatibility.
- Score: 24.328384566645738
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a novel reconstruction method that leverages Diffusion
Models to protect machine learning classifiers against adversarial attacks, all
without requiring any modifications to the classifiers themselves. The
susceptibility of machine learning models to minor input perturbations renders
them vulnerable to adversarial attacks. While diffusion-based methods are
typically disregarded for adversarial defense due to their slow reverse
process, this paper demonstrates that our proposed method offers robustness
against adversarial threats while preserving clean accuracy, speed, and
plug-and-play compatibility. Code at:
https://github.com/HondamunigePrasannaSilva/DiffDefence.
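To make the reconstruction idea above concrete (a frozen classifier is fed the diffusion-model reconstruction of a possibly adversarial input, with no change to the classifier itself), here is a minimal sketch. It is an illustrative assumption, not the code in the linked repository: the `ReconstructionDefense` wrapper, the generator interface, and its `latent_dim` attribute are hypothetical names.

```python
# Minimal sketch of a reconstruction-based defense in the spirit of DiffDefense.
# The classifier stays frozen; a pretrained generative model reconstructs the input
# before classification. `generator` is assumed to map a latent z to an image and to
# expose a `latent_dim` attribute (hypothetical interface, not the authors' API).
import torch
import torch.nn as nn


class ReconstructionDefense(nn.Module):
    """Wrap a frozen classifier with a generative reconstruction step."""

    def __init__(self, classifier: nn.Module, generator: nn.Module,
                 steps: int = 5, lr: float = 0.1):
        super().__init__()
        self.classifier = classifier.eval()   # never retrained or modified
        self.generator = generator.eval()     # assumed: latent z -> image
        for p in self.generator.parameters():
            p.requires_grad_(False)           # only the latent is optimized
        self.steps = steps
        self.lr = lr

    def reconstruct(self, x: torch.Tensor) -> torch.Tensor:
        # Optimize a latent so the generated image matches the (possibly adversarial)
        # input; the reconstruction tends to fall back onto the clean data manifold.
        z = torch.randn(x.size(0), self.generator.latent_dim,   # assumed attribute
                        device=x.device, requires_grad=True)
        opt = torch.optim.Adam([z], lr=self.lr)
        with torch.enable_grad():
            for _ in range(self.steps):
                opt.zero_grad()
                loss = torch.mean((self.generator(z) - x) ** 2)
                loss.backward()
                opt.step()
        return self.generator(z).detach()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Classify the reconstruction instead of the raw input.
        return self.classifier(self.reconstruct(x))
```

The design point carried over from the abstract is plug-and-play compatibility: the classifier is left untouched, and only its input is replaced by the reconstruction.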
Related papers
- Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models [65.30406788716104]
This work investigates the vulnerabilities of security-enhancing diffusion models.
We demonstrate that these models are highly susceptible to DIFF2, a simple yet effective backdoor attack.
Case studies show that DIFF2 can significantly reduce both post-purification and certified accuracy across benchmark datasets and models.
arXiv Detail & Related papers (2024-06-14T02:39:43Z) - Adversarial Text Purification: A Large Language Model Approach for Defense [25.041109219049442]
Adversarial purification is a defense mechanism for safeguarding classifiers against adversarial attacks.
We propose a novel adversarial text purification that harnesses the generative capabilities of Large Language Models.
Our proposed method demonstrates remarkable performance over various classifiers, improving their accuracy under the attack by over 65% on average.
arXiv Detail & Related papers (2024-02-05T02:36:41Z) - Language Guided Adversarial Purification [3.9931474959554496]
Adversarial purification using generative models demonstrates strong adversarial defense performance.
The paper introduces a new framework, Language Guided Adversarial Purification (LGAP), which utilizes pre-trained diffusion models and caption generators.
arXiv Detail & Related papers (2023-09-19T06:17:18Z) - AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models [7.406040859734522]
Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques.
Previous attack methods often directly inject Projected Gradient Descent (PGD) gradients into the sampling of generative models.
We propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models.
arXiv Detail & Related papers (2023-07-24T03:10:02Z) - DIFFender: Diffusion-Based Adversarial Defense against Patch Attacks [34.86098237949214]
Adversarial attacks, particularly patch attacks, pose significant threats to the robustness and reliability of deep learning models.
This paper introduces DIFFender, a novel defense framework that harnesses the capabilities of a text-guided diffusion model to combat patch attacks.
DIFFender integrates dual tasks of patch localization and restoration within a single diffusion model framework.
arXiv Detail & Related papers (2023-06-15T13:33:27Z) - Robust Classification via a Single Diffusion Model [37.46217654590878]
Robust Diffusion Classifier (RDC) is a generative classifier constructed from a pre-trained diffusion model to be adversarially robust.
RDC achieves $75.67\%$ robust accuracy against various $\ell_\infty$ norm-bounded adaptive attacks with $\epsilon_\infty = 8/255$ on CIFAR-10.
arXiv Detail & Related papers (2023-05-24T15:25:19Z) - Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard method in adversarial robustness assumes a framework that defends against adversarial samples crafted by minimally perturbing clean samples.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves defense against both invariance and sensitivity attacks.
arXiv Detail & Related papers (2022-11-04T13:54:02Z) - Diffusion Models for Adversarial Purification [69.1882221038846]
Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
We propose DiffPure that uses diffusion models for adversarial purification.
Our method achieves state-of-the-art results, outperforming current adversarial training and adversarial purification methods (a minimal purification sketch appears after this list).
arXiv Detail & Related papers (2022-05-16T06:03:00Z) - Towards A Conceptually Simple Defensive Approach for Few-shot Classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z) - Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features across arbitrary attacking strengths.
arXiv Detail & Related papers (2021-05-31T17:01:05Z) - A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN)-based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
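Several entries above (notably "Diffusion Models for Adversarial Purification" and LGAP) use diffuse-then-denoise purification rather than latent reconstruction. The sketch below, referenced from the DiffPure entry, shows that pipeline under stated assumptions: `denoise_step` and `alphas_cumprod` stand in for a pretrained DDPM's learned reverse step and noise schedule, and `t_star` is an illustrative truncation timestep; none of these names come from the papers' code.

```python
# Minimal sketch of diffuse-then-denoise adversarial purification.
# Forward-diffuse the input to timestep t_star (washing out the perturbation with noise),
# then run the learned reverse chain back to t=0 before classifying the result.
import torch


def purify(x: torch.Tensor,
           denoise_step,                  # assumed callable: (x_t, t) -> x_{t-1}
           alphas_cumprod: torch.Tensor,  # assumed DDPM noise schedule, shape [T+1]
           t_star: int = 100) -> torch.Tensor:
    """Diffuse the input up to t_star, then denoise back to a clean-looking image."""
    a_bar = alphas_cumprod[t_star]
    # Forward diffusion: adversarial structure is (partially) destroyed by Gaussian noise.
    x_t = torch.sqrt(a_bar) * x + torch.sqrt(1.0 - a_bar) * torch.randn_like(x)
    # Reverse diffusion: the pretrained model pulls the sample back toward the data manifold.
    for t in range(t_star, 0, -1):
        x_t = denoise_step(x_t, t)
    return x_t


# Usage (illustrative): logits = classifier(purify(x_adv, denoise_step, alphas_cumprod))
```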