Adversarial Purification through Representation Disentanglement
- URL: http://arxiv.org/abs/2110.07801v1
- Date: Fri, 15 Oct 2021 01:45:31 GMT
- Title: Adversarial Purification through Representation Disentanglement
- Authors: Tao Bai, Jun Zhao, Lanqing Guo, Bihan Wen
- Abstract summary: Deep learning models are vulnerable to adversarial examples and make incomprehensible mistakes.
Current defense methods, especially purification, tend to remove "noise" by learning and recovering the natural images.
In this work, we propose a novel adversarial purification scheme by presenting disentanglement of natural images and adversarial perturbations as a preprocessing defense.
- Score: 21.862799765511976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models are vulnerable to adversarial examples and make
incomprehensible mistakes, which poses a threat to their real-world deployment.
Combined with the idea of adversarial training, preprocessing-based defenses
are popular and convenient to use because of their task independence and good
generalizability. Current defense methods, especially purification, tend to
remove "noise" by learning and recovering the natural images. However, unlike
random noise, adversarial patterns are much more easily overfitted during model
training due to their strong correlation with the images.
In this work, we propose a novel adversarial purification scheme by presenting
disentanglement of natural images and adversarial perturbations as a
preprocessing defense. With extensive experiments, our defense is shown to be
generalizable and to provide significant protection against unseen strong adversarial
attacks. It reduces the success rates of state-of-the-art ensemble
attacks from 61.7% to 14.9% on average, superior to a
number of existing methods. Notably, our defense restores the perturbed images
perfectly and does not hurt the clean accuracy of backbone models, which is
highly desirable in practice.
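For intuition only, the PyTorch sketch below illustrates the general disentanglement idea described in the abstract: a purifier decomposes a possibly adversarial input into a natural-image estimate and a perturbation estimate, and the classifier consumes only the natural estimate. The architecture, class names (e.g. DisentanglePurifier), and loss weights are hypothetical assumptions, not the authors' actual network or training recipe.
```python
# Minimal sketch of a disentanglement-style purification preprocessor (PyTorch).
# Hypothetical architecture and objectives; the paper's actual design may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentanglePurifier(nn.Module):
    """Splits a (possibly adversarial) input x into a natural-image estimate
    and a perturbation estimate, with x ~= natural + perturbation."""
    def __init__(self, channels=3, width=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Two decoder heads: one for natural content, one for the perturbation.
        self.natural_head = nn.Conv2d(width, channels, 3, padding=1)
        self.perturb_head = nn.Conv2d(width, channels, 3, padding=1)

    def forward(self, x):
        h = self.encoder(x)
        natural = torch.sigmoid(self.natural_head(h))     # image in [0, 1]
        perturb = torch.tanh(self.perturb_head(h)) * 0.1  # small residual
        return natural, perturb

def purification_loss(natural, perturb, x_adv, x_clean):
    """Encourage the natural branch to match the clean image while both
    branches together reconstruct the adversarial input."""
    rec = F.mse_loss(natural, x_clean)                 # recover clean content
    consistency = F.mse_loss(natural + perturb, x_adv) # x_adv = natural + delta
    return rec + consistency

if __name__ == "__main__":
    # At inference time: purify first, then feed the natural estimate to the classifier.
    purifier = DisentanglePurifier()
    x_adv = torch.rand(4, 3, 32, 32)      # stand-in for adversarial inputs
    natural, perturb = purifier(x_adv)
    print(natural.shape, perturb.shape)   # torch.Size([4, 3, 32, 32]) twice
```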
Related papers
- High-Frequency Anti-DreamBooth: Robust Defense against Personalized Image Synthesis [12.555117983678624]
We propose a new adversarial attack method that adds strong perturbations to the high-frequency areas of images to make them more robust to adversarial purification.
Our experiment showed that the adversarial images retained noise even after adversarial purification, hindering malicious image generation.
arXiv Detail & Related papers (2024-09-12T15:58:28Z)
- Rethinking and Defending Protective Perturbation in Personalized Diffusion Models [21.30373461975769]
We study the fine-tuning process of personalized diffusion models (PDMs) through the lens of shortcut learning.
PDMs are susceptible to minor adversarial perturbations, leading to significant degradation when fine-tuned on corrupted datasets.
We propose a systematic defense framework that includes data purification and contrastive decoupling learning.
arXiv Detail & Related papers (2024-06-27T07:14:14Z)
- Improving Adversarial Robustness via Decoupled Visual Representation Masking [65.73203518658224]
In this paper, we highlight two novel properties of robust features from the feature distribution perspective.
We find that state-of-the-art defense methods aim to address both of these issues well.
Specifically, we propose a simple but effective defense based on decoupled visual representation masking.
arXiv Detail & Related papers (2024-06-16T13:29:41Z)
- F$^2$AT: Feature-Focusing Adversarial Training via Disentanglement of Natural and Perturbed Patterns [74.03108122774098]
Deep neural networks (DNNs) are vulnerable to adversarial examples crafted by well-designed perturbations.
This could lead to disastrous results on critical applications such as self-driving cars, surveillance security, and medical diagnosis.
We propose Feature-Focusing Adversarial Training (F$^2$AT), which forces the model to focus on the core features from natural patterns.
arXiv Detail & Related papers (2023-10-23T04:31:42Z)
- IRAD: Implicit Representation-driven Image Resampling against Adversarial Attacks [16.577595936609665]
We introduce a novel approach to counter adversarial attacks, namely, image resampling.
Image resampling transforms a discrete image into a new one, simulating the process of scene recapturing or rerendering as specified by a geometrical transformation.
We show that our method significantly enhances the adversarial robustness of diverse deep models against various attacks while maintaining high accuracy on clean images.
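For intuition only, here is a toy resampling defense in PyTorch: it re-renders the input under a small random affine warp with bilinear interpolation before classification. IRAD itself learns an implicit continuous representation and an input-dependent resampling map, which this sketch does not attempt to reproduce; the function name random_resample and all parameter values are illustrative assumptions.
```python
# Toy stand-in for resampling-as-defense: recapture the image under a small
# random geometric transform. Not IRAD's actual method.
import torch
import torch.nn.functional as F

def random_resample(x, max_shift=0.03, max_scale=0.03):
    """x: (N, C, H, W) in [0, 1]. Returns a bilinearly resampled view of x."""
    n = x.size(0)
    theta = torch.zeros(n, 2, 3, device=x.device)
    scale = 1.0 + (torch.rand(n, device=x.device) * 2 - 1) * max_scale
    shift = (torch.rand(n, 2, device=x.device) * 2 - 1) * max_shift
    theta[:, 0, 0] = scale      # isotropic scaling
    theta[:, 1, 1] = scale
    theta[:, :, 2] = shift      # small translation
    grid = F.affine_grid(theta, x.shape, align_corners=False)
    return F.grid_sample(x, grid, mode="bilinear", align_corners=False)

# Example use: logits = classifier(random_resample(x_adv))
```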
arXiv Detail & Related papers (2023-10-18T11:19:32Z)
- Content-based Unrestricted Adversarial Attack [53.181920529225906]
We propose a novel unrestricted attack framework called Content-based Unrestricted Adversarial Attack.
By leveraging a low-dimensional manifold that represents natural images, we map the images onto the manifold and optimize them along its adversarial direction.
arXiv Detail & Related papers (2023-05-18T02:57:43Z)
- Diffusion Models for Adversarial Purification [69.1882221038846]
Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
We propose DiffPure that uses diffusion models for adversarial purification.
Our method achieves state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
arXiv Detail & Related papers (2022-05-16T06:03:00Z)
- ARIA: Adversarially Robust Image Attribution for Content Provenance [25.217001579437635]
We show how to generate valid adversarial images that can easily cause incorrect image attribution.
We then describe an approach to prevent imperceptible adversarial attacks on deep visual fingerprinting models.
The resulting models are substantially more robust, are accurate even on unperturbed images, and perform well even over a database with millions of images.
arXiv Detail & Related papers (2022-02-25T18:11:45Z)
- Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model.
We propose to exploit additional information from the feature space to craft stronger adversaries.
Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
arXiv Detail & Related papers (2020-07-29T08:38:10Z)
- Towards Achieving Adversarial Robustness by Enforcing Feature Consistency Across Bit Planes [51.31334977346847]
We train networks to form coarse impressions based on the information in higher bit planes, and use the lower bit planes only to refine their prediction.
We demonstrate that, by imposing consistency on the representations learned across differently quantized images, the adversarial robustness of networks improves significantly.
arXiv Detail & Related papers (2020-04-01T09:31:10Z)
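As a hedged sketch of the bit-plane consistency idea in the last entry above (not the paper's exact recipe), the snippet below keeps only the top-k bit planes of an image and adds a divergence term between the model's predictions on the full-precision and quantized views. The helper name keep_top_bit_planes, the KL divergence choice, and the loss weight are assumptions for illustration.
```python
# Rough illustration of bit-plane consistency training (PyTorch).
import torch
import torch.nn.functional as F

def keep_top_bit_planes(x, k=4):
    """Keep the k most significant bit planes of an image in [0, 1]."""
    levels = 2 ** k
    x_int = torch.floor(x * 255.0).clamp(0, 255)
    step = 256 // levels                       # e.g. k=4 -> 16 grey levels
    return (torch.floor(x_int / step) * step) / 255.0

def bitplane_consistency_loss(model, x, labels, k=4, weight=1.0):
    """Cross-entropy on the full-precision image plus a consistency term tying
    the prediction on the coarsely quantized view to the full-precision one."""
    logits_full = model(x)
    logits_coarse = model(keep_top_bit_planes(x, k))
    ce = F.cross_entropy(logits_full, labels)
    consistency = F.kl_div(
        F.log_softmax(logits_coarse, dim=1),
        F.softmax(logits_full, dim=1),
        reduction="batchmean",
    )
    return ce + weight * consistency
```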