Towards Adversarial Purification using Denoising AutoEncoders
- URL: http://arxiv.org/abs/2208.13838v1
- Date: Mon, 29 Aug 2022 19:04:25 GMT
- Title: Towards Adversarial Purification using Denoising AutoEncoders
- Authors: Dvij Kalaria, Aritra Hazra and Partha Pratim Chakrabarti
- Abstract summary: Adversarial attacks are often obtained by making subtle perturbations to normal images, which are mostly imperceptible to humans.
We propose a framework, named APuDAE, leveraging Denoising AutoEncoders (DAEs) to purify these samples by using them in an adaptive way.
We show that our framework provides performance comparable to, and in most cases better than, the baseline methods in purifying adversaries.
- Score: 0.8701566919381223
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid advancement and increased use of deep learning models in image
identification, security becomes a major concern to their deployment in
safety-critical systems. Since the accuracy and robustness of deep learning
models are primarily attributed to the purity of the training samples,
deep learning architectures are often susceptible to adversarial
attacks. Adversarial attacks are often obtained by making subtle perturbations
to normal images, which are mostly imperceptible to humans, but can seriously
confuse the state-of-the-art machine learning models. We propose a framework,
named APuDAE, leveraging Denoising AutoEncoders (DAEs) to purify these samples
by using them in an adaptive way and thus improve the classification accuracy
of the target classifier networks that have been attacked. We also show how
using DAEs adaptively, instead of using them directly, improves classification
accuracy further and is more robust to adaptive attacks designed to fool
them. We demonstrate our results on the MNIST, CIFAR-10, and ImageNet
datasets and show how our framework (APuDAE) provides performance comparable
to, and in most cases better than, the baseline methods in purifying
adversaries. We also design an adaptive attack specifically targeting our
purifying model and demonstrate that our defense is robust to it.
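
The abstract does not spell out how the DAE is applied "adaptively". As a rough illustration only, the sketch below assumes one plausible reading: instead of replacing the input with a single reconstruction, the input is nudged by gradient descent to lower its reconstruction error. Everything here is hypothetical and not taken from the paper: `denoise` is a projection standing in for a trained DAE, `classify` is a toy linear classifier, and `sign_attack` is a simple FGSM-style perturbation.

```python
# Toy sketch (assumptions throughout): a projection onto the x-axis stands in
# for a trained denoising autoencoder; clean data is assumed to lie on that axis.

def denoise(x):
    """Hypothetical stand-in for a DAE: map an input back onto the x-axis."""
    return [x[0], 0.0]

def recon_error(x):
    """Squared distance between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, denoise(x)))

def classify(x, w=(1.0, 3.0)):
    """Toy linear classifier: class 1 if w.x > 0, else class 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

def sign_attack(x, w=(1.0, 3.0), eps=0.5):
    """FGSM-style step against class 1: shift each coordinate against sign(w)."""
    return [xi - eps * (1 if wi > 0 else -1) for xi, wi in zip(x, w)]

def purify(x, steps=20, lr=0.25, h=1e-5):
    """Gradient descent on the input to reduce reconstruction error.

    The gradient is estimated with central finite differences so that the
    sketch needs no autodiff library.
    """
    x = list(x)
    for _ in range(steps):
        grad = []
        for i in range(len(x)):
            up, down = x[:], x[:]
            up[i] += h
            down[i] -= h
            grad.append((recon_error(up) - recon_error(down)) / (2 * h))
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x

clean = [1.0, 0.0]
adv = sign_attack(clean)   # perturbed copy that the toy classifier mislabels
purified = purify(adv)     # pulled back toward the assumed clean-data axis
print(classify(clean), classify(adv), classify(purified))
```

In this toy setting the perturbed point is misclassified and the purified point recovers the original label. A real system would replace `denoise` with a trained network and the finite-difference gradient with automatic differentiation.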
Related papers
- Undermining Image and Text Classification Algorithms Using Adversarial Attacks [0.0]
Our study addresses the gap by training various machine learning models and using GANs and SMOTE to generate additional data points aimed at attacking text classification models.
Our experiments reveal a significant vulnerability in classification models. Specifically, we observe a 20% decrease in accuracy for the top-performing text classification models post-attack, along with a 30% decrease in facial recognition accuracy.
arXiv Detail & Related papers (2024-11-03T18:44:28Z) - MOREL: Enhancing Adversarial Robustness through Multi-Objective Representation Learning [1.534667887016089]
Deep neural networks (DNNs) are vulnerable to slight adversarial perturbations.
We show that strong feature representation learning during training can significantly enhance the original model's robustness.
We propose MOREL, a multi-objective feature representation learning approach, encouraging classification models to produce similar features for inputs within the same class, despite perturbations.
arXiv Detail & Related papers (2024-10-02T16:05:03Z) - Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been considered a challenging property to encode in neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z) - FACTUAL: A Novel Framework for Contrastive Learning Based Robust SAR Image Classification [10.911464455072391]
FACTUAL is a Contrastive Learning framework for Adversarial Training and robust SAR classification.
Our model achieves 99.7% accuracy on clean samples, and 89.6% on perturbed samples, both outperforming previous state-of-the-art methods.
arXiv Detail & Related papers (2024-04-04T06:20:22Z) - Learn from the Past: A Proxy Guided Adversarial Defense Framework with Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models.
AT methods, relying on direct iterative updates for the target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting.
We present a general proxy guided defense framework, LAST (Learn from the Past).
arXiv Detail & Related papers (2023-10-19T13:13:41Z) - Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach known as adversarial training (AT) has been shown to mitigate their impact.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z) - Learning from Attacks: Attacking Variational Autoencoder for Improving Image Classification [17.881134865491063]
Adversarial attacks are often considered threats to the robustness of Deep Neural Networks (DNNs).
This work analyzes adversarial attacks from a different perspective. Namely, adversarial examples contain implicit information that is useful to the predictions.
We propose an algorithmic framework that leverages the advantages of the DNNs for data self-expression and task-specific predictions.
arXiv Detail & Related papers (2022-03-11T08:48:26Z) - Detecting Adversaries, yet Faltering to Noise? Leveraging Conditional Variational AutoEncoders for Adversary Detection in the Presence of Noisy Images [0.7734726150561086]
Conditional Variational AutoEncoders (CVAE) are surprisingly good at detecting imperceptible image perturbations.
We show how CVAEs can be effectively used to detect adversarial attacks on image classification networks.
arXiv Detail & Related papers (2021-11-28T20:36:27Z) - Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose the adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features of arbitrary attacking strength.
arXiv Detail & Related papers (2021-05-31T17:01:05Z) - Combating Adversaries with Anti-Adversaries [118.70141983415445]
In particular, our layer generates an input perturbation in the opposite direction of the adversarial one.
We verify the effectiveness of our approach by combining our layer with both nominally and robustly trained models.
Our anti-adversary layer significantly enhances model robustness while coming at no cost on clean accuracy.
arXiv Detail & Related papers (2021-03-26T09:36:59Z) - How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)