(Certified!!) Adversarial Robustness for Free!
- URL: http://arxiv.org/abs/2206.10550v1
- Date: Tue, 21 Jun 2022 17:27:27 GMT
- Title: (Certified!!) Adversarial Robustness for Free!
- Authors: Nicholas Carlini, Florian Tramer, Krishnamurthy (Dj) Dvijotham, J. Zico Kolter
- Abstract summary: We certify 71% accuracy on ImageNet under adversarial perturbations constrained to be within a 2-norm of 0.5.
We obtain these results using only pretrained diffusion models and image classifiers, without requiring any fine-tuning or retraining of model parameters.
- Score: 116.6052628829344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we show how to achieve state-of-the-art certified adversarial robustness to 2-norm bounded perturbations by relying exclusively on off-the-shelf pretrained models. To do so, we instantiate the denoised smoothing approach of Salman et al. by combining a pretrained denoising diffusion probabilistic model and a standard high-accuracy classifier. This allows us to certify 71% accuracy on ImageNet under adversarial perturbations constrained to be within a 2-norm of 0.5, an improvement of 14 percentage points over the prior certified SoTA using any approach, or an improvement of 30 percentage points over denoised smoothing. We obtain these results using only pretrained diffusion models and image classifiers, without requiring any fine-tuning or retraining of model parameters.
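As a concrete illustration of the denoise-then-classify pipeline the abstract describes, here is a minimal PyTorch sketch of one randomized-smoothing prediction. The `denoiser` and `classifier` arguments stand in for off-the-shelf pretrained models; the class count, sample count, and plain voting loop are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def denoised_smoothing_predict(x, denoiser, classifier, sigma=0.5,
                               n_samples=100, num_classes=1000):
    """Denoise-then-classify prediction under randomized smoothing (sketch).

    x          -- image batch of shape (1, C, H, W), values in [0, 1]
    denoiser   -- one-shot denoiser at noise level sigma (e.g. a pretrained
                  diffusion model applied at the matching timestep)
    classifier -- returns logits for clean images
    """
    counts = torch.zeros(num_classes, dtype=torch.long)
    with torch.no_grad():
        for _ in range(n_samples):
            noisy = x + sigma * torch.randn_like(x)    # Gaussian smoothing noise
            denoised = denoiser(noisy)                 # project back toward clean images
            pred = classifier(denoised).argmax(dim=1)  # vote of the base classifier
            counts[pred.item()] += 1
    # The majority vote approximates the smoothed classifier's prediction;
    # the counts can then feed a Cohen-et-al.-style certification routine.
    return counts.argmax().item(), counts
```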
Related papers
- Confidence-aware Denoised Fine-tuning of Off-the-shelf Models for Certified Robustness [56.2479170374811]
We introduce Fine-Tuning with Confidence-Aware Denoised Image Selection (FT-CADIS).
FT-CADIS is inspired by the observation that the confidence of off-the-shelf classifiers can effectively identify hallucinated images during denoised smoothing.
It establishes state-of-the-art certified robustness among denoised smoothing methods across all $\ell_2$-adversary radii in various benchmarks.
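A rough sketch of the confidence-based selection idea; the 0.9 threshold and the interface are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def select_confident_denoised(denoised_batch, classifier, threshold=0.9):
    """Filter denoised images by off-the-shelf classifier confidence.

    Images the classifier is unsure about are treated as likely
    hallucinations and dropped from fine-tuning.
    """
    with torch.no_grad():
        probs = F.softmax(classifier(denoised_batch), dim=1)
        confidence, _ = probs.max(dim=1)   # top-class probability per image
    keep = confidence >= threshold
    return denoised_batch[keep], keep
```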
arXiv Detail & Related papers (2024-11-13T09:13:20Z)
- Certified PEFTSmoothing: Parameter-Efficient Fine-Tuning with Randomized Smoothing [6.86204821852287]
Randomized smoothing is the primary certified robustness method for assessing the robustness of deep learning models to adversarial perturbations in the $\ell_2$-norm.
A notable constraint limiting its widespread adoption is that the base model must be retrained entirely from scratch to attain a robust version.
This is because the base model otherwise fails to learn the noise-augmented data distribution and cannot cast accurate votes.
Inspired by recent large-model training procedures, we explore an alternative approach, named PEFTSmoothing, to adapt the base model to learn the noise-augmented data.
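For reference, the certification step that randomized smoothing builds on can be sketched as follows, using the Cohen et al. bound R = σ·Φ⁻¹(p_A); the failure probability α and the exact confidence-interval choice here are illustrative assumptions.

```python
from scipy.stats import binomtest, norm

def certified_radius(class_counts, sigma, alpha=0.001):
    """l2 certificate from smoothing vote counts (Cohen et al.-style sketch).

    class_counts -- per-class votes collected from noisy-sample predictions
    sigma        -- Gaussian smoothing noise level
    alpha        -- allowed failure probability of the certificate
    """
    n = int(sum(class_counts))
    top = max(range(len(class_counts)), key=lambda c: class_counts[c])
    # One-sided lower confidence bound on the top-class probability p_A.
    p_lower = binomtest(int(class_counts[top]), n).proportion_ci(
        confidence_level=1 - 2 * alpha, method="exact"
    ).low
    if p_lower <= 0.5:
        return top, 0.0                      # cannot certify: abstain
    return top, sigma * norm.ppf(p_lower)    # certified l2 radius
```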
arXiv Detail & Related papers (2024-04-08T09:38:22Z)
- Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness [52.9493817508055]
We propose Pre-trained Model Guided Adversarial Fine-Tuning (PMG-AFT) to enhance the model's zero-shot adversarial robustness.
Our approach consistently improves clean accuracy by an average of 8.72%.
arXiv Detail & Related papers (2024-01-09T04:33:03Z)
- Improving the Accuracy-Robustness Trade-Off of Classifiers via Adaptive Smoothing [9.637143119088426]
We show that the gap between a robust base classifier's confidence on correct and incorrect examples is the key to this improvement.
We adapt an adversarial input detector into a mixing network that adaptively adjusts the mixture of the two base models.
The proposed flexible method, termed "adaptive smoothing", can work in conjunction with existing or even future methods that improve clean accuracy, robustness, or adversary detection.
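A minimal sketch of that adaptive mixing, assuming a `mixing_net` that plays the role of the adapted adversarial-input detector; the scalar-weight interface and logit-space mixing are assumptions, not the paper's exact formulation.

```python
import torch

def adaptive_smoothing_logits(x, clean_model, robust_model, mixing_net):
    """Adaptively mix an accurate model and a robust model (sketch).

    mixing_net outputs one score per example; after a sigmoid it becomes a
    weight in [0, 1] that shifts mass toward the robust model when the
    input looks adversarial.
    """
    with torch.no_grad():
        alpha = torch.sigmoid(mixing_net(x)).view(-1, 1)  # per-example weight
        return (1 - alpha) * clean_model(x) + alpha * robust_model(x)
```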
arXiv Detail & Related papers (2023-01-29T22:05:28Z)
- Confidence-aware Training of Smoothed Classifiers for Certified Robustness [75.95332266383417]
We use "accuracy under Gaussian noise" as an easy-to-compute proxy of adversarial robustness for an input.
Our experiments show that the proposed method consistently improves certified robustness over state-of-the-art training methods.
arXiv Detail & Related papers (2022-12-18T03:57:12Z)
- Guided Diffusion Model for Adversarial Purification from Random Noise [0.0]
We propose a novel guided diffusion purification approach to provide a strong defense against adversarial attacks.
Our model achieves 89.62% robust accuracy under an $\ell_\infty$ PGD attack ($\epsilon = 8/255$) on the CIFAR-10 dataset.
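A bare-bones sketch of diffusion purification, assuming a DDPM-style `diffusion` object exposing `alphas_cumprod` and a per-step `p_sample` (a common convention in open-source DDPM codebases, assumed here); the guidance term the paper adds to the reverse process is omitted.

```python
import torch

def diffusion_purify(x, diffusion, t_star=100):
    """Purify a (possibly adversarial) input with a diffusion model (sketch).

    Adds forward-process noise up to a fixed timestep t_star, then runs the
    reverse process back to t=0, washing out adversarial perturbations.
    """
    alpha_bar = torch.as_tensor(diffusion.alphas_cumprod[t_star])  # \bar{alpha}_{t*}
    noisy = alpha_bar.sqrt() * x + (1 - alpha_bar).sqrt() * torch.randn_like(x)
    for t in reversed(range(t_star)):       # reverse diffusion back to t = 0
        noisy = diffusion.p_sample(noisy, t)  # one denoising step (assumed API)
    return noisy
```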
arXiv Detail & Related papers (2022-06-22T06:55:03Z)
- SmoothMix: Training Confidence-calibrated Smoothed Classifiers for Certified Robustness [61.212486108346695]
We propose a training scheme, coined SmoothMix, to control the robustness of smoothed classifiers via self-mixup.
The proposed procedure effectively identifies over-confident, near off-class samples as a cause of limited robustness.
Our experimental results demonstrate that the proposed method can significantly improve the certified $\ell_2$-robustness of smoothed classifiers.
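A simplified sketch of the self-mixup construction, assuming a `smoothed_attack` helper that attacks the smoothed classifier; the label mixing toward the uniform distribution follows the spirit of the method rather than its exact recipe.

```python
import torch
import torch.nn.functional as F

def smoothmix_batch(x, y, smoothed_attack, lam, num_classes):
    """Build one SmoothMix-style training pair (simplified sketch).

    smoothed_attack -- produces an adversarial example against the *smoothed*
                       classifier (the paper uses a PGD-style attack on
                       noisy copies; assumed to exist here)
    lam             -- mixup coefficient in [0, 1]
    """
    x_adv = smoothed_attack(x, y)
    x_mix = (1 - lam) * x + lam * x_adv          # interpolate toward the attack
    y_onehot = F.one_hot(y, num_classes).float()
    uniform = torch.full_like(y_onehot, 1.0 / num_classes)
    # Mixed label moves toward uniform, penalizing over-confidence on
    # near off-class samples.
    y_mix = (1 - lam) * y_onehot + lam * uniform
    return x_mix, y_mix
```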
arXiv Detail & Related papers (2021-11-17T18:20:59Z)
- Consistency Regularization for Certified Robustness of Smoothed Classifiers [89.72878906950208]
A recent technique of randomized smoothing has shown that the worst-case $\ell_2$-robustness can be transformed into the average-case robustness.
We found that the trade-off between accuracy and certified robustness of smoothed classifiers can be greatly controlled by simply regularizing the prediction consistency over noise.
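The regularizer can be sketched as a KL consistency penalty over noisy copies of each input; the number of copies m and the exact divergence are illustrative choices, not the paper's precise loss.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x, sigma=0.25, m=2):
    """Consistency regularizer over Gaussian noise (sketch of the idea).

    Draws m noisy copies of each input and penalizes the KL divergence of
    each noisy prediction from the mean prediction, encouraging the
    classifier to vote consistently under smoothing noise.
    """
    log_probs = [
        F.log_softmax(model(x + sigma * torch.randn_like(x)), dim=1)
        for _ in range(m)
    ]
    mean_probs = torch.stack([lp.exp() for lp in log_probs]).mean(dim=0)
    return sum(
        F.kl_div(lp, mean_probs, reduction="batchmean") for lp in log_probs
    ) / m
```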
arXiv Detail & Related papers (2020-06-07T06:57:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.