Raising the Bar for Certified Adversarial Robustness with Diffusion
Models
- URL: http://arxiv.org/abs/2305.10388v1
- Date: Wed, 17 May 2023 17:29:10 GMT
- Title: Raising the Bar for Certified Adversarial Robustness with Diffusion
Models
- Authors: Thomas Altstidl, David Dobre, Björn Eskofier, Gauthier Gidel, Leo
Schwinn
- Abstract summary: In this work, we demonstrate that a similar approach can substantially improve deterministic certified defenses.
One of our main insights is that the generalization gap, i.e., the difference between the training and test accuracy of the original model, is a good predictor of the magnitude of the improvement.
Our approach achieves state-of-the-art deterministic robustness certificates on CIFAR-10 for the $\ell_2$ ($\epsilon = 36/255$) and $\ell_\infty$ ($\epsilon = 8/255$) threat models.
- Score: 9.684141378657522
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Certified defenses against adversarial attacks offer formal guarantees on the
robustness of a model, making them more reliable than empirical methods such as
adversarial training, whose effectiveness is often later reduced by unseen
attacks. Still, the limited certified robustness that is currently achievable
has been a bottleneck for their practical adoption. Gowal et al. and Wang et
al. have shown that generating additional training data using state-of-the-art
diffusion models can considerably improve the robustness of adversarial
training. In this work, we demonstrate that a similar approach can
substantially improve deterministic certified defenses. In addition, we provide
a list of recommendations to scale the robustness of certified training
approaches. One of our main insights is that the generalization gap, i.e., the
difference between the training and test accuracy of the original model, is a
good predictor of the magnitude of the robustness improvement when using
additional generated data. Our approach achieves state-of-the-art deterministic
robustness certificates on CIFAR-10 for the $\ell_2$ ($\epsilon = 36/255$) and
$\ell_\infty$ ($\epsilon = 8/255$) threat models, outperforming the previous
best results by $+3.95\%$ and $+1.39\%$, respectively. Furthermore, we report
similar improvements for CIFAR-100.
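To make the two ingredients of the abstract concrete, here is a minimal, hypothetical sketch (not the authors' released code) of how one might (a) measure the generalization gap used as a predictor and (b) pool diffusion-generated samples with the original training set; `generalization_gap` and `augmented_loader` are illustrative names, and the actual generation, pseudo-labeling, and mixing pipeline is not specified here.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader

def generalization_gap(model, train_loader, test_loader, device="cpu"):
    """Train accuracy minus test accuracy of the original model.

    The abstract uses this gap as a predictor of how much the
    certified defense will benefit from extra generated data.
    """
    def accuracy(loader):
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                correct += (model(x).argmax(dim=1) == y).sum().item()
                total += y.numel()
        return correct / total

    return accuracy(train_loader) - accuracy(test_loader)

def augmented_loader(real_dataset, generated_dataset, batch_size=128):
    """Pool the original training set with (pseudo-labeled) samples drawn
    from a pre-trained diffusion model, so the certified defense can be
    trained on the combined data. How samples are generated and labeled
    is left out of this sketch."""
    combined = ConcatDataset([real_dataset, generated_dataset])
    return DataLoader(combined, batch_size=batch_size, shuffle=True)
```

On this reading, a large gap suggests the certified training run is data-limited and therefore likely to benefit most from the additional generated samples.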
Related papers
- Adversarial Feature Alignment: Balancing Robustness and Accuracy in Deep Learning via Adversarial Training [10.099179580467737]
Adversarial training mitigates the vulnerability of deep models to adversarial examples by increasing their robustness, but it typically reduces the model's standard accuracy on clean, non-adversarial samples.
This paper proposes a novel adversarial training method called Adversarial Feature Alignment (AFA) to address these problems.
arXiv Detail & Related papers (2024-02-19T14:51:20Z)
- Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off [107.35833747750446]
Adversarial examples can be crafted by adding imperceptible perturbations to legitimate documents.
This vulnerability raises significant concerns about their reliability and hinders the widespread deployment of NRMs.
In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs.
arXiv Detail & Related papers (2023-12-16T05:38:39Z)
- Learn from the Past: A Proxy Guided Adversarial Defense Framework with Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models.
AT methods that rely on direct iterative updates to the target model's defense frequently encounter obstacles such as unstable training and catastrophic overfitting.
We present a general proxy-guided defense framework, LAST (Learn from the Past).
arXiv Detail & Related papers (2023-10-19T13:13:41Z)
- Towards Certified Probabilistic Robustness with High Accuracy [3.957941698534126]
Adversarial examples pose a security threat to many critical systems built on neural networks.
How to build certifiably robust yet accurate neural network models remains an open problem.
We propose a novel approach that aims to achieve both high accuracy and certified probabilistic robustness.
arXiv Detail & Related papers (2023-09-02T09:39:47Z)
- Vanilla Feature Distillation for Improving the Accuracy-Robustness Trade-Off in Adversarial Training [37.5115141623558]
We propose a Vanilla Feature Distillation Adversarial Training (VFD-Adv) to guide adversarial training towards higher accuracy.
A key advantage of our method is that it can be universally adapted to and boost existing works.
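The summary above only names the method; as a rough illustration of feature distillation in adversarial training, the following hypothetical loss distills the features of adversarial inputs toward those of a frozen, standard-trained (vanilla) model on the corresponding clean inputs. `vfd_adv_loss`, `alpha`, and the feature tensors are assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def vfd_adv_loss(logits_adv, feat_adv, feat_vanilla, y, alpha=1.0):
    """Adversarial cross-entropy plus a distillation term that pulls the
    features of adversarial inputs toward those produced by a frozen,
    standard-trained (vanilla) model on the clean inputs."""
    ce = F.cross_entropy(logits_adv, y)
    distill = F.mse_loss(feat_adv, feat_vanilla.detach())
    return ce + alpha * distill
```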
arXiv Detail & Related papers (2022-06-05T11:57:10Z)
- Adversarial Training with Rectified Rejection [114.83821848791206]
We propose to use true confidence (T-Con) as a certainty oracle, and learn to predict T-Con by rectifying confidence.
We prove that under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones.
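A minimal sketch of the rectified-confidence idea described above, assuming the rectifier is the sigmoid output of an auxiliary head; the paper's exact loss and coupling may differ.

```python
import torch.nn.functional as F

def rectified_rejection_loss(logits, rectifier, y):
    """Learn R-Con to match T-Con.

    T-Con is the probability the model assigns to the true class; R-Con is
    the predicted confidence rectified by an auxiliary head ('rectifier',
    assumed here to be a sigmoid output in [0, 1] of shape (N,)). At test
    time, inputs whose R-Con falls below a threshold can be rejected as
    likely misclassified."""
    probs = F.softmax(logits, dim=1)
    t_con = probs.gather(1, y.unsqueeze(1)).squeeze(1)  # prob. of the true class
    conf, _ = probs.max(dim=1)                          # predicted confidence
    r_con = conf * rectifier                            # rectified confidence
    return F.binary_cross_entropy(r_con, t_con.detach())
```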
arXiv Detail & Related papers (2021-05-31T08:24:53Z)
- Analysis and Applications of Class-wise Robustness in Adversarial Training [92.08430396614273]
Adversarial training is one of the most effective approaches to improve model robustness against adversarial examples.
Previous works mainly focus on the overall robustness of the model; an in-depth analysis of the role of each class involved in adversarial training is still missing.
We provide a detailed diagnosis of adversarial training on six benchmark datasets, i.e., MNIST, CIFAR-10, CIFAR-100, SVHN, STL-10 and ImageNet.
We observe that the stronger attack methods in adversarial learning achieve performance improvement mainly from a more successful attack on the vulnerable classes.
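A small, hypothetical diagnostic in the spirit of that analysis, computing robust accuracy per class; the `attack` callable and its signature are assumptions, not the paper's code.

```python
import torch

def per_class_robust_accuracy(model, attack, loader, num_classes, device="cpu"):
    """Robust accuracy broken down by class. 'attack' is any callable that
    maps (model, x, y) to adversarial examples, e.g. a PGD implementation."""
    correct = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    model.eval()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = attack(model, x, y)
        pred = model(x_adv).argmax(dim=1)
        for c in range(num_classes):
            mask = y == c
            correct[c] += (pred[mask] == c).sum().item()
            total[c] += mask.sum().item()
    return correct / total.clamp(min=1)  # per-class fraction surviving the attack
```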
arXiv Detail & Related papers (2021-05-29T07:28:35Z)
- Adversarial Feature Stacking for Accurate and Robust Predictions [4.208059346198116]
The Adversarial Feature Stacking (AFS) model can jointly exploit features with varied levels of robustness and accuracy.
We evaluate the AFS model on CIFAR-10 and CIFAR-100 datasets with strong adaptive attack methods.
arXiv Detail & Related papers (2021-03-24T12:01:24Z)
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
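As a rough sketch of "consistent under both data augmentations and adversarial perturbations", one can treat an augmented view and an adversarial view of the same image as positives in a contrastive loss. The simplified InfoNCE below (in-batch negatives only) is illustrative, not the paper's objective.

```python
import torch
import torch.nn.functional as F

def paired_info_nce(z_aug, z_adv, tau=0.5):
    """Simplified InfoNCE between an augmented view and an adversarial view
    of the same batch: matching rows are positives, all other rows serve as
    in-batch negatives. Real implementations typically pool 2N views."""
    z_aug = F.normalize(z_aug, dim=1)
    z_adv = F.normalize(z_adv, dim=1)
    logits = z_aug @ z_adv.t() / tau
    targets = torch.arange(z_aug.size(0), device=z_aug.device)
    return F.cross_entropy(logits, targets)
```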
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
- Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning [134.15174177472807]
We introduce adversarial training into self-supervision to provide general-purpose robust pre-trained models for the first time.
We conduct extensive experiments to demonstrate that the proposed framework achieves large performance margins.
arXiv Detail & Related papers (2020-03-28T18:28:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.