Splitting the Difference on Adversarial Training
- URL: http://arxiv.org/abs/2310.02480v1
- Date: Tue, 3 Oct 2023 23:09:47 GMT
- Title: Splitting the Difference on Adversarial Training
- Authors: Matan Levi, Aryeh Kontorovich
- Abstract summary: Adversarial training is one of the most effective defenses against adversarial examples.
In this work, we take a fundamentally different approach by treating the perturbed examples of each class as a separate class to be learned.
This split doubles the number of classes to be learned, but at the same time considerably simplifies the decision boundaries.
- Score: 13.470640587945057
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The existence of adversarial examples points to a basic weakness of deep
neural networks. One of the most effective defenses against such examples,
adversarial training, entails training models with some degree of robustness,
usually at the expense of a degraded natural accuracy. Most adversarial
training methods aim to learn a model that finds, for each class, a common
decision boundary encompassing both the clean and perturbed examples. In this
work, we take a fundamentally different approach by treating the perturbed
examples of each class as a separate class to be learned, effectively splitting
each class into two classes: "clean" and "adversarial." This split doubles the
number of classes to be learned, but at the same time considerably simplifies
the decision boundaries. We provide a theoretical plausibility argument that
sheds some light on the conditions under which our approach can be expected to
be beneficial. Likewise, we empirically demonstrate that our method learns
robust models while attaining optimal or near-optimal natural accuracy, e.g.,
on CIFAR-10 we obtain near-optimal natural accuracy of $95.01\%$ alongside
significant robustness across multiple tasks. The ability to achieve such
near-optimal natural accuracy, while maintaining a significant level of
robustness, makes our method applicable to real-world applications where
natural accuracy is at a premium. As a whole, our main contribution is a
general method that confers a significant level of robustness upon classifiers
with only minor or negligible degradation of their natural accuracy.
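The abstract describes the class split only at a high level, so the following is a minimal, hypothetical PyTorch sketch of how such a split could be wired up: a standard PGD attack generates perturbed copies, the classifier's head outputs 2K logits, clean examples keep their original label while perturbed copies are assigned the twin label y + K, and at test time the paired logits are folded back into K classes. The attack settings, the decision to attack against the clean label, the logit-summation merge rule, and all function names are assumptions made for illustration, not the authors' exact implementation.

```python
# Illustrative sketch (not the authors' code): class-splitting adversarial training.
import torch
import torch.nn.functional as F

NUM_CLASSES = 10      # e.g. CIFAR-10; the classifier head must output 2 * NUM_CLASSES logits
EPSILON = 8 / 255     # assumed L-inf perturbation budget
ALPHA = 2 / 255       # assumed PGD step size
PGD_STEPS = 10        # assumed number of PGD iterations

def pgd_attack(model, x, y, eps=EPSILON, alpha=ALPHA, steps=PGD_STEPS):
    """Standard L-inf PGD; the loss is taken against the clean label y
    over the 2K-way output head (an assumption made for this sketch)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()

def split_class_step(model, optimizer, x, y):
    """One training step over 2K labels: clean inputs keep label y,
    perturbed copies are assigned the 'adversarial' twin label y + NUM_CLASSES."""
    x_adv = pgd_attack(model, x, y)
    inputs = torch.cat([x, x_adv], dim=0)
    targets = torch.cat([y, y + NUM_CLASSES], dim=0)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()

def predict(model, x):
    """Fold the 2K logits back to K classes at test time by summing each
    class's clean/adversarial pair (one plausible merge rule, assumed here)."""
    logits = model(x)                              # shape [batch, 2 * NUM_CLASSES]
    merged = logits[:, :NUM_CLASSES] + logits[:, NUM_CLASSES:]
    return merged.argmax(dim=1)
```

Because every decision about "clean vs. adversarial" is delegated to the extra labels rather than forced into one shared boundary per class, the per-class decision regions the network has to learn stay simpler, which is the intuition the abstract appeals to.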
Related papers
- Towards Certified Probabilistic Robustness with High Accuracy [3.957941698534126]
Adversarial examples pose a security threat to many critical systems built on neural networks.
How to build certifiably robust yet accurate neural network models remains an open problem.
We propose a novel approach that aims to achieve both high accuracy and certified probabilistic robustness.
arXiv Detail & Related papers (2023-09-02T09:39:47Z) - A Comprehensive Study on Robustness of Image Classification Models:
Benchmarking and Rethinking [54.89987482509155]
The robustness of deep neural networks is usually lacking under adversarial examples, common corruptions, and distribution shifts.
We establish a comprehensive robustness benchmark called ARES-Bench on the image classification task.
By designing the training settings accordingly, we achieve the new state-of-the-art adversarial robustness.
arXiv Detail & Related papers (2023-02-28T04:26:20Z) - Towards the Desirable Decision Boundary by Moderate-Margin Adversarial
Training [8.904046529174867]
We propose a novel adversarial training scheme to achieve a better trade-off between robustness and natural accuracy.
Moderate-Margin Adversarial Training (MMAT) generates finer-grained adversarial examples to mitigate the cross-over problem.
On SVHN, for example, state-of-the-art robustness and natural accuracy are achieved.
arXiv Detail & Related papers (2022-07-16T00:57:23Z) - Push Stricter to Decide Better: A Class-Conditional Feature Adaptive
Framework for Improving Adversarial Robustness [18.98147977363969]
We propose a Feature Adaptive Adversarial Training (FAAT) to optimize the class-conditional feature adaption across natural data and adversarial examples.
FAAT produces more discriminative features and performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2021-12-01T07:37:56Z) - Adversarial Robustness with Semi-Infinite Constrained Learning [177.42714838799924]
The susceptibility of deep learning to input perturbations has raised serious questions about its use in safety-critical domains.
We propose a hybrid Langevin Monte Carlo training approach to mitigate this issue.
We show that our approach can mitigate the trade-off between state-of-the-art performance and robustness.
arXiv Detail & Related papers (2021-10-29T13:30:42Z) - Analysis and Applications of Class-wise Robustness in Adversarial
Training [92.08430396614273]
Adversarial training is one of the most effective approaches to improve model robustness against adversarial examples.
Previous works mainly focus on the overall robustness of the model, and the in-depth analysis on the role of each class involved in adversarial training is still missing.
We provide a detailed diagnosis of adversarial training on six benchmark datasets, i.e., MNIST, CIFAR-10, CIFAR-100, SVHN, STL-10 and ImageNet.
We observe that the stronger attack methods in adversarial learning achieve performance improvement mainly from a more successful attack on the vulnerable classes.
arXiv Detail & Related papers (2021-05-29T07:28:35Z) - Constant Random Perturbations Provide Adversarial Robustness with
Minimal Effect on Accuracy [41.84118016227271]
This paper proposes an attack-independent (non-adversarial training) technique for improving adversarial robustness of neural network models.
We suggest creating a neighborhood around each training example, such that the label is kept constant for all inputs within that neighborhood.
Results suggest that the proposed approach improves standard accuracy over other defenses while having increased robustness compared to vanilla adversarial training.
arXiv Detail & Related papers (2021-03-15T10:44:59Z) - Self-Progressing Robust Training [146.8337017922058]
Current robust training methods such as adversarial training explicitly use an "attack" to generate adversarial examples.
We propose a new framework called SPROUT, self-progressing robust training.
Our results shed new light on scalable, effective and attack-independent robust training methods.
arXiv Detail & Related papers (2020-12-22T00:45:24Z) - Robustness May Be at Odds with Fairness: An Empirical Study on
Class-wise Accuracy [85.20742045853738]
CNNs are widely known to be vulnerable to adversarial attacks.
We propose an empirical study on the class-wise accuracy and robustness of adversarially trained models.
We find that there exists inter-class discrepancy for accuracy and robustness even when the training dataset has an equal number of samples for each class.
arXiv Detail & Related papers (2020-10-26T06:32:32Z) - Revisiting Ensembles in an Adversarial Context: Improving Natural
Accuracy [5.482532589225552]
There is still a significant gap in natural accuracy between robust and non-robust models.
We consider a number of ensemble methods designed to mitigate this performance difference.
We consider two schemes: one that combines predictions from several robust models, and another that fuses features from robust and standard models.
arXiv Detail & Related papers (2020-02-26T15:45:58Z) - Precise Tradeoffs in Adversarial Training for Linear Regression [55.764306209771405]
We provide a precise and comprehensive understanding of the role of adversarial training in the context of linear regression with Gaussian features.
We precisely characterize the standard/robust accuracy and the corresponding tradeoff achieved by a contemporary mini-max adversarial training approach.
Our theory for adversarial training algorithms also facilitates the rigorous study of how a variety of factors (size and quality of training data, model overparametrization etc.) affect the tradeoff between these two competing accuracies.
arXiv Detail & Related papers (2020-02-24T19:01:47Z)
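For reference, the min-max objective that the linear-regression analysis above revolves around can be written, in a standard textbook form that may differ in details from that paper's exact setup, as

$$
\hat{\theta} \;=\; \arg\min_{\theta}\; \frac{1}{n}\sum_{i=1}^{n}\; \max_{\|\delta_i\|_2 \le \varepsilon}\; \bigl(y_i - \langle x_i + \delta_i,\, \theta\rangle\bigr)^2 ,
$$

where $\varepsilon$ is the perturbation budget: setting $\varepsilon = 0$ recovers the standard (natural) risk, while keeping the inner maximization gives the robust risk, and the tradeoff between the two is what such analyses characterize.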