Splitting the Difference on Adversarial Training
- URL: http://arxiv.org/abs/2310.02480v1
- Date: Tue, 3 Oct 2023 23:09:47 GMT
- Title: Splitting the Difference on Adversarial Training
- Authors: Matan Levi, Aryeh Kontorovich
- Abstract summary: Adversarial training is one of the most effective defenses against adversarial examples.
In this work, we take a fundamentally different approach by treating the perturbed examples of each class as a separate class to be learned.
This split doubles the number of classes to be learned, but at the same time considerably simplifies the decision boundaries.
- Score: 13.470640587945057
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The existence of adversarial examples points to a basic weakness of deep
neural networks. One of the most effective defenses against such examples,
adversarial training, entails training models with some degree of robustness,
usually at the expense of a degraded natural accuracy. Most adversarial
training methods aim to learn a model that finds, for each class, a common
decision boundary encompassing both the clean and perturbed examples. In this
work, we take a fundamentally different approach by treating the perturbed
examples of each class as a separate class to be learned, effectively splitting
each class into two classes: "clean" and "adversarial." This split doubles the
number of classes to be learned, but at the same time considerably simplifies
the decision boundaries. We provide a theoretical plausibility argument that
sheds some light on the conditions under which our approach can be expected to
be beneficial. Likewise, we empirically demonstrate that our method learns
robust models while attaining optimal or near-optimal natural accuracy, e.g.,
on CIFAR-10 we obtain near-optimal natural accuracy of $95.01\%$ alongside
significant robustness across multiple tasks. The ability to achieve such
near-optimal natural accuracy, while maintaining a significant level of
robustness, makes our method applicable to real-world applications where
natural accuracy is at a premium. As a whole, our main contribution is a
general method that confers a significant level of robustness upon classifiers
with only minor or negligible degradation of their natural accuracy.
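The abstract describes the class split only at a high level, so the following is a minimal, hypothetical PyTorch sketch of how such a split could be wired up: a standard PGD attack generates perturbed copies, the classifier's head outputs 2K logits, clean examples keep their original label while perturbed copies are assigned the twin label y + K, and at test time the paired logits are folded back into K classes. The attack settings, the decision to attack against the clean label, the logit-summation merge rule, and all function names are assumptions made for illustration, not the authors' exact implementation.

```python
# Illustrative sketch (not the authors' code): class-splitting adversarial training.
import torch
import torch.nn.functional as F

NUM_CLASSES = 10      # e.g. CIFAR-10; the classifier head must output 2 * NUM_CLASSES logits
EPSILON = 8 / 255     # assumed L-inf perturbation budget
ALPHA = 2 / 255       # assumed PGD step size
PGD_STEPS = 10        # assumed number of PGD iterations

def pgd_attack(model, x, y, eps=EPSILON, alpha=ALPHA, steps=PGD_STEPS):
    """Standard L-inf PGD; the loss is taken against the clean label y
    over the 2K-way output head (an assumption made for this sketch)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()

def split_class_step(model, optimizer, x, y):
    """One training step over 2K labels: clean inputs keep label y,
    perturbed copies are assigned the 'adversarial' twin label y + NUM_CLASSES."""
    x_adv = pgd_attack(model, x, y)
    inputs = torch.cat([x, x_adv], dim=0)
    targets = torch.cat([y, y + NUM_CLASSES], dim=0)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()

def predict(model, x):
    """Fold the 2K logits back to K classes at test time by summing each
    class's clean/adversarial pair (one plausible merge rule, assumed here)."""
    logits = model(x)                              # shape [batch, 2 * NUM_CLASSES]
    merged = logits[:, :NUM_CLASSES] + logits[:, NUM_CLASSES:]
    return merged.argmax(dim=1)
```

Because every decision about "clean vs. adversarial" is delegated to the extra labels rather than forced into one shared boundary per class, the per-class decision regions the network has to learn stay simpler, which is the intuition the abstract appeals to.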
Related papers
- Towards Certified Probabilistic Robustness with High Accuracy [3.957941698534126]
Adversarial examples pose a security threat to many critical systems built on neural networks.
How to build certifiably robust yet accurate neural network models remains an open problem.
We propose a novel approach that aims to achieve both high accuracy and certified probabilistic robustness.
arXiv Detail & Related papers (2023-09-02T09:39:47Z) - A Comprehensive Study on Robustness of Image Classification Models:
Benchmarking and Rethinking [54.89987482509155]
The robustness of deep neural networks is usually lacking under adversarial examples, common corruptions, and distribution shifts.
We establish a comprehensive robustness benchmark called ARES-Bench on the image classification task.
By designing the training settings accordingly, we achieve the new state-of-the-art adversarial robustness.
arXiv Detail & Related papers (2023-02-28T04:26:20Z) - Towards the Desirable Decision Boundary by Moderate-Margin Adversarial
Training [8.904046529174867]
We propose a novel adversarial training scheme to achieve a better trade-off between robustness and natural accuracy.
Moderate-Margin Adversarial Training (MMAT) generates finer-grained adversarial examples to mitigate the cross-over problem.
On SVHN, for example, state-of-the-art robustness and natural accuracy are achieved.
arXiv Detail & Related papers (2022-07-16T00:57:23Z) - Push Stricter to Decide Better: A Class-Conditional Feature Adaptive
Framework for Improving Adversarial Robustness [18.98147977363969]
We propose a Feature Adaptive Adversarial Training (FAAT) to optimize the class-conditional feature adaption across natural data and adversarial examples.
FAAT produces more discriminative features and performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2021-12-01T07:37:56Z) - Adversarial Robustness with Semi-Infinite Constrained Learning [177.42714838799924]
The susceptibility of deep learning to input perturbations has raised serious questions about its use in safety-critical domains.
We propose a hybrid Langevin Monte Carlo training approach to mitigate this issue.
We show that our approach can mitigate the trade-off between state-of-the-art performance and robustness.
arXiv Detail & Related papers (2021-10-29T13:30:42Z) - Analysis and Applications of Class-wise Robustness in Adversarial
Training [92.08430396614273]
Adversarial training is one of the most effective approaches to improve model robustness against adversarial examples.
Previous works mainly focus on the overall robustness of the model, and the in-depth analysis on the role of each class involved in adversarial training is still missing.
We provide a detailed diagnosis of adversarial training on six benchmark datasets, i.e., MNIST, CIFAR-10, CIFAR-100, SVHN, STL-10 and ImageNet.
We observe that the stronger attack methods in adversarial learning achieve performance improvement mainly from a more successful attack on the vulnerable classes.
arXiv Detail & Related papers (2021-05-29T07:28:35Z) - Constant Random Perturbations Provide Adversarial Robustness with
Minimal Effect on Accuracy [41.84118016227271]
This paper proposes an attack-independent (non-adversarial training) technique for improving adversarial robustness of neural network models.
We suggest creating a neighborhood around each training example, such that the label is kept constant for all inputs within that neighborhood.
Results suggest that the proposed approach improves standard accuracy over other defenses while having increased robustness compared to vanilla adversarial training.
arXiv Detail & Related papers (2021-03-15T10:44:59Z) - Self-Progressing Robust Training [146.8337017922058]
Current robust training methods such as adversarial training explicitly use an "attack" to generate adversarial examples.
We propose a new framework called SPROUT, self-progressing robust training.
Our results shed new light on scalable, effective and attack-independent robust training methods.
arXiv Detail & Related papers (2020-12-22T00:45:24Z) - Robustness May Be at Odds with Fairness: An Empirical Study on
Class-wise Accuracy [85.20742045853738]
CNNs are widely known to be vulnerable to adversarial attacks.
We propose an empirical study on the class-wise accuracy and robustness of adversarially trained models.
We find that there exists inter-class discrepancy for accuracy and robustness even when the training dataset has an equal number of samples for each class.
arXiv Detail & Related papers (2020-10-26T06:32:32Z) - Revisiting Ensembles in an Adversarial Context: Improving Natural
Accuracy [5.482532589225552]
There is still a significant gap in natural accuracy between robust and non-robust models.
We consider a number of ensemble methods designed to mitigate this performance difference.
We consider two schemes: one that combines predictions from several robust models, and another that fuses features from robust and standard models.
arXiv Detail & Related papers (2020-02-26T15:45:58Z) - Precise Tradeoffs in Adversarial Training for Linear Regression [55.764306209771405]
We provide a precise and comprehensive understanding of the role of adversarial training in the context of linear regression with Gaussian features.
We precisely characterize the standard/robust accuracy and the corresponding tradeoff achieved by a contemporary mini-max adversarial training approach.
Our theory for adversarial training algorithms also facilitates the rigorous study of how a variety of factors (size and quality of training data, model overparametrization etc.) affect the tradeoff between these two competing accuracies.
arXiv Detail & Related papers (2020-02-24T19:01:47Z)
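For reference, the min-max objective that the linear-regression analysis above revolves around can be written, in a standard textbook form that may differ in details from that paper's exact setup, as

$$
\hat{\theta} \;=\; \arg\min_{\theta}\; \frac{1}{n}\sum_{i=1}^{n}\; \max_{\|\delta_i\|_2 \le \varepsilon}\; \bigl(y_i - \langle x_i + \delta_i,\, \theta\rangle\bigr)^2 ,
$$

where $\varepsilon$ is the perturbation budget: setting $\varepsilon = 0$ recovers the standard (natural) risk, while keeping the inner maximization gives the robust risk, and the tradeoff between the two is what such analyses characterize.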