Stratified Adversarial Robustness with Rejection
- URL: http://arxiv.org/abs/2305.01139v2
- Date: Fri, 12 May 2023 01:00:57 GMT
- Title: Stratified Adversarial Robustness with Rejection
- Authors: Jiefeng Chen, Jayaram Raghuram, Jihye Choi, Xi Wu, Yingyu Liang,
Somesh Jha
- Abstract summary: We study adversarially-robust classification with rejection in the stratified rejection setting.
We propose a novel defense method -- Adversarial Training with Consistent Prediction-based Rejection (CPR).
CPR significantly outperforms existing methods under strong adaptive attacks.
- Score: 33.72077702550626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, there has been emerging interest in adversarially training a
classifier with a rejection option (also known as a selective classifier) for
boosting adversarial robustness. While rejection can incur a cost in many
applications, existing studies typically associate zero cost with rejecting
perturbed inputs, which can result in the rejection of numerous
slightly-perturbed inputs that could be correctly classified. In this work, we
study adversarially-robust classification with rejection in the stratified
rejection setting, where the rejection cost is modeled by rejection loss
functions monotonically non-increasing in the perturbation magnitude. We
theoretically analyze the stratified rejection setting and propose a novel
defense method -- Adversarial Training with Consistent Prediction-based
Rejection (CPR) -- for building a robust selective classifier. Experiments on
image datasets demonstrate that the proposed method significantly outperforms
existing methods under strong adaptive attacks. For instance, on CIFAR-10, CPR
reduces the total robust loss (for different rejection losses) by at least 7.3%
under both seen and unseen attacks.
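To make the setting concrete, below is a minimal Python sketch of a rejection loss that is monotonically non-increasing in the perturbation magnitude, the resulting total robust loss, and a consistency-based rejection check in the spirit of CPR. The specific loss family, the random-sampling consistency test, and all names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def rejection_loss(pert_magnitude, eps):
    """Illustrative rejection loss, monotonically non-increasing in the
    perturbation magnitude: rejecting a clean input (magnitude 0) costs 1,
    while rejecting an input perturbed by eps or more costs nothing."""
    return max(0.0, 1.0 - pert_magnitude / eps)

def total_robust_loss(pred, label, rejected, pert_magnitude, eps):
    """0-1 loss on accepted inputs; stratified rejection loss on rejected ones."""
    if rejected:
        return rejection_loss(pert_magnitude, eps)
    return float(pred != label)

def consistent_prediction_reject(classify, x, radius, n_samples=20, rng=None):
    """Reject x if the prediction flips anywhere in a sampled neighborhood.
    This random-sampling check only approximates the consistency idea behind
    CPR; the paper's actual rejection rule may differ."""
    rng = np.random.default_rng() if rng is None else rng
    base = classify(x)
    for _ in range(n_samples):
        delta = rng.uniform(-radius, radius, size=x.shape)
        if classify(x + delta) != base:
            return True  # inconsistent neighborhood -> reject
    return False
```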
Related papers
- Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization [60.176008034221404]
Direct Preference Optimization (DPO) and its variants are increasingly used for aligning language models with human preferences.
Prior work has observed that the likelihood of preferred responses often decreases during training.
We demonstrate that likelihood displacement can be catastrophic, shifting probability mass from preferred responses to responses with an opposite meaning.
arXiv Detail & Related papers (2024-10-11T14:22:44Z)
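For context, the standard DPO objective that this summary presumes optimizes only the log-likelihood-ratio margin between the preferred response $y_w$ and the dispreferred response $y_l$:

\[
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
\]

Because only the margin is constrained, $\pi_\theta(y_w \mid x)$ can itself decrease as long as $\pi_\theta(y_l \mid x)$ decreases faster; that is the likelihood displacement the summary describes.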
- Classifier Guidance Enhances Diffusion-based Adversarial Purification by Preserving Predictive Information [75.36597470578724]
Adversarial purification is a promising approach for defending neural networks against adversarial attacks.
We propose the gUided Purification (COUP) algorithm, which purifies while keeping the sample away from the classifier's decision boundary.
Experimental results show that COUP can achieve better adversarial robustness under strong attack methods.
arXiv Detail & Related papers (2024-08-12T02:48:00Z)
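A loose sketch of the guidance idea, assuming generic `denoiser` and `classifier` callables (both hypothetical interfaces): nudge each purification step in the direction that raises the classifier's top-class log-probability, which pushes the sample away from the decision boundary. This is a rough illustration, not the paper's exact algorithm.

```python
import torch

def guided_purify_step(x, denoiser, classifier, guidance_scale=0.1):
    """One purification step with classifier-confidence guidance
    (an approximation of the COUP idea under assumed interfaces)."""
    x = x.detach().requires_grad_(True)
    log_probs = torch.log_softmax(classifier(x), dim=-1)
    conf = log_probs.max(dim=-1).values.sum()  # top-class log-confidence
    grad = torch.autograd.grad(conf, x)[0]
    with torch.no_grad():
        # Denoise, then move toward higher confidence (away from the boundary).
        return denoiser(x) + guidance_scale * grad
```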
- Regression with Cost-based Rejection [30.43900105405108]
We investigate a novel regression problem where the model can abstain from making predictions on some examples, given certain rejection costs.
We derive the Bayes optimal solution, which shows that the optimal model should reject examples whose conditional variance exceeds the rejection cost.
arXiv Detail & Related papers (2023-11-08T09:33:21Z)
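The stated Bayes-optimal rule is simple to write down: under squared loss, predicting E[y|x] incurs expected loss Var[y|x], so abstaining at fixed cost c is optimal exactly when the variance exceeds c. A sketch, with `cond_mean` and `cond_var` standing in for the (usually estimated) conditional mean and variance:

```python
def bayes_optimal_predict(cond_mean, cond_var, rejection_cost):
    """Predict E[y|x] unless its expected squared loss, Var[y|x],
    exceeds the rejection cost, in which case abstaining is cheaper."""
    if cond_var > rejection_cost:
        return None  # reject
    return cond_mean
```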
- Confidence-aware Training of Smoothed Classifiers for Certified Robustness [75.95332266383417]
We use "accuracy under Gaussian noise" as an easy-to-compute proxy of adversarial robustness for an input.
Our experiments show that the proposed method consistently improves certified robustness over state-of-the-art training methods.
arXiv Detail & Related papers (2022-12-18T03:57:12Z)
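A minimal Monte Carlo estimate of this proxy, assuming a `classify` callable and numpy arrays (names are illustrative):

```python
import numpy as np

def accuracy_under_gaussian_noise(classify, x, label, sigma, n=100, rng=None):
    """Estimate how often the prediction stays correct under isotropic
    Gaussian noise of scale sigma -- the cheap robustness proxy the
    summary describes."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = x + sigma * rng.standard_normal((n,) + x.shape)
    preds = np.array([classify(z) for z in noisy])
    return float(np.mean(preds == label))
```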
- Optimal Rejection Function Meets Character Recognition Tasks [8.373151777137792]
We propose a method for rejecting ambiguous samples via a learned rejection function.
This rejection function is trained together with a classification function under the framework of Learning-with-Rejection (LwR).
Our extensive experiments on notMNIST classification and character/non-character classification demonstrate that the proposed method achieves better performance than traditional rejection strategies.
arXiv Detail & Related papers (2022-03-17T08:14:00Z)
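For reference, the standard Learning-with-Rejection risk (Cortes et al.) that such joint training typically targets, with rejection cost $c \in (0, 1/2)$ and the convention that $r(x) \le 0$ means reject; whether this paper uses exactly this formulation is an assumption:

\[
L(h, r;\, x, y) \;=\; \mathbb{1}[r(x) > 0]\,\mathbb{1}[y\,h(x) \le 0] \;+\; c\,\mathbb{1}[r(x) \le 0]
\]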
- Adversarial Training with Rectified Rejection [114.83821848791206]
We propose to use true confidence (T-Con) as a certainty oracle, and learn to predict T-Con by rectifying confidence.
We prove that under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones.
arXiv Detail & Related papers (2021-05-31T08:24:53Z)
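A tiny sketch of thresholded rectified confidence; the multiplicative form (max softmax confidence scaled by a learned rectifier head) and the 0.5 threshold are assumptions for illustration:

```python
def rcon_reject(max_confidence, rectifier_output, threshold=0.5):
    """Rectified confidence (R-Con) is meant to track the true-class
    probability (T-Con); abstain when it falls below the threshold."""
    return max_confidence * rectifier_output < threshold  # True -> reject
```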
- Selective Probabilistic Classifier Based on Hypothesis Testing [14.695979686066066]
We propose a simple yet effective method for handling violations of the Closed-World Assumption in a classifier.
The proposed method is a rejection option based on hypothesis testing with probabilistic networks.
It is shown that the proposed method achieves a broader operating range and a lower false-positive ratio than the alternatives.
arXiv Detail & Related papers (2021-05-09T08:55:56Z)
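A generic illustration of a hypothesis-testing rejection rule, assuming Monte Carlo confidence samples from a probabilistic network (e.g., MC dropout); the one-sample t-test, the null value p0, and the significance level are stand-ins, not the paper's exact procedure:

```python
from scipy import stats

def hypothesis_test_reject(conf_samples, p0=0.5, alpha=0.05):
    """Keep the prediction only if a one-sided t-test rejects
    H0: mean top-class confidence <= p0; otherwise abstain."""
    t, p_two_sided = stats.ttest_1samp(conf_samples, p0)
    p_one_sided = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
    return p_one_sided > alpha  # True -> reject the input (abstain)
```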
- ATRO: Adversarial Training with a Rejection Option [10.36668157679368]
This paper proposes a classification framework with a rejection option to mitigate the performance deterioration caused by adversarial examples.
By applying the adversarial training objective to a classifier and a rejection function simultaneously, the model can abstain from classifying a test data point when its confidence is insufficient.
arXiv Detail & Related papers (2020-10-24T14:05:03Z)
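Schematically, such joint adversarial training can be written as a min-max over both the classifier $h$ and the rejection function $r$, with $L$ a rejection-aware loss such as the 0-1-$c$ Learning-with-Rejection loss shown earlier; this compact form paraphrases the summary rather than quoting the paper:

\[
\min_{h,\,r}\;\; \mathbb{E}_{(x,y)}\Big[\max_{\|\delta\| \le \epsilon} L\big(h, r;\, x + \delta,\, y\big)\Big]
\]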
- Classification with Rejection Based on Cost-sensitive Classification [83.50402803131412]
We propose a novel method of classification with rejection that learns an ensemble of cost-sensitive classifiers.
Experimental results demonstrate the usefulness of our proposed approach in clean, noisy, and positive-unlabeled classification.
arXiv Detail & Related papers (2020-10-22T14:05:05Z)
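The classical link behind this kind of reduction is Chow's rule: with rejection cost $c$, the Bayes-optimal selective classifier abstains exactly when the top conditional class probability is too low (a textbook fact, stated here for context):

\[
\text{reject } x \iff \max_{y}\, \mathbb{P}(y \mid x) \;<\; 1 - c
\]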