Adversarial Robustness on In- and Out-Distribution Improves
Explainability
- URL: http://arxiv.org/abs/2003.09461v2
- Date: Wed, 29 Jul 2020 17:36:00 GMT
- Title: Adversarial Robustness on In- and Out-Distribution Improves
Explainability
- Authors: Maximilian Augustin, Alexander Meinke, Matthias Hein
- Abstract summary: RATIO is a training procedure for robustness via Adversarial Training on In- and Out-distribution.
RATIO achieves state-of-the-art $l_2$-adversarial robustness on CIFAR10 and maintains better clean accuracy.
- Score: 109.68938066821246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks have led to major improvements in image classification but
suffer from being non-robust to adversarial changes, unreliable uncertainty
estimates on out-distribution samples and their inscrutable black-box
decisions. In this work we propose RATIO, a training procedure for Robustness
via Adversarial Training on In- and Out-distribution, which leads to robust
models with reliable and robust confidence estimates on the out-distribution.
RATIO has similar generative properties to adversarial training so that visual
counterfactuals produce class specific features. While adversarial training
comes at the price of lower clean accuracy, RATIO achieves state-of-the-art
$l_2$-adversarial robustness on CIFAR10 and maintains better clean accuracy.
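The training objective described in the abstract can be sketched as a combined loss: standard adversarial cross-entropy on in-distribution examples, plus a term that pushes adversarially perturbed out-distribution inputs toward the uniform (maximally uncertain) prediction. This is a minimal illustrative sketch, not the paper's exact formulation; the function names and the `lam` weighting are assumptions.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, y):
    # Mean cross-entropy against integer class labels y.
    p = softmax(logits)
    return -np.log(p[np.arange(len(y)), y]).mean()

def uniform_ce(logits):
    # Cross-entropy against the uniform distribution:
    # minimized when the model is maximally uncertain.
    logp = np.log(softmax(logits))
    return -logp.mean()

def ratio_loss(logits_in_adv, y_in, logits_out_adv, lam=1.0):
    """Illustrative RATIO-style objective: adversarial CE on
    in-distribution inputs plus a low-confidence term on
    adversarially perturbed out-distribution inputs."""
    return cross_entropy(logits_in_adv, y_in) + lam * uniform_ce(logits_out_adv)
```

In practice `logits_in_adv` and `logits_out_adv` would come from a network evaluated on worst-case perturbations found by an inner attack (e.g. PGD); the sketch only shows how the two loss terms combine.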
Related papers
- MixedNUTS: Training-Free Accuracy-Robustness Balance via Nonlinearly Mixed Classifiers [41.56951365163419]
"MixedNUTS" is a training-free method where the output logits of a robust classifier are processed by nonlinear transformations with only three parameters.
MixedNUTS then converts the transformed logits into probabilities and mixes them as the overall output.
On CIFAR-10, CIFAR-100, and ImageNet datasets, experimental results with custom strong adaptive attacks demonstrate MixedNUTS's vastly improved accuracy and near-SOTA robustness.
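The MixedNUTS pipeline described above can be sketched roughly as follows. The actual three-parameter nonlinear transform in the paper differs from this illustrative shifted-clamped-power stand-in, and the parameter names `s`, `p`, `c`, `alpha` are hypothetical:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def nonlinear_transform(logits, s, p, c):
    # Hypothetical stand-in for the paper's three-parameter transform:
    # shift by c, clamp at zero, raise to power p, scale by s.
    return s * np.maximum(logits - c, 0.0) ** p

def mixed_output(logits_robust, logits_accurate, s=1.0, p=2.0, c=0.0, alpha=0.5):
    """Transform the robust classifier's logits, convert both models'
    logits to probabilities, and mix them as the overall output."""
    p_rob = softmax(nonlinear_transform(logits_robust, s, p, c))
    p_acc = softmax(logits_accurate)
    return alpha * p_rob + (1 - alpha) * p_acc
```

The key design point is that the method is training-free: only the three transform parameters and the mixing weight are tuned, with both base classifiers frozen.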
arXiv Detail & Related papers (2024-02-03T21:12:36Z)
- Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off [107.35833747750446]
Adversarial examples can be crafted by adding imperceptible perturbations to legitimate documents.
This vulnerability raises significant concerns about their reliability and hinders the widespread deployment of NRMs.
In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs.
arXiv Detail & Related papers (2023-12-16T05:38:39Z)
- Mitigating Accuracy-Robustness Trade-off via Balanced Multi-Teacher Adversarial Distillation [12.39860047886679]
Adversarial Training is a practical approach for improving the robustness of deep neural networks against adversarial attacks.
We introduce Balanced Multi-Teacher Adversarial Robustness Distillation (B-MTARD) to guide the model's Adversarial Training process.
B-MTARD outperforms the state-of-the-art methods against various adversarial attacks.
arXiv Detail & Related papers (2023-06-28T12:47:01Z)
- Augmentation by Counterfactual Explanation -- Fixing an Overconfident Classifier [11.233334009240947]
A highly accurate but overconfident model is ill-suited for deployment in critical applications such as healthcare and autonomous driving.
This paper proposes an application of counterfactual explanations in fixing an over-confident classifier.
arXiv Detail & Related papers (2022-10-21T18:53:16Z)
- Vanilla Feature Distillation for Improving the Accuracy-Robustness Trade-Off in Adversarial Training [37.5115141623558]
We propose Vanilla Feature Distillation Adversarial Training (VFD-Adv) to guide adversarial training towards higher accuracy.
A key advantage of our method is that it can be universally adapted to and boost existing works.
arXiv Detail & Related papers (2022-06-05T11:57:10Z)
- Robustness through Cognitive Dissociation Mitigation in Contrastive Adversarial Training [2.538209532048867]
We introduce a novel neural network training framework that increases a model's robustness to adversarial attacks.
We propose to improve model robustness to adversarial attacks by learning feature representations consistent under both data augmentations and adversarial perturbations.
We validate our method on the CIFAR-10 dataset, on which it outperforms alternative supervised and self-supervised adversarial learning methods in both robust accuracy and clean accuracy.
arXiv Detail & Related papers (2022-03-16T21:41:27Z)
- Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance.
The proposed SCORE (self-consistent robust error) facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z)
- Adversarial Training with Rectified Rejection [114.83821848791206]
We propose to use true confidence (T-Con) as a certainty oracle, and learn to predict T-Con by rectifying confidence.
We prove that under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones.
arXiv Detail & Related papers (2021-05-31T08:24:53Z)
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
- Learning Calibrated Uncertainties for Domain Shift: A Distributionally Robust Learning Approach [150.8920602230832]
We propose a framework for learning calibrated uncertainties under domain shifts.
In particular, the density ratio estimation reflects the closeness of a target (test) sample to the source (training) distribution.
We show that our proposed method generates calibrated uncertainties that benefit downstream tasks.
arXiv Detail & Related papers (2020-10-08T02:10:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.