Decoder-free Robustness Disentanglement without (Additional) Supervision
- URL: http://arxiv.org/abs/2007.01356v1
- Date: Thu, 2 Jul 2020 19:51:40 GMT
- Title: Decoder-free Robustness Disentanglement without (Additional) Supervision
- Authors: Yifei Wang, Dan Peng, Furui Liu, Zhenguo Li, Zhitang Chen, Jiansheng Yang
- Abstract summary: Our proposed Adversarial Asymmetric Training (AAT) algorithm can reliably disentangle robust and non-robust representations without additional supervision on robustness.
Empirical results show our method not only successfully preserves accuracy by combining the two representations, but also achieves much better disentanglement than previous work.
- Score: 42.066771710455754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial Training (AT) was proposed to alleviate the adversarial
vulnerability of machine learning models by extracting only robust features
from the input; however, this inevitably leads to a severe accuracy drop, as it
discards the non-robust yet useful features. This motivates us to
preserve both robust and non-robust features and separate them with
disentangled representation learning. Our proposed Adversarial Asymmetric
Training (AAT) algorithm can reliably disentangle robust and non-robust
representations without additional supervision on robustness. Empirical results
show that our method not only successfully preserves accuracy by combining the
two representations, but also achieves much better disentanglement than
previous work.
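As a rough illustration (the abstract does not spell out the AAT algorithm itself), the sketch below trains two branches asymmetrically: a robust encoder supervised on PGD adversarial examples, and a standard encoder trained on clean inputs, with the two representations concatenated for the final prediction. All module names, losses, and hyperparameters here are hypothetical, not the paper's exact method.

```python
# Hypothetical sketch of asymmetric two-branch training in the spirit of
# robustness disentanglement; NOT the paper's exact AAT algorithm.
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(logits_fn, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft L-inf PGD adversarial examples against the given logits function."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(logits_fn(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)  # project to eps-ball
    return x_adv.detach()

class TwoBranchNet(nn.Module):
    """One encoder intended for robust features, one for non-robust features."""
    def __init__(self, in_dim=3 * 32 * 32, feat_dim=128, num_classes=10):
        super().__init__()
        self.enc_robust = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.enc_std = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.head_robust = nn.Linear(feat_dim, num_classes)     # supervised on adversarial inputs
        self.head_joint = nn.Linear(2 * feat_dim, num_classes)  # combines both representations

    def robust_logits(self, x):
        return self.head_robust(self.enc_robust(x))

    def forward(self, x):
        z = torch.cat([self.enc_robust(x), self.enc_std(x)], dim=1)
        return self.head_joint(z)

def train_step(net, opt, x, y):
    # Asymmetry: the robust branch is trained on PGD adversarial examples,
    # while the joint head is trained on clean inputs, leaving the standard
    # branch free to retain non-robust yet predictive features.
    x_adv = pgd_attack(net.robust_logits, x, y)
    loss = F.cross_entropy(net.robust_logits(x_adv), y) + F.cross_entropy(net(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In this reading, head_joint recovers the clean accuracy that the robust branch alone would sacrifice; the paper's actual objective and any additional disentanglement terms are not given in this summary, so the above is only one plausible instantiation.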
Related papers
- Enhancing Robust Representation in Adversarial Training: Alignment and
Exclusion Criteria [61.048842737581865]
We show that Adversarial Training (AT) can fail to learn robust features, resulting in poor adversarial robustness.
We propose a generic AT framework that gains robust representations via asymmetric negative contrast and reverse attention.
Empirical evaluations on three benchmark datasets show our methods greatly advance the robustness of AT and achieve state-of-the-art performance.
arXiv Detail & Related papers (2023-10-05T07:29:29Z)
- Annealing Self-Distillation Rectification Improves Adversarial Training [0.10241134756773226]
We analyze the characteristics of robust models and identify that robust models tend to produce smoother and better-calibrated outputs.
We propose Annealing Self-Distillation Rectification (ADR), which generates soft labels as a better guidance mechanism.
We demonstrate the efficacy of ADR through extensive experiments and strong performances across datasets.
arXiv Detail & Related papers (2023-05-20T06:35:43Z)
- Confidence-aware Training of Smoothed Classifiers for Certified Robustness [75.95332266383417]
We use "accuracy under Gaussian noise" as an easy-to-compute proxy of adversarial robustness for an input.
Our experiments show that the proposed method consistently exhibits improved certified robustness upon state-of-the-art training methods.
arXiv Detail & Related papers (2022-12-18T03:57:12Z)
- Robustness of Unsupervised Representation Learning without Labels [92.90480374344777]
We propose a family of unsupervised robustness measures, which are model- and task-agnostic and label-free.
We validate our results against a linear probe and show that, for MOCOv2, adversarial training results in 3 times higher certified accuracy.
arXiv Detail & Related papers (2022-10-08T18:03:28Z)
- How many perturbations break this model? Evaluating robustness beyond adversarial accuracy [28.934863462633636]
We introduce adversarial sparsity, which quantifies how difficult it is to find a successful perturbation given both an input point and a constraint on the direction of the perturbation.
We show that adversarial sparsity provides valuable insight into the robustness of neural networks in multiple ways.
arXiv Detail & Related papers (2022-07-08T21:25:17Z)
- Improving the Adversarial Robustness of NLP Models by Information Bottleneck [112.44039792098579]
Non-robust features can be easily manipulated by adversaries to fool NLP models.
In this study, we explore the feasibility of capturing task-specific robust features, while eliminating the non-robust ones, using information bottleneck theory (see the sketch after this list).
We show that the models trained with our information bottleneck-based method are able to achieve a significant improvement in robust accuracy.
arXiv Detail & Related papers (2022-06-11T12:12:20Z)
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
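As a loose sketch of the information-bottleneck idea referenced in the list above (a standard variational IB head, not the cited paper's exact method; all names and hyperparameters are hypothetical):

```python
# Hypothetical variational information bottleneck (VIB) head: compress the
# representation (KL term) while keeping it predictive of the label (CE term).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIBHead(nn.Module):
    def __init__(self, in_dim=256, z_dim=64, num_classes=10, beta=1e-3):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)
        self.cls = nn.Linear(z_dim, num_classes)
        self.beta = beta

    def forward(self, h, y=None):
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        logits = self.cls(z)
        if y is None:
            return logits
        ce = F.cross_entropy(logits, y)  # keep z predictive of the label
        # KL(q(z|h) || N(0, I)): compress z, pressuring the model to discard
        # brittle, non-robust detail while retaining task-relevant features.
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1).mean()
        return ce + self.beta * kl
```

The beta coefficient trades compression (KL term) against predictive power (cross-entropy term); larger values discard more input detail, which is the general mechanism such methods use to suppress non-robust features.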