Distilling Robust and Non-Robust Features in Adversarial Examples by
Information Bottleneck
- URL: http://arxiv.org/abs/2204.02735v1
- Date: Wed, 6 Apr 2022 11:22:46 GMT
- Title: Distilling Robust and Non-Robust Features in Adversarial Examples by
Information Bottleneck
- Authors: Junho Kim, Byung-Kwan Lee, Yong Man Ro
- Abstract summary: We propose a way of explicitly distilling feature representation into robust and non-robust features, using the Information Bottleneck.
We demonstrate that the distilled features are highly correlated with adversarial prediction, and they have human-perceptible semantic information by themselves.
We present an attack mechanism intensifying the gradient of non-robust features that is directly related to the model prediction, and validate its effectiveness in breaking model robustness.
- Score: 33.18197518590706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial examples, generated by carefully crafted perturbation, have
attracted considerable attention in research fields. Recent works have argued
that the existence of robust and non-robust features is a primary cause of
adversarial examples, and investigated their internal interactions in the
feature space. In this paper, we propose a way of explicitly distilling feature
representation into robust and non-robust features, using the Information
Bottleneck. Specifically, we inject noise variation into each feature unit and
evaluate the information flow in the feature representation to dichotomize
feature units as either robust or non-robust, based on the noise variation
magnitude. Through comprehensive experiments, we demonstrate that the distilled
features are highly correlated with adversarial prediction, and they have
human-perceptible semantic information by themselves. Furthermore, we present
an attack mechanism intensifying the gradient of non-robust features that is
directly related to the model prediction, and validate its effectiveness in
breaking model robustness.
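To make the described mechanism concrete, below is a minimal, illustrative PyTorch sketch of the two ideas from the abstract: (i) injecting a learnable noise scale into each feature unit and trading prediction fit against noise magnitude in an information-bottleneck fashion, so that the noise tolerated per unit can be used to dichotomize robust and non-robust units, and (ii) a PGD-style attack that routes the loss gradient through the non-robust units only. This is a sketch of the general idea under stated assumptions, not the authors' implementation; the names feature_extractor, classifier, ib_beta, and nonrobust_mask, and the channel-level granularity, are illustrative choices.

```python
# Illustrative sketch only (not the authors' code). Assumes a model split into
# `feature_extractor` (input -> feature map) and `classifier` (feature map -> logits).
import torch
import torch.nn.functional as F


def tolerated_noise_per_unit(feature_extractor, classifier, x, y,
                             ib_beta=0.1, steps=100, lr=0.05):
    """Optimize a per-channel noise scale: keep the prediction for (x, y) while
    encouraging as much noise as possible (an IB-style fit/compression trade-off).
    Channels that tolerate only small noise carry prediction-relevant information."""
    with torch.no_grad():
        feats = feature_extractor(x)                      # (B, C, H, W) feature map
    log_var = torch.zeros(feats.shape[1], device=feats.device, requires_grad=True)
    opt = torch.optim.Adam([log_var], lr=lr)
    for _ in range(steps):
        std = (0.5 * log_var).exp().view(1, -1, 1, 1)
        noisy = feats + std * torch.randn_like(feats)     # per-unit noise injection
        fit = F.cross_entropy(classifier(noisy), y)       # keep the prediction
        compress = -log_var.mean()                        # reward larger noise, i.e. less information kept
        loss = fit + ib_beta * compress
        opt.zero_grad()
        loss.backward()
        opt.step()
    return log_var.detach().exp()                         # tolerated noise variance per channel


def nonrobust_gradient_attack(feature_extractor, classifier, x, y, nonrobust_mask,
                              eps=8 / 255, step=2 / 255, iters=10):
    """PGD-style stand-in for the attack idea: detach the robust channels so the
    loss gradient is concentrated on (intensified through) the non-robust ones."""
    m = nonrobust_mask.float().view(1, -1, 1, 1)
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        feats = feature_extractor(x + delta)
        feats = m * feats + (1 - m) * feats.detach()      # gradient flows only via non-robust units
        loss = F.cross_entropy(classifier(feats), y)
        (grad,) = torch.autograd.grad(loss, delta)
        delta = (delta + step * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).clamp(0, 1)
```

In this sketch, channels that tolerate only a small noise variance while preserving the (adversarial) prediction would be candidates for non-robust units; the paper's exact dichotomization criterion, layer choice, and attack formulation differ in detail.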
Related papers
- Regulating Model Reliance on Non-Robust Features by Smoothing Input Marginal Density [93.32594873253534]
Trustworthy machine learning requires meticulous regulation of model reliance on non-robust features.
We propose a framework to delineate and regulate such features by attributing model predictions to the input.
arXiv Detail & Related papers (2024-07-05T09:16:56Z)
- Mitigating Feature Gap for Adversarial Robustness by Feature Disentanglement [61.048842737581865]
Adversarial fine-tuning methods aim to enhance adversarial robustness through fine-tuning the naturally pre-trained model in an adversarial training manner.
We propose a disentanglement-based approach to explicitly model and remove the latent features that cause the feature gap.
Empirical evaluations on three benchmark datasets demonstrate that our approach surpasses existing adversarial fine-tuning methods and adversarial training baselines.
arXiv Detail & Related papers (2024-01-26T08:38:57Z)
- The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning has the risk of skewing fine-tuning features and compromising the robustness of the model.
We introduce three robustness indicators and conduct experiments across diverse robust datasets.
Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z)
- Unveiling the Potential of Probabilistic Embeddings in Self-Supervised Learning [4.124934010794795]
Self-supervised learning has played a pivotal role in advancing machine learning by allowing models to acquire meaningful representations from unlabeled data.
We investigate the impact of probabilistic modeling on the information bottleneck, shedding light on a trade-off between compression and preservation of information in both representation and loss space.
Our findings suggest that introducing an additional bottleneck in the loss space can significantly enhance the ability to detect out-of-distribution examples.
arXiv Detail & Related papers (2023-10-27T12:01:16Z)
- Understanding Robust Overfitting from the Feature Generalization Perspective [61.770805867606796]
Adversarial training (AT) constructs robust neural networks by incorporating adversarial perturbations into natural data.
It is plagued by the issue of robust overfitting (RO), which severely damages the model's robustness.
In this paper, we investigate RO from a novel feature generalization perspective.
arXiv Detail & Related papers (2023-10-01T07:57:03Z)
- Exploring Robust Features for Improving Adversarial Robustness [11.935612873688122]
We explore the robust features which are not affected by the adversarial perturbations to improve the model's adversarial robustness.
Specifically, we propose a feature disentanglement model to segregate the robust features from non-robust features and domain-specific features.
The trained domain discriminator is able to identify the domain-specific features from the clean images and adversarial examples almost perfectly.
arXiv Detail & Related papers (2023-09-09T00:30:04Z)
- On the Robustness of Removal-Based Feature Attributions [17.679374058425346]
We theoretically characterize the robustness properties of removal-based feature attributions.
Specifically, we provide a unified analysis of such methods and derive upper bounds for the difference between intact and perturbed attributions.
Our experiments on synthetic and real-world data validate our theoretical results and demonstrate their practical implications.
arXiv Detail & Related papers (2023-06-12T23:33:13Z)
- Feature Separation and Recalibration for Adversarial Robustness [18.975320671203132]
We propose a novel, easy-to-verify approach named Feature Separation and Recalibration.
It recalibrates the malicious, non-robust activations for more robust feature maps through Separation and Recalibration.
It improves the robustness of existing adversarial training methods by up to 8.57% with small computational overhead.
arXiv Detail & Related papers (2023-03-24T07:43:57Z)
- Improving the Adversarial Robustness of NLP Models by Information Bottleneck [112.44039792098579]
Non-robust features can be easily manipulated by adversaries to fool NLP models.
In this study, we explore the feasibility of capturing task-specific robust features while eliminating the non-robust ones using information bottleneck theory (a generic IB-loss sketch appears after this list).
We show that models trained with our information bottleneck-based method achieve a significant improvement in robust accuracy.
arXiv Detail & Related papers (2022-06-11T12:12:20Z)
- Removing Spurious Features can Hurt Accuracy and Affect Groups Disproportionately [83.68135652247496]
A natural remedy is to remove spurious features from the model.
We show that removal of spurious features can decrease accuracy due to inductive biases.
We also show that robust self-training can remove spurious features without affecting the overall accuracy.
arXiv Detail & Related papers (2020-12-07T23:08:59Z)
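Several entries above, including the present paper and the NLP-robustness paper, build on the information bottleneck objective. The following is a minimal, generic variational-IB head in PyTorch that trades a fit term (task cross-entropy) against a compression term (KL divergence to a standard normal prior). It is an illustrative sketch of the standard variational IB formulation, not the implementation of any listed paper; IBHead, z_dim, and beta are assumed names and hyperparameters.

```python
# Generic variational information bottleneck head (illustrative sketch only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class IBHead(nn.Module):
    """Maps a backbone feature vector h to a stochastic code z, then to logits."""

    def __init__(self, in_dim, z_dim, num_classes, beta=1e-3):
        super().__init__()
        self.to_stats = nn.Linear(in_dim, 2 * z_dim)   # mean and log-variance of q(z|h)
        self.cls = nn.Linear(z_dim, num_classes)
        self.beta = beta

    def forward(self, h, y=None):
        mu, log_var = self.to_stats(h).chunk(2, dim=-1)
        z = mu + (0.5 * log_var).exp() * torch.randn_like(mu)   # reparameterization trick
        logits = self.cls(z)
        if y is None:
            return logits
        # Fit term: predict the label. Compression term: KL(q(z|h) || N(0, I)).
        kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1.0).sum(dim=-1).mean()
        loss = F.cross_entropy(logits, y) + self.beta * kl
        return logits, loss
```

Typical usage would be head = IBHead(in_dim=512, z_dim=64, num_classes=10) on top of a backbone, with logits, loss = head(features, labels) during training; the beta coefficient controls how aggressively non-essential (non-robust) information is squeezed out of the code z.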
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.