A Closer Look at the Adversarial Robustness of Information Bottleneck
Models
- URL: http://arxiv.org/abs/2107.05712v1
- Date: Mon, 12 Jul 2021 20:05:08 GMT
- Title: A Closer Look at the Adversarial Robustness of Information Bottleneck
Models
- Authors: Iryna Korshunova, David Stutz, Alexander A. Alemi, Olivia Wiles, Sven
Gowal
- Abstract summary: Previous works showed that the robustness of models trained with information bottlenecks can improve upon adversarial training.
Our evaluation under a diverse range of white-box $l_{\infty}$ attacks suggests that information bottlenecks alone are not a strong defense strategy.
- Score: 87.89442166368983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the adversarial robustness of information bottleneck models for
classification. Previous works showed that the robustness of models trained
with information bottlenecks can improve upon adversarial training. Our
evaluation under a diverse range of white-box $l_{\infty}$ attacks suggests
that information bottlenecks alone are not a strong defense strategy, and that
previous results were likely influenced by gradient obfuscation.
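For reference, the objective that information bottleneck models optimize, together with the variational relaxation commonly used to train them (background knowledge, not quoted from this abstract), can be written as:

```latex
% Information bottleneck: retain information about the label Y while
% compressing away information about the input X.
\max_{p(z \mid x)} \; I(Z;Y) - \beta\, I(Z;X)

% Variational bound typically minimized in practice (VIB):
\mathcal{L} = \mathbb{E}_{(x,y)}\!\left[
  \mathbb{E}_{z \sim p(z \mid x)}\big[-\log q(y \mid z)\big]
  + \beta\, \mathrm{KL}\big(p(z \mid x) \,\|\, r(z)\big)
\right]
```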
Related papers
- Singular Regularization with Information Bottleneck Improves Model's
Adversarial Robustness [30.361227245739745]
Adversarial examples are one of the most severe threats to deep learning models.
We study adversarial information as unstructured noise, which does not have a clear pattern.
We propose a new module to regularize adversarial information, combining it with information bottleneck theory.
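To make that combination concrete, a minimal sketch of a generic variational-IB penalty added to a classification loss is shown below (PyTorch); it illustrates the general idea only, not the specific regularization module proposed in the paper, and the function and argument names are placeholders.

```python
import torch.nn.functional as F

def vib_loss(mu, logvar, logits, labels, beta=1e-3):
    """Generic variational-IB-style loss: cross-entropy plus a KL penalty that
    pushes the stochastic representation z ~ N(mu, diag(exp(logvar))) towards a
    standard normal prior. Illustrative only; not the module from the paper."""
    ce = F.cross_entropy(logits, labels)
    # KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims, averaged over the batch
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(dim=1).mean()
    return ce + beta * kl
```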
arXiv Detail & Related papers (2023-12-04T09:07:30Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended
Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can come from biases in data acquisition rather than from the task itself.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
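A minimal sketch of a hybrid discriminative-generative (autoencoder plus classifier) objective is given below; it illustrates only the general training recipe under simplifying assumptions, not the paper's nuisance-extended objective, and all class and parameter names are hypothetical.

```python
import torch.nn as nn
import torch.nn.functional as F

class TinyAutoencoderClassifier(nn.Module):
    """Hypothetical minimal hybrid model: a shared encoder feeds both a decoder
    (reconstruction, generative term) and a linear classifier (discriminative
    term). Illustrative only; not the architecture proposed in the paper."""
    def __init__(self, in_dim=784, hid=128, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU())
        self.decoder = nn.Linear(hid, in_dim)
        self.classifier = nn.Linear(hid, num_classes)

    def loss(self, x, y, recon_weight=1.0):
        z = self.encoder(x)
        ce = F.cross_entropy(self.classifier(z), y)  # discriminative term
        rec = F.mse_loss(self.decoder(z), x)         # generative term
        return ce + recon_weight * rec
```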
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- Improving the Adversarial Robustness of NLP Models by Information
Bottleneck [112.44039792098579]
Non-robust features can be easily manipulated by adversaries to fool NLP models.
In this study, we explore the feasibility of capturing task-specific robust features, while eliminating the non-robust ones by using the information bottleneck theory.
We show that the models trained with our information bottleneck-based method are able to achieve a significant improvement in robust accuracy.
arXiv Detail & Related papers (2022-06-11T12:12:20Z)
- Beyond Gradients: Exploiting Adversarial Priors in Model Inversion
Attacks [7.49320945341034]
Collaborative machine learning settings can be susceptible to adversarial interference and attacks.
One class of such attacks is termed model inversion attacks, characterised by the adversary reverse-engineering the model to extract representations.
We propose a novel model inversion framework that builds on the foundations of gradient-based model inversion attacks.
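For context, the plain gradient-based model inversion baseline that such frameworks build on simply optimizes an input to maximize the model's score for a target class; the sketch below assumes a PyTorch image classifier with inputs in [0, 1], and the names and hyper-parameters are illustrative.

```python
import torch

def gradient_model_inversion(model, target_class, input_shape=(1, 3, 32, 32),
                             steps=200, lr=0.1):
    """Plain gradient-based model inversion: optimize an input so the model
    assigns a high score to `target_class`. Baseline illustration only; the
    paper's framework adds adversarial priors on top of this idea."""
    x = torch.zeros(input_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -model(x)[:, target_class].mean()  # maximize the target logit
        loss.backward()
        opt.step()
        x.data.clamp_(0.0, 1.0)  # keep the reconstruction in a valid pixel range
    return x.detach()
```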
arXiv Detail & Related papers (2022-03-01T14:22:29Z)
- Impact of Attention on Adversarial Robustness of Image Classification
Models [0.9176056742068814]
Adversarial attacks against deep learning models have gained significant attention.
Recent works have proposed explanations for the existence of adversarial examples and techniques to defend the models against these attacks.
This work aims at a general understanding of the impact of attention on adversarial robustness.
arXiv Detail & Related papers (2021-09-02T13:26:32Z)
- Delving into Data: Effectively Substitute Training for Black-box Attack [84.85798059317963]
We propose a novel perspective on substitute training that focuses on designing the distribution of the data used in the knowledge-stealing process.
The combination of these two modules can further boost the consistency of the substitute model and target model, which greatly improves the effectiveness of adversarial attack.
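The generic knowledge-stealing step that substitute training relies on can be sketched as follows; the query distribution supplied through `data_loader` is a deliberate placeholder, since designing that distribution is what the paper focuses on, and all names here are illustrative.

```python
import torch
import torch.nn.functional as F

def train_substitute(black_box, substitute, data_loader, epochs=5, lr=1e-3):
    """Fit a substitute model to a black-box target by matching its soft
    predictions (generic distillation-style loop, not the paper's method)."""
    opt = torch.optim.Adam(substitute.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in data_loader:
            with torch.no_grad():
                teacher_probs = F.softmax(black_box(x), dim=1)  # query the target
            student_log_probs = F.log_softmax(substitute(x), dim=1)
            loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
    return substitute
```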
arXiv Detail & Related papers (2021-04-26T07:26:29Z)
- Voting based ensemble improves robustness of defensive models [82.70303474487105]
We study whether it is possible to create an ensemble to further improve robustness.
By ensembling several state-of-the-art pre-trained defense models, our method can achieve a 59.8% robust accuracy.
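A minimal sketch of hard-label majority voting over several pre-trained classifiers is shown below; it illustrates only the voting mechanism, not the paper's particular choice or weighting of defense models.

```python
import torch

def majority_vote_predict(models, x):
    """Hard-label majority vote over an ensemble of classifiers: each model
    votes with its predicted class, and torch.mode picks the most frequent
    class per example. Illustrative only."""
    votes = torch.stack([m(x).argmax(dim=1) for m in models], dim=0)  # (n_models, batch)
    return votes.mode(dim=0).values  # most frequent predicted class per example
```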
arXiv Detail & Related papers (2020-11-28T00:08:45Z)
- Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model.
We propose to exploit additional information from the feature space to craft stronger adversaries.
Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
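For reference, a standard instantiation of this idea, $l_{\infty}$ PGD adversarial training, can be sketched as follows; the perturbation budget and step sizes are illustrative, and this is the common baseline rather than the paper's stylized, feature-space-informed attack.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard l_infinity PGD attack used to craft training-time perturbations."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()           # gradient ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to l_inf ball
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One outer-minimization step: train the model on the adversarial examples."""
    x_adv = pgd_linf(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```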
arXiv Detail & Related papers (2020-07-29T08:38:10Z)