Improving Adversarial Robustness via Mutual Information Estimation
- URL: http://arxiv.org/abs/2207.12203v1
- Date: Mon, 25 Jul 2022 13:45:11 GMT
- Title: Improving Adversarial Robustness via Mutual Information Estimation
- Authors: Dawei Zhou, Nannan Wang, Xinbo Gao, Bo Han, Xiaoyu Wang, Yibing Zhan,
Tongliang Liu
- Abstract summary: Deep neural networks (DNNs) are found to be vulnerable to adversarial noise.
In this paper, we investigate the dependence between outputs of the target model and input adversarial samples from the perspective of information theory.
We propose to enhance the adversarial robustness by maximizing the natural MI and minimizing the adversarial MI during the training process.
- Score: 144.33170440878519
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) are found to be vulnerable to adversarial noise.
They are typically misled by adversarial samples to make wrong predictions. To
alleviate this negative effect, in this paper, we investigate the dependence
between outputs of the target model and input adversarial samples from the
perspective of information theory, and propose an adversarial defense method.
Specifically, we first measure the dependence by estimating the mutual
information (MI) between outputs and the natural patterns of inputs (called
natural MI) and MI between outputs and the adversarial patterns of inputs
(called adversarial MI), respectively. We find that adversarial samples usually
have larger adversarial MI and smaller natural MI than natural samples do.
Motivated by this observation, we propose to enhance the
adversarial robustness by maximizing the natural MI and minimizing the
adversarial MI during the training process. In this way, the target model is
expected to pay more attention to the natural pattern that contains objective
semantics. Empirical evaluations demonstrate that our method could effectively
improve the adversarial accuracy against multiple attacks.
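To make the training objective concrete, below is a minimal sketch of how the natural-MI / adversarial-MI trade-off could be implemented with a generic neural MI estimator (a MINE-style Donsker-Varadhan lower bound), which is a standard stand-in and not necessarily the paper's estimator. The critic architecture, the choice of the clean image and the perturbation `x_adv - x_nat` as the "natural" and "adversarial" patterns, and the weights `lam_nat` / `lam_adv` are illustrative assumptions.
```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MICritic(nn.Module):
    """Statistics network T(u, v) for a Donsker-Varadhan MI lower bound."""
    def __init__(self, u_dim, v_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(u_dim + v_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, u, v):
        return self.net(torch.cat([u, v], dim=-1))

def dv_mi_lower_bound(critic, u, v):
    # I(U; V) >= E_joint[T(u, v)] - log E_marginal[exp(T(u, v'))],
    # where v' is drawn from the marginal by shuffling v within the batch.
    joint = critic(u, v).mean()
    v_marg = v[torch.randperm(v.size(0))]
    marginal = torch.logsumexp(critic(u, v_marg), dim=0) - math.log(v.size(0))
    return joint - marginal.squeeze()

def robust_mi_loss(model, nat_critic, adv_critic, x_nat, x_adv, labels,
                   lam_nat=1.0, lam_adv=1.0):
    # Cross-entropy on adversarial inputs, plus the two MI terms from the
    # abstract: maximize MI(output; natural pattern) and minimize
    # MI(output; adversarial pattern). The pattern choices below (the clean
    # image, and the perturbation x_adv - x_nat) are assumptions.
    logits = model(x_adv)
    ce = F.cross_entropy(logits, labels)
    probs = logits.softmax(dim=-1)
    nat_mi = dv_mi_lower_bound(nat_critic, probs, x_nat.flatten(1))
    adv_mi = dv_mi_lower_bound(adv_critic, probs, (x_adv - x_nat).flatten(1))
    return ce - lam_nat * nat_mi + lam_adv * adv_mi
```
In practice the critics themselves must be trained to tighten these lower bounds (maximizing them with respect to their own parameters) in alternation with the target model, and penalizing a lower bound on the adversarial MI is only a heuristic proxy for minimizing the MI itself; the paper's actual estimator and optimization schedule may differ.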
Related papers
- Transferable Adversarial Attacks on SAM and Its Downstream Models [87.23908485521439]
This paper explores the feasibility of adversarially attacking various downstream models fine-tuned from the Segment Anything Model (SAM).
To enhance the effectiveness of the adversarial attack towards models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm.
arXiv Detail & Related papers (2024-10-26T15:04:04Z)
- Mitigating Feature Gap for Adversarial Robustness by Feature Disentanglement [61.048842737581865]
Adversarial fine-tuning methods aim to enhance adversarial robustness by fine-tuning a naturally pre-trained model in an adversarial training manner.
We propose a disentanglement-based approach to explicitly model and remove the latent features that cause the feature gap.
Empirical evaluations on three benchmark datasets demonstrate that our approach surpasses existing adversarial fine-tuning methods and adversarial training baselines.
arXiv Detail & Related papers (2024-01-26T08:38:57Z)
- Generating Adversarial Samples in Mini-Batches May Be Detrimental To Adversarial Robustness [0.0]
We explore the relationship between the mini-batch size used during adversarial sample generation and the strength of the adversarial samples produced.
We formulate loss functions such that adversarial sample strength is not degraded by mini-batch size.
Our findings highlight a risk of underestimating the true (practical) strength of adversarial attacks and, consequently, of overestimating a model's robustness.
arXiv Detail & Related papers (2023-03-30T21:42:50Z)
- Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard approach to adversarial robustness assumes a framework that defends against samples crafted by minimally perturbing a natural sample.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves defense against both invariance and sensitivity attacks.
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
- Rethinking Machine Learning Robustness via its Link with the Out-of-Distribution Problem [16.154434566725012]
We investigate the causes behind machine learning models' susceptibility to adversarial examples.
We propose an OOD generalization method that stands against both adversary-induced and natural distribution shifts.
Our approach consistently improves robustness to OOD adversarial inputs and outperforms state-of-the-art defenses.
arXiv Detail & Related papers (2022-02-18T00:17:23Z)
- Understanding the Logit Distributions of Adversarially-Trained Deep Neural Networks [6.439477789066243]
Adversarial defenses train deep neural networks to be invariant to the input perturbations from adversarial attacks.
Although adversarial training is successful at mitigating adversarial attacks, the behavioral differences between adversarially-trained (AT) models and standard models are still poorly understood.
We identify three logit characteristics essential to learning adversarial robustness.
arXiv Detail & Related papers (2021-08-26T19:09:15Z)
- Adversarial Robustness through the Lens of Causality [105.51753064807014]
The adversarial vulnerability of deep neural networks has attracted significant attention in machine learning.
We propose to incorporate causality into mitigating adversarial vulnerability.
Our method can be seen as the first attempt to leverage causality for mitigating adversarial vulnerability.
arXiv Detail & Related papers (2021-06-11T06:55:02Z)
- Improving White-box Robustness of Pre-processing Defenses via Joint Adversarial Training [106.34722726264522]
A range of adversarial defense techniques have been proposed to mitigate the interference of adversarial noise.
Pre-processing methods may suffer from the robustness degradation effect.
A potential cause of this negative effect is that adversarial training examples are static and independent of the pre-processing model.
We propose a method called Joint Adversarial Training based Pre-processing (JATP) defense.
arXiv Detail & Related papers (2021-06-10T01:45:32Z)
- Recent Advances in Understanding Adversarial Robustness of Deep Neural Networks [15.217367754000913]
It is increasingly important to obtain models with high robustness that are resistant to adversarial examples.
We give preliminary definitions of adversarial attacks and robustness.
We review frequently used benchmarks and theoretically proven bounds for adversarial robustness.
arXiv Detail & Related papers (2020-11-03T07:42:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.