Rethinking Uncertainty in Deep Learning: Whether and How it Improves
Robustness
- URL: http://arxiv.org/abs/2011.13538v1
- Date: Fri, 27 Nov 2020 03:22:50 GMT
- Title: Rethinking Uncertainty in Deep Learning: Whether and How it Improves
Robustness
- Authors: Yilun Jin, Lixin Fan, Kam Woh Ng, Ce Ju, Qiang Yang
- Abstract summary: Adversarial training (AT) suffers from poor performance both on clean examples and under other types of attacks.
Regularizers that encourage uncertain outputs, such as entropy maximization (EntM) and label smoothing (LS), can maintain accuracy on clean examples and improve performance under weak attacks.
In this paper, we revisit uncertainty promotion regularizers, including EntM and LS, in the field of adversarial learning.
- Score: 20.912492996647888
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Deep neural networks (DNNs) are known to be prone to adversarial attacks, for
which many remedies are proposed. While adversarial training (AT) is regarded
as the most robust defense, it suffers from poor performance both on clean
examples and under other types of attacks, e.g. attacks with larger
perturbations. Meanwhile, regularizers that encourage uncertain outputs, such
as entropy maximization (EntM) and label smoothing (LS) can maintain accuracy
on clean examples and improve performance under weak attacks, yet their ability
to defend against strong attacks is still in doubt. In this paper, we revisit
uncertainty promotion regularizers, including EntM and LS, in the field of
adversarial learning. We show that EntM and LS alone provide robustness only
under small perturbations. Contrarily, we show that uncertainty promotion
regularizers complement AT in a principled manner, consistently improving
performance on both clean examples and under various attacks, especially
attacks with large perturbations. We further analyze how uncertainty promotion
regularizers enhance the performance of AT from the perspective of Jacobian
matrices $\nabla_X f(X;\theta)$, and find out that EntM effectively shrinks the
norm of Jacobian matrices and hence promotes robustness.
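The combination the abstract describes, an adversarial training loop plus an uncertainty-promoting regularizer, together with the Jacobian-norm quantity it analyzes, can be illustrated with a short PyTorch sketch. This is a minimal sketch under assumed settings: a standard L-infinity PGD inner loop, an illustrative coefficient `lambda_ent`, and a batch-averaged Frobenius norm of $\nabla_X f(X;\theta)$; none of these are the paper's reported hyperparameters or exact implementation.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard L-infinity PGD inner maximizer (assumed settings, not the paper's)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def at_entm_loss(model, x, y, lambda_ent=0.5, label_smoothing=0.0):
    """Adversarial cross-entropy plus an entropy-maximization (EntM) bonus.
    Setting label_smoothing > 0 gives the LS variant instead; coefficients are illustrative."""
    x_adv = pgd_attack(model, x, y)
    logits = model(x_adv)
    ce = F.cross_entropy(logits, y, label_smoothing=label_smoothing)
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    return ce - lambda_ent * entropy  # subtracting entropy rewards less confident outputs

def jacobian_fro_norm(model, x):
    """Batch-averaged Frobenius norm of the input-output Jacobian nabla_X f(X; theta),
    the quantity the paper argues EntM shrinks."""
    x = x.clone().requires_grad_(True)
    out = model(x)
    sq = torch.zeros(x.shape[0], device=x.device)
    for k in range(out.shape[-1]):
        g = torch.autograd.grad(out[:, k].sum(), x, retain_graph=True)[0]
        sq = sq + g.flatten(1).pow(2).sum(dim=1)
    return sq.sqrt().mean()
```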
Related papers
- Smoothed Embeddings for Robust Language Models [11.97873981355746]
Large language models (LLMs) are vulnerable to jailbreaking attacks that subvert alignment and induce harmful outputs.
We propose the Randomized Embedding Smoothing and Token Aggregation (RESTA) defense, which adds random noise to the embedding vectors and performs aggregation during the generation of each output token.
Our experiments demonstrate that our approach achieves superior robustness versus utility tradeoffs compared to the baseline defenses.
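As a rough illustration of the mechanism described above (random noise on the input embeddings, aggregation while generating each token), here is a hedged sketch assuming a Hugging Face-style causal LM interface. The Gaussian noise, the mean-logit aggregation, and greedy decoding are illustrative assumptions, not RESTA's exact procedure.

```python
import torch

@torch.no_grad()
def smoothed_next_token(model, input_ids, sigma=0.1, num_samples=8):
    """Aggregate next-token logits over several noise-perturbed copies of the input
    embeddings before decoding one token (RESTA-style sketch, assumptions noted above)."""
    embeds = model.get_input_embeddings()(input_ids)        # (batch, seq, dim)
    logits_sum = None
    for _ in range(num_samples):
        noisy = embeds + sigma * torch.randn_like(embeds)   # isotropic Gaussian noise (assumed)
        next_logits = model(inputs_embeds=noisy).logits[:, -1, :]
        logits_sum = next_logits if logits_sum is None else logits_sum + next_logits
    return (logits_sum / num_samples).argmax(dim=-1)        # greedy pick from aggregated logits
```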
arXiv Detail & Related papers (2025-01-27T20:57:26Z)
- Transferable Adversarial Attacks on SAM and Its Downstream Models [87.23908485521439]
This paper explores the feasibility of adversarially attacking various downstream models fine-tuned from the Segment Anything Model (SAM).
To enhance the effectiveness of the adversarial attack towards models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm.
arXiv Detail & Related papers (2024-10-26T15:04:04Z)
- Robust LLM safeguarding via refusal feature adversarial training [15.76605079209956]
Large language models (LLMs) are vulnerable to adversarial attacks that can elicit harmful responses.
We propose Refusal Feature Adversarial Training (ReFAT), a novel algorithm that efficiently performs adversarial training.
Experimental results show that ReFAT significantly improves the robustness of three popular LLMs against a wide range of adversarial attacks.
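The summary does not spell out the mechanism, but the "refusal feature" terminology usually refers to a difference-of-means direction in the model's hidden activations between harmful and harmless prompts, which attacks tend to suppress. The sketch below is an assumption-laden illustration of computing and ablating such a direction; it is not the authors' ReFAT procedure.

```python
import torch

def refusal_direction(harmful_acts, harmless_acts):
    """Difference-of-means direction between hidden activations on harmful and
    harmless prompts (a construction assumed here for illustration)."""
    d = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return d / (d.norm() + 1e-12)

def ablate_refusal_feature(hidden, direction):
    """Project the refusal direction out of hidden states, mimicking the effect
    attacks have of suppressing refusal behavior (illustrative only)."""
    coeff = (hidden * direction).sum(dim=-1, keepdim=True)
    return hidden - coeff * direction
```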
arXiv Detail & Related papers (2024-09-30T08:41:39Z)
- Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses.
We also introduce C-AdvIPO, an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
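The "continuous attacks" ingredient, perturbing in the model's embedding space rather than over discrete tokens, can be sketched as follows, assuming a Hugging Face-style causal LM. The L2 ball, step sizes, and loss choice are placeholder assumptions; this is not the authors' C-AdvUL implementation.

```python
import torch

def continuous_attack(model, input_ids, labels, eps=0.5, alpha=0.1, steps=5):
    """PGD in embedding space: find a small perturbation of the input embeddings
    that increases the model's loss on `labels` (a generic continuous-attack sketch)."""
    embeds = model.get_input_embeddings()(input_ids).detach()
    delta = torch.zeros_like(embeds, requires_grad=True)
    for _ in range(steps):
        loss = model(inputs_embeds=embeds + delta, labels=labels).loss
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta += alpha * grad / (grad.norm() + 1e-12)   # ascend the loss
            norm = delta.norm()
            if norm > eps:                                  # project back into the L2 ball
                delta *= eps / norm
    return (embeds + delta).detach()  # adversarial embeddings for the training step
```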
arXiv Detail & Related papers (2024-05-24T14:20:09Z)
- Post-Training Overfitting Mitigation in DNN Classifiers [31.513866929577336]
We show that post-training MM-based regularization substantially mitigates non-malicious overfitting due to class imbalances and overtraining.
Unlike adversarial training, which provides some resilience against attacks but harms clean (attack-free) generalization, we demonstrate an approach originating from adversarial learning.
arXiv Detail & Related papers (2023-09-28T20:16:24Z)
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly-robust instance reweighted adversarial framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
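To make the instance-reweighting idea concrete, the sketch below weights per-example robust losses with weights drawn from a KL-regularized objective over the probability simplex, whose maximizer is a softmax over the per-example losses. The softmax closed form and the temperature `tau` are illustrative assumptions; the paper's exact objective and optimization may differ.

```python
import torch
import torch.nn.functional as F

def reweighted_adversarial_loss(model, x_adv, y, tau=1.0):
    """Instance-reweighted robust loss: harder (higher-loss) adversarial examples get
    larger weights. Solving max_w <w, losses> - tau * KL(w || uniform) over the simplex
    gives w_i proportional to exp(loss_i / tau), i.e. a softmax over the losses."""
    per_example = F.cross_entropy(model(x_adv), y, reduction="none")  # (batch,)
    weights = F.softmax(per_example.detach() / tau, dim=0)            # no gradient through weights
    return (weights * per_example).sum()
```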
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
- Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard method in adversarial robustness assumes a framework that defends against samples crafted by minimally perturbing a clean sample.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
- Effective Targeted Attacks for Adversarial Self-Supervised Learning [58.14233572578723]
Unsupervised adversarial training (AT) has been highlighted as a means of achieving robustness in models without any label information.
We propose a novel positive-mining strategy for targeted adversarial attacks that generates effective adversaries for adversarial SSL frameworks.
Our method demonstrates significant enhancements in robustness when applied to non-contrastive SSL frameworks, and smaller but consistent robustness improvements with contrastive SSL frameworks.
arXiv Detail & Related papers (2022-10-19T11:43:39Z)
- Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
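The smoothed policy underlying these certificates can be sketched as acting on several noise-corrupted copies of each observation and taking the majority-vote action. This is a generic randomized-smoothing sketch assuming a discrete action space, Gaussian observation noise, and a policy that returns action logits; the certification step itself (the reward lower bound) is omitted.

```python
import torch

@torch.no_grad()
def smoothed_action(policy, obs, num_actions, sigma=0.2, num_samples=32):
    """Majority-vote action of `policy` over Gaussian-perturbed observations
    (randomized-smoothing sketch; certification logic is not shown)."""
    votes = torch.zeros(num_actions, dtype=torch.long)
    for _ in range(num_samples):
        noisy_obs = obs + sigma * torch.randn_like(obs)
        action = policy(noisy_obs).argmax(dim=-1)   # assumes the policy returns action logits
        votes[action] += 1
    return int(votes.argmax())
```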
arXiv Detail & Related papers (2021-06-21T21:42:08Z)