Improving Adversarial Robustness via Channel-wise Activation Suppressing
- URL: http://arxiv.org/abs/2103.08307v1
- Date: Thu, 11 Mar 2021 03:44:16 GMT
- Title: Improving Adversarial Robustness via Channel-wise Activation Suppressing
- Authors: Yang Bai, Yuyuan Zeng, Yong Jiang, Shu-Tao Xia, Xingjun Ma, Yisen Wang
- Abstract summary: The study of adversarial examples and their activation has attracted significant attention for secure and robust learning with deep neural networks (DNNs).
In this paper, we highlight two new characteristics of adversarial examples from the channel-wise activation perspective.
We show that CAS can train a model that inherently suppresses adversarial activation, and can be easily applied to existing defense methods to further improve their robustness.
- Score: 65.72430571867149
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The study of adversarial examples and their activation has attracted
significant attention for secure and robust learning with deep neural networks
(DNNs). Different from existing works, in this paper, we highlight two new
characteristics of adversarial examples from the channel-wise activation
perspective: 1) the activation magnitudes of adversarial examples are higher
than those of natural examples; and 2) the channels are activated more
uniformly by adversarial examples than by natural examples. We find that the
state-of-the-art defense, adversarial training, has addressed the first issue
(high activation magnitudes) by training on adversarial examples, while the
second issue (uniform activation) remains. This motivates us to prevent
redundant channels from being activated by adversarial perturbations via a
Channel-wise Activation Suppressing (CAS) strategy. We show that CAS can train
a model that inherently suppresses adversarial activation, and can be easily
applied to existing defense methods to further improve their robustness. Our
work provides a simple but generic training strategy for robustifying the
intermediate layer activation of DNNs.
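The two channel-wise characteristics above are straightforward to probe empirically. Below is a minimal PyTorch sketch, not taken from the paper, that compares channel activation magnitude and uniformity (measured here as the entropy of the normalized per-channel means) between natural and adversarial inputs; the model, the hooked layer, and the PGD attack are illustrative assumptions rather than the paper's exact protocol.

```python
# Minimal sketch: probe channel-wise activation magnitude and uniformity
# at an intermediate layer. Model, layer choice, and attack settings are
# illustrative assumptions, not the paper's exact protocol.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet18(num_classes=10).eval()

feats = {}
def hook(_module, _inputs, output):
    feats["act"] = output

# Probe the output of the third residual stage (an arbitrary choice).
model.layer3.register_forward_hook(hook)

def channel_stats(x):
    """Return (total magnitude, entropy) of per-channel mean activations."""
    with torch.no_grad():
        model(x)
    act = feats["act"].clamp(min=0)        # (N, C, H, W)
    per_channel = act.mean(dim=(0, 2, 3))  # (C,) mean magnitude per channel
    magnitude = per_channel.sum().item()
    p = per_channel / (per_channel.sum() + 1e-12)    # distribution over channels
    entropy = -(p * (p + 1e-12).log()).sum().item()  # high entropy = uniform
    return magnitude, entropy

def pgd(x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Plain L-inf PGD, used here only to generate adversarial probes."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv

x = torch.rand(16, 3, 32, 32)              # stand-in for natural images
y = torch.randint(0, 10, (16,))
m_nat, h_nat = channel_stats(x)
m_adv, h_adv = channel_stats(pgd(x, y))
print(f"natural:     magnitude={m_nat:.2f} entropy={h_nat:.3f}")
print(f"adversarial: magnitude={m_adv:.2f} entropy={h_adv:.3f}")
```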
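The suppression strategy itself can be sketched as a small module. The paper describes CAS as an auxiliary classifier attached to an intermediate layer whose class-dependent weights reweight the channel activations; the sketch below is one plausible reading of that description, and the pooling, weight indexing, softmax normalization, and use of ground-truth labels at training time are assumptions, not a verified reimplementation.

```python
# Hedged sketch of a Channel-wise Activation Suppressing (CAS) module:
# an auxiliary classifier on globally pooled features supplies
# class-dependent channel weights that rescale the feature map.
import torch
import torch.nn as nn

class CASModule(nn.Module):
    def __init__(self, num_channels: int, num_classes: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.aux_fc = nn.Linear(num_channels, num_classes)  # auxiliary classifier

    def forward(self, feat, y=None):
        # feat: (N, C, H, W) intermediate activation
        pooled = self.pool(feat).flatten(1)   # (N, C)
        aux_logits = self.aux_fc(pooled)      # trained with its own CE loss
        if y is None:                         # inference: fall back to prediction
            y = aux_logits.argmax(dim=1)
        w = self.aux_fc.weight[y]             # (N, C) class-dependent weights
        # Softmax keeps the channel weights positive and peaked; the exact
        # normalization used in the paper may differ (assumption).
        feat = feat * torch.softmax(w, dim=1).unsqueeze(-1).unsqueeze(-1)
        return feat, aux_logits

# Usage: insert after an intermediate block and add the auxiliary CE loss
# to the main training objective.
cas = CASModule(num_channels=256, num_classes=10)
feat = torch.randn(8, 256, 14, 14)
y = torch.randint(0, 10, (8,))
suppressed, aux_logits = cas(feat, y)
aux_loss = nn.functional.cross_entropy(aux_logits, y)
```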
Related papers
- Robust Active Learning (RoAL): Countering Dynamic Adversaries in Active Learning with Elastic Weight Consolidation [16.85038790429607]
This paper introduces Robust Active Learning (RoAL), a novel approach to building active learning frameworks that remain robust against dynamic adversaries.
First, we propose a new dynamic adversarial attack that poses significant threats to active learning frameworks.
Second, we introduce a novel method that combines Elastic Weight Consolidation (EWC) with active learning to mitigate the catastrophic forgetting caused by dynamic adversarial attacks.
arXiv Detail & Related papers (2024-08-14T08:22:13Z)
- Pre-trained Trojan Attacks for Visual Recognition [106.13792185398863]
Pre-trained vision models (PVMs) have become a dominant component due to their exceptional performance when fine-tuned for downstream tasks.
We propose the Pre-trained Trojan attack, which embeds backdoors into a PVM, enabling attacks across various downstream vision tasks.
We highlight the challenges posed by cross-task activation and shortcut connections in successful backdoor attacks.
arXiv Detail & Related papers (2023-12-23T05:51:40Z)
- F$^2$AT: Feature-Focusing Adversarial Training via Disentanglement of Natural and Perturbed Patterns [74.03108122774098]
Deep neural networks (DNNs) are vulnerable to adversarial examples crafted by well-designed perturbations.
This could lead to disastrous results on critical applications such as self-driving cars, surveillance security, and medical diagnosis.
We propose Feature-Focusing Adversarial Training (F$^2$AT), which forces the model to focus on the core features of natural patterns.
arXiv Detail & Related papers (2023-10-23T04:31:42Z)
- Boosting Adversarial Transferability via Fusing Logits of Top-1 Decomposed Feature [36.78292952798531]
We propose a Singular Value Decomposition (SVD)-based feature-level attack method.
Our approach is inspired by the discovery that the singular vectors associated with the larger singular values of middle-layer features exhibit superior generalization and attention properties.
arXiv Detail & Related papers (2023-05-02T12:27:44Z)
- Enhancing Adversarial Training with Feature Separability [52.39305978984573]
We introduce a new concept, the adversarial training graph (ATG), with which the proposed adversarial training with feature separability (ATFS) boosts intra-class feature similarity and increases inter-class feature variance.
Through comprehensive experiments, we demonstrate that the proposed ATFS framework significantly improves both clean and robust performance.
arXiv Detail & Related papers (2022-05-02T04:04:23Z)
- Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z)
- Removing Adversarial Noise in Class Activation Feature Space [160.78488162713498]
We propose to remove adversarial noise by implementing a self-supervised adversarial training mechanism in a class activation feature space.
We train a denoising model to minimize the distances between the adversarial examples and the natural examples in the class activation feature space.
Empirical evaluations demonstrate that our method could significantly enhance adversarial robustness in comparison to previous state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-19T10:42:24Z)
- LiBRe: A Practical Bayesian Approach to Adversarial Detection [36.541671795530625]
LiBRe can endow a variety of pre-trained task-dependent DNNs with the ability to defend against heterogeneous adversarial attacks at a low cost.
We build few-layer deep ensemble variational inference and adopt the pre-training & fine-tuning workflow to boost the effectiveness and efficiency of LiBRe.
arXiv Detail & Related papers (2021-03-27T07:48:58Z)
- Low Curvature Activations Reduce Overfitting in Adversarial Training [38.53171807664161]
Adversarial training is one of the most effective defenses against adversarial attacks.
We show that the observed generalization gap is closely related to the choice of the activation function (see the sketch after this list).
arXiv Detail & Related papers (2021-02-15T21:53:27Z)
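As a concrete illustration of the last entry's point that robust overfitting is tied to activation curvature, the snippet below swaps the high-curvature-at-zero ReLU for a smooth activation before running adversarial training; using Softplus as the replacement is an assumption for illustration, not necessarily that paper's exact choice.

```python
# Replace every ReLU in a model with a smooth, low-curvature activation
# before adversarial training. Softplus is a stand-in (assumption).
import torch.nn as nn
import torchvision.models as models

def swap_relu(module: nn.Module, beta: float = 10.0):
    """Recursively replace nn.ReLU submodules with nn.Softplus(beta)."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.Softplus(beta=beta))
        else:
            swap_relu(child, beta)

model = models.resnet18(num_classes=10)
swap_relu(model)  # then run standard adversarial training on `model`
```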
This list is automatically generated from the titles and abstracts of the papers on this site.