Low Curvature Activations Reduce Overfitting in Adversarial Training
- URL: http://arxiv.org/abs/2102.07861v1
- Date: Mon, 15 Feb 2021 21:53:27 GMT
- Title: Low Curvature Activations Reduce Overfitting in Adversarial Training
- Authors: Vasu Singla, Sahil Singla, David Jacobs, Soheil Feizi
- Abstract summary: Adversarial training is one of the most effective defenses against adversarial attacks.
We show that the observed generalization gap is closely related to the choice of the activation function.
- Score: 38.53171807664161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training is one of the most effective defenses against
adversarial attacks. Previous works suggest that overfitting is a dominant
phenomenon in adversarial training leading to a large generalization gap
between test and train accuracy in neural networks. In this work, we show that
the observed generalization gap is closely related to the choice of the
activation function. In particular, we show that using activation functions
with low (exact or approximate) curvature values has a regularization effect
that significantly reduces both the standard and robust generalization gaps in
adversarial training. We observe this effect for both differentiable/smooth
activations such as Swish as well as non-differentiable/non-smooth activations
such as LeakyReLU. In the latter case, the approximate curvature of the
activation is low. Finally, we show that for activation functions with low
curvature, the double descent phenomenon for adversarially trained models does
not occur.
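As a rough illustration of the curvature idea in the abstract (not the paper's exact definition), the sketch below numerically compares the second derivative of a smooth activation (Swish) with the slope jump of piecewise-linear activations (ReLU, LeakyReLU); the grid range and the slope-jump proxy are assumptions made here for illustration only.

```python
# Rough numerical look at activation curvature (illustration only).
# For a smooth activation like Swish, curvature is the second derivative f''(x);
# for piecewise-linear activations such as LeakyReLU, f'' is zero away from the
# kink, and a simple proxy for "approximate curvature" is the slope jump at 0.
import numpy as np

def swish(x, beta=1.0):
    """Swish / SiLU: x * sigmoid(beta * x)."""
    return x / (1.0 + np.exp(-beta * x))

def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

x = np.linspace(-6.0, 6.0, 2001)

# Smooth case: peak |f''| over the grid as a curvature summary.
swish_curv = np.max(np.abs(second_derivative(swish, x)))
print(f"Swish: max |f''| over [-6, 6] ~= {swish_curv:.3f}")

# Piecewise-linear case: slope changes from `negative_slope` to 1 at x = 0,
# so the slope jump (1 - negative_slope) serves as a crude curvature proxy.
for negative_slope in (0.0, 0.1, 0.5):   # 0.0 is plain ReLU
    print(f"LeakyReLU(slope={negative_slope}): slope jump = {1.0 - negative_slope:.2f}")
```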
Related papers
- Linear Oscillation: A Novel Activation Function for Vision Transformer [0.0]
We present the Linear Oscillation (LoC) activation function, defined as $f(x) = x \times \sin(\alpha x + \beta)$.
Distinct from conventional activation functions which primarily introduce non-linearity, LoC seamlessly blends linear trajectories with oscillatory deviations.
Our empirical studies reveal that, when integrated into diverse neural architectures, the LoC activation function consistently outperforms established counterparts like ReLU and Sigmoid.
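A minimal NumPy sketch of the formula above; the default values of alpha and beta are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def linear_oscillation(x, alpha=1.0, beta=0.0):
    """LoC activation: f(x) = x * sin(alpha * x + beta)."""
    return x * np.sin(alpha * x + beta)

# Quick check on a few points.
x = np.linspace(-3.0, 3.0, 7)
print(linear_oscillation(x))
```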
arXiv Detail & Related papers (2023-08-25T20:59:51Z) - Data-aware customization of activation functions reduces neural network
error [0.35172332086962865]
We show that data-aware customization of activation functions can result in striking reductions in neural network error.
A simple substitution with the "seagull" activation function in an already-refined neural network can lead to an order-of-magnitude reduction in error.
arXiv Detail & Related papers (2023-01-16T23:38:37Z) - Stochastic Adaptive Activation Function [1.9199289015460212]
This study proposes a simple yet effective activation function that facilitates different thresholds and adaptive activations according to the positions of units and the contexts of inputs.
Experimental analysis demonstrates that our activation function can provide the benefits of more accurate prediction and earlier convergence in many deep learning applications.
arXiv Detail & Related papers (2022-10-21T01:57:25Z) - Enhancing Adversarial Training with Feature Separability [52.39305978984573]
We introduce a new concept of adversarial training graph (ATG) with which the proposed adversarial training with feature separability (ATFS) boosts intra-class feature similarity and increases inter-class feature variance.
Through comprehensive experiments, we demonstrate that the proposed ATFS framework significantly improves both clean and robust performance.
arXiv Detail & Related papers (2022-05-02T04:04:23Z) - On the Role of Optimization in Double Descent: A Least Squares Study [30.44215064390409]
We show an excess risk bound for the gradient descent solution of the least squares objective.
We find that in case of noiseless regression, double descent is explained solely by optimization-related quantities.
We empirically explore if our predictions hold for neural networks.
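As a toy illustration of double descent in least squares, the sketch below fits minimum-norm solutions over random ReLU features of varying width; the dataset sizes, feature map, and noise level are assumptions chosen here for illustration, not the paper's exact analysis. Test error typically peaks near the interpolation threshold (number of features close to the number of training points) and then decreases again.

```python
# Toy double descent with minimum-norm least squares on random ReLU features.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d_input = 100, 1000, 20

# Ground-truth linear teacher with a little label noise.
w_true = rng.normal(size=d_input)
X_tr = rng.normal(size=(n_train, d_input))
X_te = rng.normal(size=(n_test, d_input))
y_tr = X_tr @ w_true + 0.1 * rng.normal(size=n_train)
y_te = X_te @ w_true

def random_feature_map(X, W):
    """ReLU random features: phi(X) = max(X W, 0)."""
    return np.maximum(X @ W, 0.0)

for n_features in (10, 50, 90, 100, 110, 200, 400, 800):
    W = rng.normal(size=(d_input, n_features)) / np.sqrt(d_input)
    Phi_tr, Phi_te = random_feature_map(X_tr, W), random_feature_map(X_te, W)
    # Minimum-norm least-squares solution; the pseudo-inverse handles both the
    # under- and over-parameterised regimes.
    coef = np.linalg.pinv(Phi_tr) @ y_tr
    test_mse = np.mean((Phi_te @ coef - y_te) ** 2)
    print(f"features={n_features:4d}  test MSE={test_mse:.3f}")
```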
arXiv Detail & Related papers (2021-07-27T09:13:11Z) - Activation function design for deep networks: linearity and effective
initialisation [10.108857371774977]
We study how to avoid two problems at initialisation identified in prior works.
We prove that both these problems can be avoided by choosing an activation function possessing a sufficiently large linear region around the origin.
arXiv Detail & Related papers (2021-05-17T11:30:46Z) - Improving Adversarial Robustness via Channel-wise Activation Suppressing [65.72430571867149]
The study of adversarial examples and their activations has attracted significant attention for secure and robust learning with deep neural networks (DNNs).
In this paper, we highlight two new characteristics of adversarial examples from the channel-wise activation perspective.
We show that CAS can train a model that inherently suppresses adversarial activation, and can be easily applied to existing defense methods to further improve their robustness.
arXiv Detail & Related papers (2021-03-11T03:44:16Z) - Feature Purification: How Adversarial Training Performs Robust Deep
Learning [66.05472746340142]
We present a principle that we call Feature Purification: one of the causes of the existence of adversarial examples is the accumulation of certain small dense mixtures in the hidden weights during the training process of a neural network.
We present both experiments on the CIFAR-10 dataset to illustrate this principle and a theoretical result proving that, for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z) - Overfitting in adversarially robust deep learning [86.11788847990783]
We show that overfitting to the training set does in fact harm robust performance to a very large degree in adversarially robust training.
We also show that effects such as the double descent curve do still occur in adversarially trained models, yet fail to explain the observed overfitting.
arXiv Detail & Related papers (2020-02-26T15:40:50Z) - Over-parameterized Adversarial Training: An Analysis Overcoming the
Curse of Dimensionality [74.0084803220897]
Adversarial training is a popular method to give neural nets robustness against adversarial perturbations.
We show convergence to low robust training loss for polynomial width (instead of exponential), under natural assumptions and with the ReLU activation.
arXiv Detail & Related papers (2020-02-16T20:13:43Z)