Improving Robustness with Adaptive Weight Decay
- URL: http://arxiv.org/abs/2210.00094v2
- Date: Sat, 2 Dec 2023 01:27:27 GMT
- Title: Improving Robustness with Adaptive Weight Decay
- Authors: Amin Ghiasi, Ali Shafahi, Reza Ardekani
- Abstract summary: We propose adaptive weight decay, which automatically tunes the hyper-parameter for weight decay during each training iteration.
We show that this simple modification can result in large improvements in robustness.
This method has other desirable properties, such as less sensitivity to learning rate, and smaller weight norms.
- Score: 8.096469295357737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose adaptive weight decay, which automatically tunes the
hyper-parameter for weight decay during each training iteration. For
classification problems, we propose changing the value of the weight decay
hyper-parameter on the fly based on the strength of updates from the
classification loss (i.e., gradient of cross-entropy), and the regularization
loss (i.e., $\ell_2$-norm of the weights). We show that this simple
modification can result in large improvements in adversarial robustness -- an
area which suffers from robust overfitting -- without requiring extra data
across various datasets and architecture choices. For example, our
reformulation results in a $20\%$ relative robustness improvement on CIFAR-100
and a $10\%$ relative robustness improvement on CIFAR-10 compared to the
best-tuned hyper-parameters of traditional weight decay, resulting in models
with performance comparable to SOTA robustness methods. In addition, this
method has other desirable properties, such as less sensitivity to the learning
rate and smaller weight norms; the latter contributes to robustness against
overfitting to label noise, and to pruning.
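A minimal PyTorch-style sketch of the idea in the abstract is given below: the weight-decay coefficient is recomputed at every training iteration from the ratio between the cross-entropy gradient norm and the $\ell_2$-norm of the weights, so the regularization strength tracks the strength of the classification updates. The function name `awd_step`, the constant `lambda_awd`, and the decoupled application of the decay term to the gradients are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def awd_step(model, x, y, optimizer, lambda_awd=0.01, eps=1e-8):
    """One training step with an adaptively tuned weight-decay coefficient.

    The coefficient lambda_t is re-derived each iteration from the ratio of
    the cross-entropy gradient norm to the weight norm (an illustrative
    reading of the abstract, not the paper's exact rule).
    """
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()

    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params))
    weight_norm = torch.sqrt(sum((p ** 2).sum() for p in params))

    # Adaptive coefficient: stronger classification updates allow stronger
    # decay, while large weight norms shrink it.
    lambda_t = lambda_awd * grad_norm / (weight_norm + eps)

    # Apply the decay term directly to the gradients (decoupled style),
    # then take the usual optimizer step.
    with torch.no_grad():
        for p in params:
            p.grad.add_(p, alpha=lambda_t.item())
    optimizer.step()

    return loss.item(), lambda_t.item()
```

In such a setup the optimizer would be created with its built-in weight decay disabled (e.g. `torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=0.0)`), so the adaptive term above is the only source of $\ell_2$ regularization.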
Related papers
- NEAT: Nonlinear Parameter-efficient Adaptation of Pre-trained Models [26.808251361020066]
Fine-tuning pre-trained models is resource-intensive and laborious.
One widely adopted PEFT technique, Low-Rank Adaptation (LoRA), freezes the pre-trained model weights and injects trainable low-rank update matrices.
NEAT introduces a lightweight neural network that takes pre-trained weights as input and learns a nonlinear transformation to approximate cumulative weight updates.
arXiv Detail & Related papers (2024-10-02T17:29:23Z) - Certified PEFTSmoothing: Parameter-Efficient Fine-Tuning with Randomized Smoothing [6.86204821852287]
Randomized smoothing is the primary certified robustness method for assessing the robustness of deep learning models to adversarial perturbations in the l2-norm.
A notable constraint limiting widespread adoption is the necessity to retrain base models entirely from scratch to attain a robust version.
This is because the base model fails to learn the noise-augmented data distribution to give an accurate vote.
Inspired by recent large model training procedures, we explore an alternative way named PEFTSmoothing to adapt the base model to learn the noise-augmented data.
arXiv Detail & Related papers (2024-04-08T09:38:22Z) - Optimizing for ROC Curves on Class-Imbalanced Data by Training over a Family of Loss Functions [3.06506506650274]
Training reliable classifiers under severe class imbalance is a challenging problem in computer vision.
Recent work has proposed techniques that mitigate the effects of training under imbalance by modifying the loss functions or optimization methods.
We propose training over a family of loss functions, instead of a single loss function.
arXiv Detail & Related papers (2024-02-08T04:31:21Z) - FedNAR: Federated Optimization with Normalized Annealing Regularization [54.42032094044368]
We explore the choice of weight decay and identify that its value appreciably influences the convergence of existing FL algorithms.
We develop Federated optimization with Normalized Annealing Regularization (FedNAR), a plug-in that can be seamlessly integrated into any existing FL algorithm.
arXiv Detail & Related papers (2023-10-04T21:11:40Z) - Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning [19.91117174405902]
Deep neural networks are susceptible to adversarial examples, posing a significant security risk in critical applications.
This paper proposes Robustness Critical Fine-Tuning (RiFT), a novel approach to enhance generalization without compromising adversarial robustness.
arXiv Detail & Related papers (2023-08-01T09:02:34Z) - Improve Noise Tolerance of Robust Loss via Noise-Awareness [60.34670515595074]
We propose a meta-learning method capable of adaptively learning a hyper-parameter prediction function, called Noise-Aware-Robust-Loss-Adjuster (NARL-Adjuster for brevity).
We integrate four SOTA robust loss functions with our algorithm, and comprehensive experiments substantiate the general applicability and effectiveness of the proposed method in terms of both noise tolerance and performance.
arXiv Detail & Related papers (2023-01-18T04:54:58Z) - Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features adapted to arbitrary attacking strengths; the method is trained to align these features automatically.
arXiv Detail & Related papers (2021-05-31T17:01:05Z) - Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than deep reinforcement learning algorithms and than the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z) - FixNorm: Dissecting Weight Decay for Training Deep Neural Networks [7.820667552233989]
We propose a new training method called FixNorm, which discards weight decay and directly controls the two mechanisms.
On the ImageNet classification task, training EfficientNet-B0 with FixNorm achieves 77.7% top-1 accuracy, outperforming the original baseline by a clear margin.
arXiv Detail & Related papers (2021-03-29T05:41:56Z) - Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training [70.2914594796002]
We propose Dynamic R-CNN to adjust the label assignment criteria and the shape of the regression loss function.
Our method improves upon the ResNet-50-FPN baseline by 1.9% AP and 5.5% AP$_{90}$ on the MS COCO dataset with no extra overhead.
arXiv Detail & Related papers (2020-04-13T15:20:25Z) - Highly Efficient Salient Object Detection with 100K Parameters [137.74898755102387]
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stage multi-scale features.
We build an extremely light-weight model, namely CSNet, which achieves performance comparable to large models with only about 0.2% of their parameters (100k) on popular salient object detection benchmarks.
arXiv Detail & Related papers (2020-03-12T07:00:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.