GradAug: A New Regularization Method for Deep Neural Networks
- URL: http://arxiv.org/abs/2006.07989v2
- Date: Mon, 12 Oct 2020 18:20:51 GMT
- Title: GradAug: A New Regularization Method for Deep Neural Networks
- Authors: Taojiannan Yang, Sijie Zhu, Chen Chen
- Abstract summary: We propose a new regularization method to alleviate over-fitting in deep neural networks.
The proposed method introduces self-guided disturbances to the raw gradients of the network.
We demonstrate that GradAug can help the network learn well-generalized and more diverse representations.
- Score: 19.239311087570318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new regularization method to alleviate over-fitting in deep
neural networks. The key idea is to use randomly transformed training
samples to regularize a set of sub-networks, which are generated by sampling
the width of the original network, during training. As such, the proposed
method introduces self-guided disturbances to the raw gradients of the
network and is therefore termed Gradient Augmentation (GradAug). We
demonstrate that GradAug can help the network learn well-generalized and more
diverse representations. Moreover, it is easy to implement and can be applied
to various structures and applications. GradAug improves ResNet-50 to 78.79% on
ImageNet classification, a new state-of-the-art accuracy. When combined
with CutMix, it further boosts performance to 79.67%, outperforming an
ensemble of advanced training tricks. The generalization ability is evaluated
on COCO object detection and instance segmentation where GradAug significantly
surpasses other state-of-the-art methods. GradAug is also robust to image
distortions and FGSM adversarial attacks and is highly effective in low data
regimes. Code is available at https://github.com/taoyang1122/GradAug
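To make the idea concrete, below is a minimal sketch of one GradAug-style training step, based only on the abstract: the full-width network is trained on the original samples, while a few sub-networks, sampled by width, are trained on randomly transformed samples, so their gradients act as self-guided disturbances on the raw gradients. The set_width method, the random_transform argument, and the hyperparameter names are illustrative assumptions rather than the authors' actual API; see the linked repository for the official implementation.

# Minimal GradAug-style training step (sketch under stated assumptions).
# Assumes a slimmable-style model exposing a hypothetical set_width(ratio) switch.
import random
import torch.nn.functional as F

def gradaug_step(model, images, labels, optimizer,
                 num_subnets=2, min_width=0.5, random_transform=None):
    optimizer.zero_grad()

    # Full-width network trained on the original samples.
    model.set_width(1.0)  # hypothetical slimmable-style API
    loss = F.cross_entropy(model(images), labels)

    # Sub-networks sampled by width, trained on randomly transformed samples.
    for _ in range(num_subnets):
        width = random.uniform(min_width, 1.0)
        model.set_width(width)
        aug = random_transform(images) if random_transform is not None else images
        loss = loss + F.cross_entropy(model(aug), labels)

    # Sub-network losses add self-guided disturbances to the raw gradients.
    loss.backward()
    optimizer.step()
    model.set_width(1.0)
    return loss.item()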
Related papers
- Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z) - Visual Explanations from Deep Networks via Riemann-Stieltjes Integrated
Gradient-based Localization [0.24596929878045565]
We introduce a new technique to produce visual explanations for the predictions of a CNN.
Our method can be applied to any layer of the network, and like Integrated Gradients it is not affected by the problem of vanishing gradients.
Compared to Grad-CAM, heatmaps produced by our algorithm are better focused in the areas of interest, and their numerical computation is more stable.
arXiv Detail & Related papers (2022-05-22T18:30:38Z) - DAFormer: Improving Network Architectures and Training Strategies for
Domain-Adaptive Semantic Segmentation [99.88539409432916]
We study the unsupervised domain adaptation (UDA) process.
We propose a novel UDA method, DAFormer, based on the benchmark results.
DAFormer significantly improves the state-of-the-art performance by 10.8 mIoU for GTA->Cityscapes and 5.4 mIoU for Synthia->Cityscapes.
arXiv Detail & Related papers (2021-11-29T19:00:46Z) - Neural Network Pruning Through Constrained Reinforcement Learning [3.2880869992413246]
We propose a general methodology for pruning neural networks.
Our proposed methodology can prune neural networks to respect pre-defined computational budgets.
We demonstrate the effectiveness of our approach by comparison with state-of-the-art methods on standard image classification datasets.
arXiv Detail & Related papers (2021-10-16T11:57:38Z) - TSG: Target-Selective Gradient Backprop for Probing CNN Visual Saliency [72.9106103283475]
We study visual saliency, a.k.a. visual explanation, to interpret convolutional neural networks.
Building on our observations, we propose a novel visual saliency framework, termed Target-Selective Gradient (TSG) backprop.
The proposed TSG consists of two components, namely, TSG-Conv and TSG-FC, which rectify the gradients for convolutional layers and fully-connected layers, respectively.
arXiv Detail & Related papers (2021-10-11T12:00:20Z) - Training Graph Neural Networks by Graphon Estimation [2.5997274006052544]
We propose to train a graph neural network via resampling from a graphon estimate obtained from the underlying network data.
We show that our approach is competitive with, and in many cases outperforms, other GNN training methods that reduce over-smoothing.
arXiv Detail & Related papers (2021-09-04T19:21:48Z) - LaplaceNet: A Hybrid Energy-Neural Model for Deep Semi-Supervised
Classification [0.0]
Recent developments in deep semi-supervised classification have reached unprecedented performance.
We propose a new framework, LaplaceNet, for deep semi-supervised classification that has a greatly reduced model complexity.
Our model outperforms state-of-the-art methods for deep semi-supervised classification, over several benchmark datasets.
arXiv Detail & Related papers (2021-06-08T17:09:28Z) - Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of using the same path for every input, DG-Net aggregates features dynamically at each node, which gives the network more representational ability.
arXiv Detail & Related papers (2020-10-02T16:50:26Z) - Active Deep Densely Connected Convolutional Network for Hyperspectral
Image Classification [6.850575514129793]
It is still very challenging to use only a few labeled samples to train deep learning models to reach a high classification accuracy.
This paper therefore proposes an active deep-learning framework trained in an end-to-end manner to minimize hyperspectral image classification costs.
arXiv Detail & Related papers (2020-09-01T09:53:38Z) - Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z) - Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and provide theoretical insights into three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.