GradAug: A New Regularization Method for Deep Neural Networks
- URL: http://arxiv.org/abs/2006.07989v2
- Date: Mon, 12 Oct 2020 18:20:51 GMT
- Title: GradAug: A New Regularization Method for Deep Neural Networks
- Authors: Taojiannan Yang, Sijie Zhu, Chen Chen
- Abstract summary: We propose a new regularization method to alleviate over-fitting in deep neural networks.
The proposed method introduces self-guided disturbances to the raw gradients of the network.
We demonstrate that GradAug can help the network learn well-generalized and more diverse representations.
- Score: 19.239311087570318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new regularization method to alleviate over-fitting in deep
neural networks. The key idea is to use randomly transformed training
samples to regularize a set of sub-networks, which are generated by sampling
the width of the original network, during training. As such, the proposed
method introduces self-guided disturbances to the raw gradients of the
network and is therefore termed Gradient Augmentation (GradAug). We
demonstrate that GradAug can help the network learn well-generalized and more
diverse representations. Moreover, it is easy to implement and can be applied
to various structures and applications. GradAug improves ResNet-50 to 78.79% on
ImageNet classification, a new state-of-the-art accuracy. When combined
with CutMix, it further boosts performance to 79.67%, outperforming an
ensemble of advanced training tricks. The generalization ability is evaluated
on COCO object detection and instance segmentation where GradAug significantly
surpasses other state-of-the-art methods. GradAug is also robust to image
distortions and FGSM adversarial attacks and is highly effective in low data
regimes. Code is available at https://github.com/taoyang1122/GradAug
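To make the idea concrete, below is a minimal sketch of one GradAug-style training step, based only on the abstract: the full-width network is trained on the original samples, while a few sub-networks, sampled by width, are trained on randomly transformed samples, so their gradients act as self-guided disturbances on the raw gradients. The set_width method, the random_transform argument, and the hyperparameter names are illustrative assumptions rather than the authors' actual API; see the linked repository for the official implementation.

# Minimal GradAug-style training step (sketch under stated assumptions).
# Assumes a slimmable-style model exposing a hypothetical set_width(ratio) switch.
import random
import torch.nn.functional as F

def gradaug_step(model, images, labels, optimizer,
                 num_subnets=2, min_width=0.5, random_transform=None):
    optimizer.zero_grad()

    # Full-width network trained on the original samples.
    model.set_width(1.0)  # hypothetical slimmable-style API
    loss = F.cross_entropy(model(images), labels)

    # Sub-networks sampled by width, trained on randomly transformed samples.
    for _ in range(num_subnets):
        width = random.uniform(min_width, 1.0)
        model.set_width(width)
        aug = random_transform(images) if random_transform is not None else images
        loss = loss + F.cross_entropy(model(aug), labels)

    # Sub-network losses add self-guided disturbances to the raw gradients.
    loss.backward()
    optimizer.step()
    model.set_width(1.0)
    return loss.item()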
Related papers
- Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z) - Visual Explanations from Deep Networks via Riemann-Stieltjes Integrated
Gradient-based Localization [0.24596929878045565]
We introduce a new technique to produce visual explanations for the predictions of a CNN.
Our method can be applied to any layer of the network, and like Integrated Gradients it is not affected by the problem of vanishing gradients.
Compared to Grad-CAM, heatmaps produced by our algorithm are better focused in the areas of interest, and their numerical computation is more stable.
arXiv Detail & Related papers (2022-05-22T18:30:38Z) - DAFormer: Improving Network Architectures and Training Strategies for
Domain-Adaptive Semantic Segmentation [99.88539409432916]
We study the unsupervised domain adaptation (UDA) process.
We propose a novel UDA method, DAFormer, based on the benchmark results.
DAFormer significantly improves the state-of-the-art performance by 10.8 mIoU for GTA->Cityscapes and 5.4 mIoU for Synthia->Cityscapes.
arXiv Detail & Related papers (2021-11-29T19:00:46Z) - Neural Network Pruning Through Constrained Reinforcement Learning [3.2880869992413246]
We propose a general methodology for pruning neural networks.
Our proposed methodology can prune neural networks to respect pre-defined computational budgets.
We demonstrate the effectiveness of our approach by comparison with state-of-the-art methods on standard image classification datasets.
arXiv Detail & Related papers (2021-10-16T11:57:38Z) - TSG: Target-Selective Gradient Backprop for Probing CNN Visual Saliency [72.9106103283475]
We study visual saliency, a.k.a. visual explanation, to interpret convolutional neural networks.
Building on our observations, we propose a novel visual saliency framework, termed Target-Selective Gradient (TSG) backprop.
The proposed TSG consists of two components, namely, TSG-Conv and TSG-FC, which rectify the gradients for convolutional layers and fully-connected layers, respectively.
arXiv Detail & Related papers (2021-10-11T12:00:20Z) - Training Graph Neural Networks by Graphon Estimation [2.5997274006052544]
We propose to train a graph neural network via resampling from a graphon estimate obtained from the underlying network data.
We show that our approach is competitive with, and in many cases outperforms, other GNN training methods that reduce over-smoothing.
arXiv Detail & Related papers (2021-09-04T19:21:48Z) - LaplaceNet: A Hybrid Energy-Neural Model for Deep Semi-Supervised
Classification [0.0]
Recent developments in deep semi-supervised classification have reached unprecedented performance.
We propose a new framework, LaplaceNet, for deep semi-supervised classification that has a greatly reduced model complexity.
Our model outperforms state-of-the-art methods for deep semi-supervised classification, over several benchmark datasets.
arXiv Detail & Related papers (2021-06-08T17:09:28Z) - Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of using the same path for every input, DG-Net aggregates features dynamically at each node, which gives the network more representational ability.
arXiv Detail & Related papers (2020-10-02T16:50:26Z) - Active Deep Densely Connected Convolutional Network for Hyperspectral
Image Classification [6.850575514129793]
It is still very challenging to use only a few labeled samples to train deep learning models to reach a high classification accuracy.
This paper therefore proposes an active deep-learning framework trained in an end-to-end manner to minimize hyperspectral image classification costs.
arXiv Detail & Related papers (2020-09-01T09:53:38Z) - Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z) - Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and provide theoretical insights into three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.