Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network
Optimization
- URL: http://arxiv.org/abs/2102.01670v1
- Date: Tue, 2 Feb 2021 18:40:26 GMT
- Title: Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network
Optimization
- Authors: Kale-ab Tessera, Sara Hooker, Benjamin Rosman
- Abstract summary: We take a broader view of training sparse networks and consider the role of regularization, optimization and architecture choices on sparse models.
We show that gradient flow in sparse networks can be improved by reconsidering aspects of the architecture design and the training regime.
- Score: 16.85167651136133
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training sparse networks to converge to the same performance as dense neural
architectures has proven to be elusive. Recent work suggests that
initialization is the key. However, while this direction of research has had
some success, focusing on initialization alone appears to be inadequate. In
this paper, we take a broader view of training sparse networks and consider the
role of regularization, optimization and architecture choices on sparse models.
We propose a simple experimental framework, Same Capacity Sparse vs Dense
Comparison (SC-SDC), that allows for fair comparison of sparse and dense
networks. Furthermore, we propose a new measure of gradient flow, Effective
Gradient Flow (EGF), that better correlates to performance in sparse networks.
Using top-line metrics, SC-SDC and EGF, we show that default choices of
optimizers, activation functions and regularizers used for dense networks can
disadvantage sparse networks. Based upon these findings, we show that gradient
flow in sparse networks can be improved by reconsidering aspects of the
architecture design and the training regime. Our work suggests that
initialization is only one piece of the puzzle and taking a wider view of
tailoring optimization to sparse networks yields promising results.
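The abstract does not reproduce the exact definitions of SC-SDC or EGF, so the following is only a minimal sketch of a gradient-flow-style diagnostic for a sparse (masked) network, under the assumption that such a measure aggregates gradient statistics over the active, unmasked weights only; the paper's actual EGF formulation may differ.

```python
import torch

def masked_gradient_flow(model, masks):
    """Illustrative proxy for gradient flow in a sparse network: mean squared
    gradient over the active (unmasked) weights only. Not the paper's exact
    EGF definition, which is not reproduced in this abstract."""
    total_sq, n_active = 0.0, 0
    for name, param in model.named_parameters():
        if param.grad is None or name not in masks:
            continue
        mask = masks[name]                 # 1 = kept weight, 0 = pruned weight
        active_grad = param.grad * mask    # only these gradients can update kept weights
        total_sq += active_grad.pow(2).sum().item()
        n_active += int(mask.sum().item())
    return total_sq / max(n_active, 1)

# Usage (hypothetical): call after loss.backward() and log the value alongside
# loss/accuracy to compare sparse and dense runs of equal capacity (the SC-SDC idea).
```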
Related papers
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly initialized network at each iteration and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
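The shrinkage step described above can be pictured with a short, hypothetical sketch: weights below a magnitude threshold are scaled towards zero by an amount proportional to their own magnitude rather than hard-pruned. The threshold and shrink factor below are placeholder assumptions, not the ISS-P schedule.

```python
import torch

def soft_shrink_step(weight: torch.Tensor, sparsity: float, shrink: float = 0.1) -> torch.Tensor:
    """Hypothetical soft-shrinkage step: instead of zeroing the `sparsity`
    fraction of smallest-magnitude weights, scale them towards zero by an
    amount proportional to their own magnitude (`shrink` is a placeholder)."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    unimportant = weight.abs() <= threshold
    # Shrunk weights stay non-zero, so they can recover if their gradients grow later.
    return torch.where(unimportant, weight * (1.0 - shrink), weight)
```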
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
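For context, the claim that SSC generalizes several standard layers can be grounded in how those layers are usually written; the snippet below only shows the familiar depthwise, groupwise and pointwise convolutions (via PyTorch's `groups` argument), not the SSC layer itself.

```python
import torch.nn as nn

in_ch, out_ch = 64, 128  # example channel counts

# Standard convolution variants the abstract says SSC generalizes:
pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)                           # 1x1 channel mixing only
depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)   # one spatial filter per channel
groupwise = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, groups=4)      # 4 independent channel groups
```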
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- Neural Network Compression by Joint Sparsity Promotion and Redundancy Reduction [4.9613162734482215]
This paper presents a novel training scheme based on composite constraints that prune redundant filters and minimize their effect on overall network learning via sparsity promotion.
Our tests on several pixel-wise segmentation benchmarks show that the number of neurons and the memory footprint of networks in the test phase are significantly reduced without affecting performance.
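As a rough illustration of filter-level sparsity promotion (not the paper's composite constraints, which the summary does not spell out), a group L2,1 penalty over convolution filters is one common regularizer that pushes whole filters towards zero:

```python
import torch
import torch.nn as nn

def group_sparsity_penalty(model: nn.Module) -> torch.Tensor:
    """Generic L2,1 (group-lasso) penalty: the L2 norm of each output filter,
    summed over all Conv2d layers. A common sparsity-promotion term, shown
    only as an illustration of the general idea."""
    terms = []
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            w = module.weight                        # shape: (out_ch, in_ch, kH, kW)
            terms.append(w.flatten(1).norm(dim=1).sum())
    return torch.stack(terms).sum() if terms else torch.zeros(())

# Usage (hypothetical): total_loss = task_loss + lambda_sparse * group_sparsity_penalty(model)
```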
arXiv Detail & Related papers (2022-10-14T01:34:49Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based learning combined with nonconvexity renders learning susceptible to initialization problems.
We propose fusing neighboring layers of deeper networks that are initialized with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)