Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With
Trainable Masked Layers
- URL: http://arxiv.org/abs/2005.06870v1
- Date: Thu, 14 May 2020 11:05:21 GMT
- Title: Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With
Trainable Masked Layers
- Authors: Junjie Liu, Zhe Xu, Runbin Shi, Ray C. C. Cheung, Hayden K.H. So
- Abstract summary: We present a novel network pruning algorithm called Dynamic Sparse Training that can jointly find the optimal network parameters and sparse network structure.
We demonstrate that our dynamic sparse training algorithm can easily train very sparse neural network models with little performance loss.
- Score: 18.22501196339569
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel network pruning algorithm called Dynamic Sparse Training
that can jointly find the optimal network parameters and sparse network
structure in a unified optimization process with trainable pruning thresholds.
These thresholds can have fine-grained layer-wise adjustments dynamically via
backpropagation. We demonstrate that our dynamic sparse training algorithm can
easily train very sparse neural network models with little performance loss
using the same number of training epochs as dense models. Dynamic Sparse
Training achieves state-of-the-art performance compared with other sparse
training algorithms on various network architectures. Additionally, we have
several surprising observations that provide strong evidence for the
effectiveness and efficiency of our algorithm. These observations reveal the
underlying problems of traditional three-stage pruning algorithms and present
the potential guidance provided by our algorithm to the design of more compact
network architectures.
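To make the idea concrete, below is a minimal sketch (not the authors' released code) of a trainable masked layer in PyTorch. It assumes one trainable pruning threshold per output unit, a unit step that binarizes |W| - t into a mask, a simple straight-through surrogate for the step's gradient, and an exp(-t) regularizer that rewards raising the thresholds; the names (MaskedLinear, BinaryStep) and the exact surrogate gradient are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinaryStep(torch.autograd.Function):
    """Unit step that turns (|W| - threshold) into a 0/1 mask.
    The backward pass uses a simple straight-through surrogate
    (an assumption of this sketch; the paper derives its own estimator)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Only let gradients through near the threshold to keep updates stable.
        return grad_output * (x.abs() <= 1.0).float()


class MaskedLinear(nn.Module):
    """Linear layer whose weights are masked by trainable pruning thresholds,
    so the sparse structure is learned jointly with the weights."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        # One trainable threshold per output unit (fine-grained, layer-wise).
        self.threshold = nn.Parameter(torch.zeros(out_features))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)

    def forward(self, x):
        mask = BinaryStep.apply(self.weight.abs() - self.threshold.unsqueeze(1))
        return F.linear(x, self.weight * mask, self.bias)

    def sparse_regularizer(self):
        # Larger thresholds -> sparser layer; exp(-t) rewards raising them.
        return torch.exp(-self.threshold).sum()
```

A training loop would then minimize the task loss plus alpha times the sum of sparse_regularizer() over all masked layers, so backpropagation updates the weights and the pruning thresholds jointly; alpha trades accuracy against sparsity.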
Related papers
- Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies [14.574399133024594]
We present a new MTL framework that searches for optimized structures for multiple tasks with diverse graph topologies.
We design a restricted DAG-based central network with read-in/read-out layers to build topologically diverse task-adaptive structures.
arXiv Detail & Related papers (2023-03-13T05:01:50Z)
- Tricks and Plugins to GBM on Images and Sequences [18.939336393665553]
We propose a new algorithm for boosting Deep Convolutional Neural Networks (BoostCNN) to combine the merits of dynamic feature selection and BoostCNN.
We also propose a set of algorithms to incorporate boosting weights into a deep learning architecture based on a least squares objective function.
Experiments show that the proposed methods outperform benchmarks on several fine-grained classification tasks.
arXiv Detail & Related papers (2022-03-01T21:59:00Z)
- Dynamic Analysis of Nonlinear Civil Engineering Structures using Artificial Neural Network with Adaptive Training [2.1202971527014287]
In this study, artificial neural networks are developed with adaptive training algorithms.
The networks can successfully predict the time-history response of the shear frame and the rock structure to real ground motion records.
arXiv Detail & Related papers (2021-11-21T21:14:48Z)
- Analytically Tractable Inference in Deep Neural Networks [0.0]
The Tractable Approximate Gaussian Inference (TAGI) algorithm was shown to be a viable and scalable alternative to backpropagation for shallow fully-connected neural networks.
We demonstrate that TAGI matches or exceeds the performance of backpropagation for training classic deep neural network architectures.
arXiv Detail & Related papers (2021-03-09T14:51:34Z)
- Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that networks trained by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z)
- Advances in the training, pruning and enforcement of shape constraints of Morphological Neural Networks using Tropical Algebra [40.327435646554115]
We study neural networks based on the morphological operators of dilation and erosion.
Our contributions include training morphological networks via Difference-of-Convex programming methods and extending binary morphological networks to multiclass tasks.
arXiv Detail & Related papers (2020-11-15T22:44:25Z)
- Stochastic Markov Gradient Descent and Training Low-Bit Neural Networks [77.34726150561087]
We introduce Stochastic Markov Gradient Descent (SMGD), a discrete optimization method applicable to training quantized neural networks.
We provide theoretical guarantees of algorithm performance as well as encouraging numerical results.
arXiv Detail & Related papers (2020-08-25T15:48:15Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks (a minimal sketch of such a predictor appears after this list).
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
- Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z)
- Subset Sampling For Progressive Neural Network Learning [106.12874293597754]
Progressive Neural Network Learning is a class of algorithms that incrementally construct the network's topology and optimize its parameters based on the training data.
We propose to speed up this process by exploiting subsets of training data at each incremental training step.
Experimental results in object, scene and face recognition problems demonstrate that the proposed approach speeds up the optimization procedure considerably.
arXiv Detail & Related papers (2020-02-17T18:57:33Z)
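As referenced in the weight-sharing NAS entry above, below is a minimal sketch (not the authors' code) of the kind of graph-convolutional performance predictor that entry describes: each sampled sub-network is encoded as a node-feature matrix (e.g. one-hot operation types) plus a row-normalized adjacency matrix with self-loops, and a small GCN regresses its accuracy. The class names, layer sizes, and mean-pooling readout are illustrative assumptions.

```python
import torch
import torch.nn as nn


class GCNLayer(nn.Module):
    """One graph-convolution step: mix neighbour features through a
    (pre-normalized) adjacency matrix, then apply a learned linear map."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, adj, feats):
        # adj: (n, n) adjacency with self-loops, row-normalized; feats: (n, in_dim)
        return torch.relu(self.linear(adj @ feats))


class SubnetPerformancePredictor(nn.Module):
    """Regresses the accuracy of a sampled sub-network from its graph encoding."""

    def __init__(self, num_op_types, hidden=64):
        super().__init__()
        self.gcn1 = GCNLayer(num_op_types, hidden)
        self.gcn2 = GCNLayer(hidden, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, adj, feats):
        h = self.gcn2(adj, self.gcn1(adj, feats))
        # Mean-pool node embeddings into a graph embedding, then predict accuracy.
        return self.head(h.mean(dim=0))
```

Such a predictor would be fit with a regression loss on (sub-network, measured accuracy) pairs sampled from the weight-sharing super-network and then used to rank unseen candidates, which is what a higher rank correlation coefficient reflects.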
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.