Automatic Sparse Connectivity Learning for Neural Networks
- URL: http://arxiv.org/abs/2201.05020v1
- Date: Thu, 13 Jan 2022 15:12:48 GMT
- Title: Automatic Sparse Connectivity Learning for Neural Networks
- Authors: Zhimin Tang, Linkai Luo, Bike Xie, Yiyu Zhu, Rujie Zhao, Lvqing Bi,
Chao Lu
- Abstract summary: Well-designed sparse neural networks have the potential to significantly reduce FLOPs and computational resources.
In this work, we propose a new automatic pruning method - Sparse Connectivity Learning.
Deep learning models trained by SCL outperform the SOTA human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.
- Score: 4.875787559251317
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Since sparse neural networks usually contain many zero weights, these
unnecessary network connections can potentially be eliminated without degrading
network performance. Therefore, well-designed sparse neural networks have the
potential to significantly reduce FLOPs and computational resources. In this
work, we propose a new automatic pruning method - Sparse Connectivity Learning
(SCL). Specifically, a weight is re-parameterized as an element-wise
multiplication of a trainable weight variable and a binary mask. Thus, network
connectivity is fully described by the binary mask, which is modulated by a
unit step function. We theoretically prove the fundamental principle of using a
straight-through estimator (STE) for network pruning. This principle is that
the proxy gradients of STE should be positive, ensuring that mask variables
converge at their minima. After finding Leaky ReLU, Softplus, and Identity STEs
can satisfy this principle, we propose to adopt Identity STE in SCL for
discrete mask relaxation. We find that mask gradients of different features are
very unbalanced; hence, we propose to normalize mask gradients of each feature
to optimize mask variable training. In order to automatically train sparse
masks, we include the total number of network connections as a regularization
term in our objective function. As SCL does not require pruning criteria or
hyper-parameters defined by designers for network layers, the network is
explored in a larger hypothesis space to achieve optimized sparse connectivity
for the best performance. SCL overcomes the limitations of existing automatic
pruning methods. Experimental results demonstrate that SCL can automatically
learn and select important network connections for various baseline network
structures. Deep learning models trained by SCL outperform the SOTA
human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs
reduction.
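The re-parameterization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single output neuron, a squared-error loss, and plain SGD, and every name in it (step, forward, grads, sgd_step, lam, lr) is hypothetical. The per-feature mask-gradient normalization mentioned in the abstract is omitted because its exact form is not given here.

```python
# Illustrative sketch only: one output neuron, squared-error loss, plain SGD.
# All names (step, forward, grads, sgd_step, lam, lr) are assumptions,
# not taken from the paper's code.

def step(m):
    """Unit step: 1.0 keeps a connection, 0.0 prunes it."""
    return 1.0 if m > 0 else 0.0


def forward(w, m, x):
    """Effective weight is w_i * step(m_i); the mask alone decides connectivity."""
    return sum(wi * step(mi) * xi for wi, mi, xi in zip(w, m, x))


def count_connections(m):
    """Number of active connections, used here as the sparsity regularizer."""
    return sum(step(mi) for mi in m)


def grads(w, m, x, target, lam):
    """Gradients of 0.5 * (y - target)**2 + lam * count_connections(m).

    The true derivative of step() is zero almost everywhere, so the backward
    pass substitutes the identity STE: d step(m) / d m is treated as 1.
    """
    err = forward(w, m, x) - target
    grad_w = [err * step(mi) * xi for mi, xi in zip(m, x)]
    # The task-loss gradient w.r.t. the binary mask is err * w_i * x_i; the
    # identity STE passes it unchanged to m_i, and the regularizer adds lam.
    grad_m = [err * wi * xi + lam for wi, xi in zip(w, x)]
    return grad_w, grad_m


def sgd_step(w, m, x, target, lam=0.1, lr=0.1):
    """One SGD update of both the weight variables and the mask variables."""
    grad_w, grad_m = grads(w, m, x, target, lam)
    new_w = [wi - lr * g for wi, g in zip(w, grad_w)]
    new_m = [mi - lr * g for mi, g in zip(m, grad_m)]
    return new_w, new_m


w = [2.0, -1.0, 0.5]
m = [0.3, -0.2, 0.1]   # the second connection starts pruned (m <= 0)
x = [1.0, 1.0, 2.0]
y = forward(w, m, x)   # only connections 0 and 2 contribute
w, m = sgd_step(w, m, x, target=3.5)
```

In this sketch a pruned connection still receives a mask gradient, because the identity STE ignores the current value of the step function in the backward pass. A connection can therefore be re-enabled later if the task loss pushes its mask variable back above zero, while the lam term steadily pushes every mask variable toward pruning.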
Related papers
- Complexity-Aware Training of Deep Neural Networks for Optimal Structure Discovery [0.0]
We propose a novel algorithm for combined unit/filter and layer pruning of deep neural networks that functions during training and without requiring a pre-trained network to apply.
Our algorithm optimally trades off learning accuracy and pruning levels while balancing layer vs. unit/filter pruning and computational vs. parameter complexity using only three user-defined parameters.
arXiv Detail & Related papers (2024-11-14T02:00:22Z)
- Concurrent Training and Layer Pruning of Deep Neural Networks [0.0]
We propose an algorithm capable of identifying and eliminating irrelevant layers of a neural network during the early stages of training.
We employ a structure using residual connections around nonlinear network sections that allow the flow of information through the network once a nonlinear section is pruned.
arXiv Detail & Related papers (2024-06-06T23:19:57Z)
- Communication-Efficient Federated Learning via Regularized Sparse Random Networks [21.491346993533572]
This work presents a new method for enhancing communication efficiency in Federated Learning.
In this setting, a binary mask is optimized instead of the model weights, which are kept fixed.
Sparse binary masks are exchanged rather than the floating point weights in traditional federated learning.
arXiv Detail & Related papers (2023-09-19T14:05:12Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly initialized network at each iteration and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Learning a Consensus Sub-Network with Polarization Regularization and One Pass Training [3.2214522506924093]
Pruning schemes create extra overhead either by iterative training and fine-tuning for static pruning or repeated computation of a dynamic pruning graph.
We propose a new parameter pruning strategy for learning a lighter-weight sub-network that minimizes the energy cost while maintaining comparable performance to the fully parameterised network on given downstream tasks.
Our results on CIFAR-10 and CIFAR-100 suggest that our scheme can remove 50% of connections in deep networks with less than 1% reduction in classification accuracy.
arXiv Detail & Related papers (2023-02-17T09:37:17Z)
- Parameter-Efficient Masking Networks [61.43995077575439]
Advanced network designs often contain a large number of repetitive structures (e.g., Transformer).
In this study, we are the first to investigate the representative potential of fixed random weights with limited unique values by learning masks.
It leads to a new paradigm for model compression to diminish the model size.
arXiv Detail & Related papers (2022-10-13T03:39:03Z)
- Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of using the same path of the network, DG-Net aggregates features dynamically in each node, which allows the network to have more representation ability.
arXiv Detail & Related papers (2020-10-02T16:50:26Z)
- ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
arXiv Detail & Related papers (2020-06-28T23:09:27Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- DHP: Differentiable Meta Pruning via HyperNetworks [158.69345612783198]
This paper introduces a differentiable pruning method via hypernetworks for automatic network pruning.
Latent vectors control the output channels of the convolutional layers in the backbone network and act as a handle for the pruning of the layers.
Experiments are conducted on various networks for image classification, single image super-resolution, and denoising.
arXiv Detail & Related papers (2020-03-30T17:59:18Z)