HALO: Learning to Prune Neural Networks with Shrinkage
- URL: http://arxiv.org/abs/2008.10183v3
- Date: Sun, 28 Feb 2021 04:26:09 GMT
- Title: HALO: Learning to Prune Neural Networks with Shrinkage
- Authors: Skyler Seto, Martin T. Wells, Wenyu Zhang
- Abstract summary: Deep neural networks achieve state-of-the-art performance in a variety of tasks by extracting a rich set of features from unstructured data.
Modern techniques for inducing sparsity and reducing model size are (1) network pruning, (2) training with a sparsity inducing penalty, and (3) training a binary mask jointly with the weights of the network.
We present a novel penalty called Hierarchical Adaptive Lasso which learns to adaptively sparsify weights of a given network via trainable parameters.
- Score: 5.283963846188862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks achieve state-of-the-art performance in a variety of
tasks by extracting a rich set of features from unstructured data; however, this
performance is closely tied to model size. Modern techniques for inducing
sparsity and reducing model size are (1) network pruning, (2) training with a
sparsity inducing penalty, and (3) training a binary mask jointly with the
weights of the network. We study different sparsity inducing penalties from the
perspective of Bayesian hierarchical models and present a novel penalty called
Hierarchical Adaptive Lasso (HALO) which learns to adaptively sparsify weights
of a given network via trainable parameters. When used to train
over-parametrized networks, our penalty yields small subnetworks with high
accuracy without fine-tuning. Empirically, on image recognition tasks, we find
that HALO is able to learn highly sparse networks (only 5% of the parameters)
with significant gains in performance over state-of-the-art magnitude pruning
methods at the same level of sparsity. Code is available at
https://github.com/skyler120/sparsity-halo.
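The paper's penalty is not reproduced here, but the core idea, an L1-style penalty whose per-weight strength is itself learned, can be illustrated with a short PyTorch sketch. Everything below (the function name halo_penalty, the log-scale parametrization, and the hyperparameter values) is an assumption for illustration, not the authors' implementation; see the repository above for the real code.

```python
import torch

def halo_penalty(weights: torch.Tensor,
                 log_scales: torch.Tensor,
                 lam: float = 1e-4) -> torch.Tensor:
    """Adaptive-lasso-style penalty with trainable per-weight scales.

    Illustrative only (not the paper's exact form): each weight w_i is
    charged |w_i| / s_i with s_i = exp(log_scales_i), and a log-scale
    term stops the scales from growing without bound, so the optimizer
    must choose which weights are worth keeping large.
    """
    scales = torch.exp(log_scales)               # positive per-weight scales
    data_term = (weights.abs() / scales).sum()   # adaptively weighted L1
    scale_term = log_scales.sum()                # discourages inflating scales
    return lam * (data_term + scale_term)

# Usage sketch: one trainable log-scale per weight, learned jointly
# with the network weights by the same optimizer.
layer = torch.nn.Linear(128, 64)
log_scales = torch.nn.Parameter(torch.zeros_like(layer.weight))
opt = torch.optim.SGD(list(layer.parameters()) + [log_scales], lr=1e-2)

x = torch.randn(32, 128)
opt.zero_grad()
loss = layer(x).pow(2).mean() + halo_penalty(layer.weight, log_scales)
loss.backward()
opt.step()
```

In a sketch like this, weights whose learned scales stay small are driven toward zero during training and can then be thresholded away, which is the sense in which such a penalty yields a sparse subnetwork without a separate pruning-and-fine-tuning pass.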
Related papers
- Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning has made significant progress in pre-training large models but struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z) - Federated Progressive Sparsification (Purge, Merge, Tune)+ [15.08232397899507]
FedSparsify is a sparsification strategy based on progressive weight magnitude pruning (a minimal magnitude-pruning sketch appears after this list).
We show experimentally that FedSparsify learns a subnetwork of both high sparsity and learning performance.
arXiv Detail & Related papers (2022-04-26T16:45:53Z) - An Experimental Study of the Impact of Pre-training on the Pruning of a
Convolutional Neural Network [0.0]
In recent years, deep neural networks have seen wide success in various application domains.
Deep neural networks usually involve a large number of parameters, which correspond to the weights of the network.
Pruning methods attempt to reduce the size of the parameter set by identifying and removing irrelevant weights.
arXiv Detail & Related papers (2021-12-15T16:02:15Z) - Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z) - Layer Folding: Neural Network Depth Reduction using Activation
Linearization [0.0]
Modern devices exhibit a high level of parallelism, but real-time latency is still highly dependent on network depth.
We propose a method that learns whether non-linear activations can be removed, allowing consecutive linear layers to be folded into one (see the folding sketch after this list).
We apply our method to networks pre-trained on CIFAR-10 and CIFAR-100 and find that they can all be transformed into shallower forms that share a similar depth.
arXiv Detail & Related papers (2021-06-17T08:22:46Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch (see the N:M masking sketch after this list).
arXiv Detail & Related papers (2021-02-08T05:55:47Z) - Sparsity in Deep Learning: Pruning and growth for efficient inference
and training in neural networks [78.47459801017959]
Sparsity can reduce the memory footprint of regular networks to fit mobile devices.
We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice.
arXiv Detail & Related papers (2021-01-31T22:48:50Z) - ReMarNet: Conjoint Relation and Margin Learning for Small-Sample Image
Classification [49.87503122462432]
We introduce a novel neural network termed Relation-and-Margin learning Network (ReMarNet).
Our method assembles two networks with different backbones to learn features that perform well under both of the aforementioned classification mechanisms.
Experiments on four image datasets demonstrate that our approach is effective in learning discriminative features from a small set of labeled samples.
arXiv Detail & Related papers (2020-06-27T13:50:20Z) - A Hybrid Method for Training Convolutional Neural Networks [3.172761915061083]
We propose a hybrid method that uses both backpropagation and evolutionary strategies to train Convolutional Neural Networks.
We show that the proposed hybrid method is capable of improving upon regular training in the task of image classification.
arXiv Detail & Related papers (2020-04-15T17:52:48Z) - Differentiable Sparsification for Deep Neural Networks [0.0]
We propose a fully differentiable sparsification method for deep neural networks.
The proposed method can learn both the sparsified structure and weights of a network in an end-to-end manner.
To the best of our knowledge, this is the first fully differentiable sparsification method.
arXiv Detail & Related papers (2019-10-08T03:57:04Z)