ELSA: Partial Weight Freezing for Overhead-Free Sparse Network
Deployment
- URL: http://arxiv.org/abs/2312.06872v2
- Date: Sun, 17 Dec 2023 15:38:51 GMT
- Title: ELSA: Partial Weight Freezing for Overhead-Free Sparse Network
Deployment
- Authors: Paniz Halvachi, Alexandra Peste, Dan Alistarh, Christoph H. Lampert
- Abstract summary: We present ELSA, a practical solution for creating deep networks that can easily be deployed at different levels of sparsity.
The core idea is to embed one or more sparse networks within a single dense network as a proper subset of the weights.
At prediction time, any sparse model can be extracted effortlessly, simply by zeroing out weights according to a predefined mask.
- Score: 95.04504362111314
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present ELSA, a practical solution for creating deep networks that can
easily be deployed at different levels of sparsity. The core idea is to embed
one or more sparse networks within a single dense network as a proper subset of
the weights. At prediction time, any sparse model can be extracted effortlessly
simply by zeroing out weights according to a predefined mask. ELSA is simple,
powerful and highly flexible. It can use essentially any existing technique for
network sparsification and network training. In particular, it does not
restrict the loss function, architecture or the optimization technique. Our
experiments show that ELSA's advantage of flexible deployment comes with no or
just a negligible reduction in prediction quality compared to the standard way
of using multiple sparse networks that are trained and stored independently.
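For concreteness, the deployment-time extraction described above amounts to an elementwise multiplication of the stored dense weights with a predefined 0/1 mask. The short PyTorch sketch below illustrates only that step; the per-layer magnitude masks and the helper names are illustrative assumptions, not ELSA's actual mask-construction or training procedure.

```python
# Minimal sketch of the extraction step described in the abstract: a sparse
# model is obtained from the single stored dense network simply by zeroing
# out weights according to a predefined mask. The magnitude-based masks here
# are only a stand-in; producing masks that work well at several sparsity
# levels (and training the dense weights around them) is what ELSA addresses.
import copy
import torch
import torch.nn as nn


def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Illustrative 0/1 mask that keeps the largest-magnitude weights."""
    keep = int(weight.numel() * (1.0 - sparsity))
    if keep == 0:
        return torch.zeros_like(weight)
    threshold = weight.abs().flatten().topk(keep).values.min()
    return (weight.abs() >= threshold).float()


def extract_sparse_model(dense_model: nn.Module, masks: dict) -> nn.Module:
    """Copy the dense model and zero out weights according to per-layer masks."""
    sparse_model = copy.deepcopy(dense_model)
    with torch.no_grad():
        for name, param in sparse_model.named_parameters():
            if name in masks:
                param.mul_(masks[name])
    return sparse_model


# One dense network, several deployment sparsities.
dense = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
for sparsity in (0.5, 0.9):
    masks = {name: magnitude_mask(p, sparsity)
             for name, p in dense.named_parameters() if p.dim() > 1}  # weight matrices only
    sparse = extract_sparse_model(dense, masks)
```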
Related papers
- Network Fission Ensembles for Low-Cost Self-Ensembles [20.103367702014474]
We propose a low-cost ensemble learning and inference method, called Network Fission Ensembles (NFE).
We first prune some of the weights to reduce the training burden.
We then group the remaining weights into several sets and create multiple auxiliary paths using each set to construct multi-exits.
arXiv Detail & Related papers (2024-08-05T08:23:59Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights by a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z) - Fast Conditional Network Compression Using Bayesian HyperNetworks [54.06346724244786]
We introduce a conditional compression problem and propose a fast framework for tackling it.
The problem is how to quickly compress a pretrained large neural network into optimal smaller networks given target contexts.
Our methods can quickly generate compressed networks with significantly smaller sizes than baseline methods.
arXiv Detail & Related papers (2022-05-13T00:28:35Z) - Extracting Effective Subnetworks with Gumebel-Softmax [9.176056742068813]
We devise an alternative pruning method that allows extracting effective subnetworks from larger untrained ones.
Our method extracts subnetworks by exploring different topologies, which are sampled using Gumbel-Softmax.
The resulting subnetworks are further enhanced using a highly efficient rescaling mechanism that reduces training time and improves performance.
arXiv Detail & Related papers (2022-02-25T21:31:30Z) - The Unreasonable Effectiveness of Random Pruning: Return of the Most
Naive Baseline for Sparse Training [111.15069968583042]
Random pruning is arguably the most naive way to attain sparsity in neural networks, but has been deemed uncompetitive by either post-training pruning or sparse training.
We empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent.
Our results strongly suggest there is larger-than-expected room for sparse training at scale, and the benefits of sparsity might be more universal beyond carefully designed pruning.
arXiv Detail & Related papers (2022-02-05T21:19:41Z) - Automatic Sparse Connectivity Learning for Neural Networks [4.875787559251317]
Well-designed sparse neural networks have the potential to significantly reduce FLOPs and computational resources.
In this work, we propose a new automatic pruning method - Sparse Connectivity Learning.
Deep learning models trained by SCL outperform the SOTA human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.
arXiv Detail & Related papers (2022-01-13T15:12:48Z) - Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments.
In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network.
arXiv Detail & Related papers (2021-02-08T05:55:47Z) - Sparsity-Control Ternary Weight Networks [34.00378876525579]
We focus on training ternary weight {-1, 0, +1} networks, which can avoid multiplications and dramatically reduce memory and computation requirements.
Existing approaches to training ternary weight networks cannot control the sparsity of the ternary weights.
We propose the first sparsity-control approach (SCA) to training ternary weight networks.
arXiv Detail & Related papers (2020-11-01T18:06:26Z)