Signing the Supermask: Keep, Hide, Invert
- URL: http://arxiv.org/abs/2201.13361v1
- Date: Mon, 31 Jan 2022 17:17:37 GMT
- Title: Signing the Supermask: Keep, Hide, Invert
- Authors: Nils Koster, Oliver Grothe and Achim Rettinger
- Abstract summary: We present a novel approach that either drops a neural network's initial weights or inverts their respective sign.
We achieve a pruning rate of up to 99%, while still matching or exceeding the performance of various baseline and previous models.
- Score: 0.9475039534437331
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The exponential growth in numbers of parameters of neural networks over the
past years has been accompanied by an increase in performance across several
fields. However, due to their sheer size, the networks not only became
difficult to interpret but also problematic to train and use in real-world
applications, since hardware requirements increased accordingly. Tackling both
issues, we present a novel approach that either drops a neural network's
initial weights or inverts their respective sign. Put simply, a network is
trained by weight selection and inversion without changing the weights'
absolute values. Our contribution extends previous work on masking by additionally
sign-inverting the initial weights and follows the findings of the Lottery
Ticket Hypothesis. Through this extension and adaptations of initialization
methods, we achieve a pruning rate of up to 99%, while still matching or
exceeding the performance of various baseline and previous models. Our approach
has two main advantages. First, and most notably, signed Supermask models
drastically simplify a model's structure, while still performing well on given
tasks. Second, by reducing the neural network to its very foundation, we gain
insights into which weights matter for performance.
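To make the keep/hide/invert idea concrete, below is a minimal PyTorch sketch of a linear layer whose frozen initial weights are modulated by a learned mask in {-1, 0, +1}. The score parameterization, the 0.5 threshold, and the straight-through estimator are illustrative assumptions, not the authors' exact training procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SignedSupermaskLinear(nn.Module):
    """Linear layer with frozen initial weights; training only decides, per
    weight, whether to keep it (+1), hide it (0), or invert its sign (-1)."""

    def __init__(self, in_features, out_features, threshold=0.5):
        super().__init__()
        # The initial weights are frozen and never receive gradient updates.
        self.weight = nn.Parameter(torch.empty(out_features, in_features),
                                   requires_grad=False)
        nn.init.kaiming_uniform_(self.weight)
        # One real-valued score per weight; only these scores are trained.
        self.scores = nn.Parameter(
            torch.empty(out_features, in_features).uniform_(-1.0, 1.0))
        self.threshold = threshold  # |score| below this -> weight is hidden

    def forward(self, x):
        # Discrete mask in {-1, 0, +1} derived from the continuous scores.
        hard_mask = torch.where(self.scores.abs() < self.threshold,
                                torch.zeros_like(self.scores),
                                torch.sign(self.scores))
        # Straight-through estimator: the forward pass uses the hard mask,
        # the backward pass routes gradients to the continuous scores.
        mask = hard_mask.detach() + self.scores - self.scores.detach()
        return F.linear(x, self.weight * mask)


# Only the mask scores are optimized; weights whose mask is 0 are effectively pruned.
layer = SignedSupermaskLinear(784, 300)
optimizer = torch.optim.SGD([layer.scores], lr=0.1)
```

Since the weights themselves stay at their initial values, the learned mask alone determines which connections survive, which is what makes the high pruning rates described above possible.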
Related papers
- Neural Metamorphosis [72.88137795439407]
This paper introduces a new learning paradigm termed Neural Metamorphosis (NeuMeta), which aims to build self-morphable neural networks.
NeuMeta directly learns the continuous weight manifold of neural networks.
It sustains full-size performance even at a 75% compression rate.
arXiv Detail & Related papers (2024-10-10T14:49:58Z)
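The summary above gives little detail; "learning the continuous weight manifold" suggests an implicit, coordinate-based representation of weights. The sketch below is only a guess in that direction: a hypothetical MLP maps (layer, row, column) coordinates to a weight value, so weight matrices of different sizes can be sampled from one shared function. It is not NeuMeta's actual architecture.

```python
import torch
import torch.nn as nn


class WeightManifoldINR(nn.Module):
    """Hypothetical coordinate-to-weight network: map a (layer index, row,
    column) coordinate to a single weight value."""

    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def sample_weight_matrix(self, layer_idx, rows, cols):
        # Query the network on a grid of coordinates (rows/cols normalized to
        # [0, 1]) so matrices of any size come from the same function.
        r = torch.linspace(0.0, 1.0, rows)
        c = torch.linspace(0.0, 1.0, cols)
        rr, cc = torch.meshgrid(r, c, indexing="ij")
        ll = torch.full_like(rr, float(layer_idx))
        coords = torch.stack([ll, rr, cc], dim=-1).reshape(-1, 3)
        return self.net(coords).reshape(rows, cols)


inr = WeightManifoldINR()
w_full = inr.sample_weight_matrix(layer_idx=0, rows=128, cols=64)   # full width
w_small = inr.sample_weight_matrix(layer_idx=0, rows=96, cols=48)   # compressed
```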
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
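As a rough illustration of "sequential processing of subsets of neural network weights" (not SANE's actual tokenization), the helper below flattens a model's parameters and cuts them into fixed-size chunks that a sequence model could consume; the chunk size and zero-padding are arbitrary choices.

```python
import torch
import torch.nn as nn


def weights_to_token_sequence(model: nn.Module, chunk_size: int = 256) -> torch.Tensor:
    """Flatten all parameters of a model and split them into fixed-size
    chunks ("tokens"), zero-padding the final chunk if needed."""
    flat = torch.cat([p.detach().reshape(-1) for p in model.parameters()])
    pad = (-flat.numel()) % chunk_size
    flat = torch.cat([flat, flat.new_zeros(pad)])
    return flat.reshape(-1, chunk_size)  # shape: (num_tokens, chunk_size)


mlp = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
tokens = weights_to_token_sequence(mlp)
print(tokens.shape)  # torch.Size([932, 256]) for this small MLP
```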
- Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce the quantization error of the weights.
We develop an improved KL metric to determine optimal quantization scales for activations.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
arXiv Detail & Related papers (2023-12-17T02:31:20Z)
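The summary above only names the method, so the NumPy sketch below shows one plausible reading of "coarse & fine weight splitting": quantize the weights coarsely, then quantize the residual with its own finer scale. The bit widths and scale rules are assumptions, not the paper's algorithm.

```python
import numpy as np


def quantize(w, scale, n_bits=8):
    """Uniform symmetric quantization followed by dequantization."""
    qmax = 2 ** (n_bits - 1) - 1
    return np.clip(np.round(w / scale), -qmax, qmax) * scale


def coarse_fine_split(w, n_bits=8):
    """Represent w as a coarsely quantized part plus a finely quantized residual."""
    qmax = 2 ** (n_bits - 1) - 1
    w_coarse = quantize(w, np.abs(w).max() / qmax, n_bits)
    residual = w - w_coarse
    w_fine = quantize(residual, np.abs(residual).max() / qmax + 1e-12, n_bits)
    return w_coarse + w_fine


w = np.random.randn(256, 256).astype(np.float32)
err_plain = np.abs(w - quantize(w, np.abs(w).max() / 127)).mean()
err_split = np.abs(w - coarse_fine_split(w)).mean()
print(err_plain, err_split)  # the split representation leaves a much smaller error
```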
- An Initialization Schema for Neuronal Networks on Tabular Data [0.9155684383461983]
We show that a binomial neural network can be used effectively on tabular data.
The proposed schema is a simple but effective way to initialize the first hidden layer of a neural network.
We evaluate our approach on multiple public datasets and showcase the improved performance compared to other neural network-based approaches.
arXiv Detail & Related papers (2023-11-07T13:52:35Z)
- MixtureGrowth: Growing Neural Networks by Recombining Learned Parameters [19.358670728803336]
Most deep neural networks are trained under fixed network architectures and require retraining when the architecture changes.
To avoid this, one can grow from a small network by adding random weights over time to gradually achieve the target network size.
This naive approach falls short in practice as it brings too much noise to the growing process.
arXiv Detail & Related papers (2023-11-07T11:37:08Z)
- Weight Compander: A Simple Weight Reparameterization for Regularization [5.744133015573047]
We introduce weight compander, a novel effective method to improve generalization of deep neural networks.
We show experimentally that using weight compander in addition to standard regularization methods improves the performance of neural networks.
arXiv Detail & Related papers (2023-06-29T14:52:04Z)
- IF2Net: Innately Forgetting-Free Networks for Continual Learning [49.57495829364827]
Continual learning aims to absorb new concepts incrementally without interfering with previously learned knowledge.
Motivated by the characteristics of neural networks, we investigate how to design an Innately Forgetting-Free Network (IF2Net).
IF2Net allows a single network to inherently learn unlimited mapping rules without being told task identities at test time.
arXiv Detail & Related papers (2023-06-18T05:26:49Z)
- Random Weights Networks Work as Loss Prior Constraint for Image Restoration [50.80507007507757]
We present our belief that "Random Weights Networks can be Acted as Loss Prior Constraint for Image Restoration".
This prior can be inserted directly into existing networks without any additional training or testing cost.
To emphasize, our main focus is to revive interest in the design of loss functions, which is currently a neglected area.
arXiv Detail & Related papers (2023-03-29T03:43:51Z)
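As a rough illustration of using an untrained network as a loss prior (not necessarily the paper's construction): compare the restored and reference images through a frozen, randomly initialized CNN and penalize the feature difference, on top of an ordinary pixel loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RandomFeatureLoss(nn.Module):
    """Loss term that compares two images through a frozen, randomly
    initialized CNN; the CNN itself is never trained."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        for p in self.features.parameters():
            p.requires_grad_(False)  # random weights stay fixed

    def forward(self, restored, target):
        return F.l1_loss(self.features(restored), self.features(target))


# Added on top of an ordinary pixel loss from any existing restoration model.
prior = RandomFeatureLoss()
restored = torch.rand(1, 3, 64, 64, requires_grad=True)
target = torch.rand(1, 3, 64, 64)
loss = F.l1_loss(restored, target) + 0.1 * prior(restored, target)
loss.backward()
```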
- Parameter-Efficient Masking Networks [61.43995077575439]
Advanced network designs often contain a large number of repetitive structures (e.g., Transformer).
In this study, we are the first to investigate the representative potential of fixed random weights with limited unique values by learning masks.
This leads to a new model-compression paradigm that reduces model size.
arXiv Detail & Related papers (2022-10-13T03:39:03Z)
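One way to read "fixed random weights with limited unique values" is sketched below: each layer's weight matrix is assembled from a small pool of shared random values, and only a binary mask over it is learned, reusing the straight-through trick from the signed Supermask sketch above. The pool size and tiling scheme are assumptions, not the paper's construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedTiledLinear(nn.Module):
    """Linear layer whose weight matrix is assembled from a small pool of
    fixed random values; only a binary mask over it is learned."""

    def __init__(self, in_features, out_features, num_unique=256):
        super().__init__()
        pool = torch.randn(num_unique)
        idx = torch.randint(0, num_unique, (out_features, in_features))
        # Fixed weights drawn from a limited set of unique values.
        self.register_buffer("weight", pool[idx])
        # Continuous scores, binarized in the forward pass.
        self.scores = nn.Parameter(
            torch.empty(out_features, in_features).uniform_(-1.0, 1.0))

    def forward(self, x):
        hard_mask = (self.scores > 0).float()
        # Straight-through estimator, as in the signed Supermask sketch above.
        mask = hard_mask.detach() + self.scores - self.scores.detach()
        return F.linear(x, self.weight * mask)
```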
- Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning makes significant progress in pre-training large models, but struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z)
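The weight-sharing mechanism behind slimmable networks can be sketched as a layer that runs at a fraction of its width by slicing its weight matrix; the contrastive pre-training, input-channel slicing, and per-width normalization details are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SlimmableLinear(nn.Module):
    """Linear layer that can run at a fraction of its output width by
    slicing its weight matrix; every width shares the same parameters."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x, width_mult=1.0):
        out = max(1, int(self.weight.shape[0] * width_mult))
        return F.linear(x, self.weight[:out], self.bias[:out])


layer = SlimmableLinear(128, 64)
x = torch.randn(8, 128)
full = layer(x, width_mult=1.0)  # shape (8, 64): the full network
half = layer(x, width_mult=0.5)  # shape (8, 32): a weight-sharing sub-network
```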
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.