Superposing Many Tickets into One: A Performance Booster for Sparse
Neural Network Training
- URL: http://arxiv.org/abs/2205.15322v1
- Date: Mon, 30 May 2022 16:01:32 GMT
- Title: Superposing Many Tickets into One: A Performance Booster for Sparse
Neural Network Training
- Authors: Lu Yin, Vlado Menkovski, Meng Fang, Tianjin Huang, Yulong Pei, Mykola
Pechenizkiy, Decebal Constantin Mocanu, Shiwei Liu
- Abstract summary: We present a novel sparse training approach, termed Sup-tickets, which can satisfy two desiderata concurrently in a single sparse-to-sparse training process.
Across various modern architectures on CIFAR-10/100 and ImageNet, we show that Sup-tickets integrates seamlessly with the existing sparse training methods.
- Score: 32.30355584300427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works on sparse neural network training (sparse training) have shown
that a compelling trade-off between performance and efficiency can be achieved
by training intrinsically sparse neural networks from scratch. Existing sparse
training methods usually strive to find the best possible sparse subnetwork in
a single run, without involving any expensive dense-training or pre-training steps.
For instance, dynamic sparse training (DST), one of the most prominent
directions, can reach performance competitive with dense training
by iteratively evolving the sparse topology over the course of training. In
this paper, we argue that it is better to allocate the limited resources to
create multiple low-loss sparse subnetworks and superpose them into a stronger
one, instead of allocating all resources entirely to find an individual
subnetwork. To achieve this, two desiderata are required: (1) efficiently
producing many low-loss subnetworks, the so-called cheap tickets, within one
training process limited to the standard training time used in dense training;
(2) effectively superposing these cheap tickets into one stronger subnetwork
without going over the constrained parameter budget. To corroborate our
conjecture, we present a novel sparse training approach, termed
Sup-tickets, which can satisfy the above two desiderata concurrently
in a single sparse-to-sparse training process. Across various modern
architectures on CIFAR-10/100 and ImageNet, we show that Sup-tickets integrates
seamlessly with the existing sparse training methods and demonstrates
consistent performance improvement.
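The abstract describes the method only at a high level; the short NumPy sketch below illustrates one way the two desiderata could fit together: collect several sparse checkpoints ("cheap tickets") from a single training run, average them element-wise, and re-prune the result by magnitude so it stays within the original parameter budget. The function name superpose_tickets, the magnitude-based re-pruning rule, and the toy data are illustrative assumptions, not the authors' exact Sup-tickets procedure.

```python
import numpy as np

def superpose_tickets(tickets, sparsity):
    """Average several sparse weight tensors ("cheap tickets") and
    re-prune the result back to the original parameter budget.

    tickets  : list of np.ndarray, all the same shape, each already sparse
    sparsity : fraction of weights that must remain zero (e.g. 0.9)

    NOTE: illustrative sketch of the idea in the abstract, not the
    authors' exact Sup-tickets procedure.
    """
    # Superpose: simple element-wise average of the collected subnetworks.
    merged = np.mean(np.stack(tickets, axis=0), axis=0)

    # Re-sparsify by magnitude so the merged subnetwork does not exceed
    # the constrained parameter budget.
    k = int(round(sparsity * merged.size))  # number of weights to zero out
    if k > 0:
        threshold = np.sort(np.abs(merged), axis=None)[k - 1]
        merged[np.abs(merged) <= threshold] = 0.0
    return merged

# Toy usage: three hypothetical "cheap tickets" for one layer at 90% sparsity.
rng = np.random.default_rng(0)

def random_ticket(shape=(64, 64), sparsity=0.9):
    w = rng.standard_normal(shape)
    mask = rng.random(shape) >= sparsity  # keep roughly 10% of the weights
    return w * mask

tickets = [random_ticket() for _ in range(3)]
sup = superpose_tickets(tickets, sparsity=0.9)
print("nonzero fraction:", np.count_nonzero(sup) / sup.size)
```

In this sketch the averaging and re-pruning add no extra training passes, which is consistent with the paper's constraint of staying within the standard dense-training time budget; how the real method weights or schedules the tickets is not specified in the abstract.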
Related papers
- Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining its capacity.
In this work, we regard the winning ticket from LTH as a subnetwork that is in a trainable condition and take its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our Dual Lottery Ticket Hypothesis (DLTH).
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Sparsity Winning Twice: Better Robust Generalization from More Efficient
Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find that both methods yield a win-win: substantially shrinking the robust generalization gap and alleviating robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
arXiv Detail & Related papers (2022-02-20T15:52:08Z) - FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training
with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel efficient ensemble methods with dynamic sparsity, which yield in one shot many diverse and accurate tickets "for free" during the sparse training process.
arXiv Detail & Related papers (2021-06-28T10:48:20Z) - Simultaneous Training of Partially Masked Neural Networks [67.19481956584465]
We show that it is possible to train neural networks in such a way that a predefined 'core' subnetwork can be split off from the trained full network with remarkably good performance.
We show that training a Transformer with a low-rank core yields a low-rank model with better performance than training the low-rank model alone.
arXiv Detail & Related papers (2021-06-16T15:57:51Z) - Selfish Sparse RNN Training [13.165729746380816]
We propose an approach to train sparse RNNs with a fixed parameter count in a single run, without compromising performance.
We achieve state-of-the-art sparse training results on the Penn TreeBank and Wikitext-2 datasets.
arXiv Detail & Related papers (2021-01-22T10:45:40Z) - Towards Practical Lottery Ticket Hypothesis for Adversarial Training [78.30684998080346]
We show that a subset of these sub-networks converges significantly faster during the training process.
As a practical application of our findings, we demonstrate that such sub-networks can help in cutting down the total time of adversarial training.
arXiv Detail & Related papers (2020-03-06T03:11:52Z)