FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training
with Dynamic Sparsity
- URL: http://arxiv.org/abs/2106.14568v1
- Date: Mon, 28 Jun 2021 10:48:20 GMT
- Title: FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training
with Dynamic Sparsity
- Authors: Shiwei Liu, Tianlong Chen, Zahra Atashgahi, Xiaohan Chen, Ghada Sokar,
Elena Mocanu, Mykola Pechenizkiy, Zhangyang Wang, Decebal Constantin Mocanu
- Abstract summary: We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel efficient ensemble methods with dynamic sparsity, which yield in one shot many diverse and accurate tickets "for free" during the sparse training process.
- Score: 74.58777701536668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works on sparse neural networks have demonstrated that it is possible
to train a sparse network in isolation to match the performance of the
corresponding dense networks with a fraction of parameters. However, the
identification of these performant sparse neural networks (winning tickets)
either involves a costly iterative train-prune-retrain process (e.g., Lottery
Ticket Hypothesis) or an over-extended sparse training time (e.g., Training
with Dynamic Sparsity), both of which would raise financial and environmental
concerns. In this work, we attempt to address this cost problem by introducing
the FreeTickets concept, the first solution that can boost the performance of
sparse convolutional neural networks over their dense network equivalents by a
large margin, while using only a fraction of the computational resources
required by the latter for the complete training. Concretely, we instantiate
the FreeTickets concept by proposing two novel efficient ensemble methods with
dynamic sparsity, which yield in one shot many diverse and accurate tickets
"for free" during the sparse training process. The combination
of these free tickets into an ensemble demonstrates a significant improvement
in accuracy, uncertainty estimation, robustness, and efficiency over the
corresponding dense (ensemble) networks. Our results provide new insights into
the strength of sparse neural networks and suggest that the benefits of
sparsity go well beyond the expected training/inference efficiency gains. We
will release all code at https://github.com/Shiweiliuiiiiiii/FreeTickets.
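To make the FreeTickets recipe concrete, below is a minimal PyTorch sketch of one plausible reading of it, not the authors' released implementation: train under dynamic sparsity (periodic magnitude pruning plus random regrowth, in the spirit of SET-style methods), snapshot sparse checkpoints ("free tickets") along the way, and average their softmax outputs at test time. All function names, the mask-update rule, and the hyperparameters (drop_frac, snapshot_every, density) are illustrative assumptions.

# Hedged sketch of the FreeTickets idea: dynamic sparse training that
# periodically snapshots sparse "free tickets" and ensembles them.
# Names and hyperparameters are illustrative, not the paper's released code.
import copy
import torch
import torch.nn.functional as F


def make_random_masks(model, density=0.2):
    """Random {0,1} masks for all weight matrices (biases are left dense)."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() > 1:
            masks[name] = (torch.rand_like(param) < density).float()
            param.data *= masks[name]
    return masks


def prune_and_regrow(model, masks, drop_frac=0.1):
    """One dynamic-sparsity update (SET-style, assumed): drop the smallest
    surviving weights, then regrow the same number at random zero positions."""
    for name, param in model.named_parameters():
        if name not in masks:
            continue
        mask = masks[name]
        n_drop = int(drop_frac * mask.sum().item())
        if n_drop == 0:
            continue
        # Prune: zero the n_drop smallest-magnitude active weights.
        scores = param.detach().abs().flatten().clone()
        scores[mask.flatten() == 0] = float("inf")
        drop_idx = torch.topk(scores, n_drop, largest=False).indices
        mask.view(-1)[drop_idx] = 0.0
        # Regrow: reactivate n_drop random currently-inactive positions.
        inactive = (mask.flatten() == 0).nonzero(as_tuple=True)[0]
        grow_idx = inactive[torch.randperm(len(inactive))[:n_drop]]
        mask.view(-1)[grow_idx] = 1.0
        param.data *= mask  # newly grown weights start from zero


def train_free_tickets(model, loader, epochs=30, snapshot_every=10):
    """Train with dynamic sparsity and collect sparse checkpoints as tickets."""
    masks = make_random_masks(model)
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    tickets = []
    for epoch in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
            for name, param in model.named_parameters():
                if name in masks:  # keep the update sparse
                    param.data *= masks[name]
        prune_and_regrow(model, masks)
        if (epoch + 1) % snapshot_every == 0:
            tickets.append(copy.deepcopy(model).eval())  # one "free ticket"
    return tickets


@torch.no_grad()
def ensemble_predict(tickets, x):
    """Average the softmax outputs of all collected free tickets."""
    probs = torch.stack([F.softmax(t(x), dim=-1) for t in tickets])
    return probs.mean(dim=0)

Under this reading, each ticket costs no training beyond the single sparse run, and the ensemble only adds forward passes at test time, which is the "for free" part of the name.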
Related papers
- Learning from Data with Noisy Labels Using Temporal Self-Ensemble [11.245833546360386]
Deep neural networks (DNNs) have an enormous capacity to memorize noisy labels.
Current state-of-the-art methods present a co-training scheme that trains dual networks using samples associated with small losses.
We propose a simple yet effective robust training scheme that operates by training only a single network.
arXiv Detail & Related papers (2022-07-21T08:16:31Z) - Superposing Many Tickets into One: A Performance Booster for Sparse
Neural Network Training [32.30355584300427]
We present a novel sparse training approach, termed Sup-tickets, which can satisfy two desiderata concurrently in a single sparse-to-sparse training process (a rough weight-averaging sketch of this ticket-superposition idea appears after this list).
Across various modern architectures on CIFAR-10/100 and ImageNet, we show that Sup-tickets integrates seamlessly with existing sparse training methods.
arXiv Detail & Related papers (2022-05-30T16:01:32Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining model capacity.
In this work, we regard the winning ticket from LTH as the subnetwork which is in trainable condition and its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our DLTH.
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Sparsity Winning Twice: Better Robust Generalization from More Efficient
Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find that both methods yield a win-win: they substantially shrink the robust generalization gap and alleviate robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
arXiv Detail & Related papers (2022-02-20T15:52:08Z) - The Unreasonable Effectiveness of Random Pruning: Return of the Most
Naive Baseline for Sparse Training [111.15069968583042]
Random pruning is arguably the most naive way to attain sparsity in neural networks, but it has been deemed uncompetitive compared with either post-training pruning or sparse training.
We empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent.
Our results strongly suggest there is larger-than-expected room for sparse training at scale, and the benefits of sparsity might be more universal beyond carefully designed pruning.
arXiv Detail & Related papers (2022-02-05T21:19:41Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - Truly Sparse Neural Networks at Scale [2.2860412844991655]
We train the largest neural network ever trained in terms of representational power -- reaching the bat brain size.
Our approach has state-of-the-art performance while opening the path for an environmentally friendly artificial intelligence era.
arXiv Detail & Related papers (2021-02-02T20:06:47Z) - Selfish Sparse RNN Training [13.165729746380816]
We propose an approach to train sparse RNNs with a fixed parameter count in one single run, without compromising performance.
We achieve state-of-the-art sparse training results on the Penn TreeBank and Wikitext-2 datasets.
arXiv Detail & Related papers (2021-01-22T10:45:40Z) - Fitting the Search Space of Weight-sharing NAS with Graph Convolutional
Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
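As a companion to the Sup-tickets entry above, the sketch referenced there follows. It shows one simple way to merge several sparse checkpoints into a single network by averaging their weights and re-pruning to a target density; the procedure and names (superpose_tickets, density) are assumptions for illustration, not the published Sup-tickets algorithm.

# Rough sketch of superposing several sparse tickets into one network by
# averaging their weights and re-pruning to a target density.
# This is an illustrative assumption, not the published Sup-tickets method.
import copy
import torch


@torch.no_grad()
def superpose_tickets(tickets, density=0.2):
    """Average the weights of several sparse checkpoints, then keep only the
    largest-magnitude fraction `density` of each weight tensor."""
    merged = copy.deepcopy(tickets[0])
    states = [t.state_dict() for t in tickets]
    for name, param in merged.named_parameters():
        avg = torch.stack([s[name] for s in states]).mean(dim=0)
        if param.dim() > 1:  # re-sparsify weight matrices; keep biases dense
            k = max(1, int(density * avg.numel()))
            threshold = torch.topk(avg.abs().flatten(), k).values.min()
            avg = avg * (avg.abs() >= threshold).float()
        param.copy_(avg)
    return merged

Unlike the prediction-averaging ensemble sketched earlier, this merge produces a single sparse network, so inference cost stays that of one model.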
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.