Lottery Pools: Winning More by Interpolating Tickets without Increasing
Training or Inference Cost
- URL: http://arxiv.org/abs/2208.10842v4
- Date: Tue, 4 Apr 2023 00:16:09 GMT
- Title: Lottery Pools: Winning More by Interpolating Tickets without Increasing
Training or Inference Cost
- Authors: Lu Yin, Shiwei Liu, Meng Fang, Tianjin Huang, Vlado Menkovski, Mykola
Pechenizkiy
- Abstract summary: Lottery tickets (LTs) are able to discover accurate and sparse subnetworks that can be trained in isolation to match the performance of dense networks.
We show that our method achieves significant performance gains in both in-distribution and out-of-distribution scenarios.
- Score: 28.70692607078139
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lottery tickets (LTs) are able to discover accurate and sparse subnetworks
that can be trained in isolation to match the performance of dense networks.
Ensembling, in parallel, is one of the oldest time-proven tricks in machine
learning for improving performance by combining the outputs of multiple
independent models. However, the benefits of ensembling in the context of LTs
are diluted, since an ensemble does not directly lead to stronger sparse
subnetworks but only leverages their predictions for a better decision. In this work, we first
observe that directly averaging the weights of the adjacent learned subnetworks
significantly boosts the performance of LTs. Encouraged by this observation, we
further propose an alternative way to perform an 'ensemble' over the
subnetworks identified by iterative magnitude pruning via a simple
interpolating strategy. We call our method Lottery Pools. In contrast to the
naive ensemble, which brings no performance gains to any individual subnetwork,
Lottery Pools yields much stronger sparse subnetworks than the original LTs
without requiring any extra training or inference cost. Across various modern
architectures on CIFAR-10/100 and ImageNet, we show that our method achieves
significant performance gains in both in-distribution and out-of-distribution
scenarios. Impressively, evaluated with VGG-16 and ResNet-18, the produced
sparse subnetworks outperform the original LTs by up to 1.88% on CIFAR-100 and
2.36% on CIFAR-100-C; the resulting dense network surpasses the pre-trained
dense model by up to 2.22% on CIFAR-100 and 2.38% on CIFAR-100-C.
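To make the interpolation step concrete, below is a minimal PyTorch sketch (not the authors' released code): it linearly combines the weights of a target ticket with those of an adjacent iterative-magnitude-pruning (IMP) ticket and re-applies the target's binary mask so the pooled network keeps the same sparsity. The function name, the fixed coefficient alpha, and the toy magnitude mask in the demo are illustrative assumptions; in the paper the interpolation coefficients would be selected, e.g., on a validation set.

    import copy
    import torch
    import torch.nn as nn

    def interpolate_tickets(target_sd, neighbor_sd, alpha, mask=None):
        """Linearly interpolate two subnetwork state dicts; optionally re-apply
        the target ticket's binary mask so the pooled weights stay sparse."""
        pooled = {}
        for name, w_t in target_sd.items():
            w = (1.0 - alpha) * w_t + alpha * neighbor_sd[name]
            if mask is not None and name in mask:
                w = w * mask[name]  # zero out entries pruned in the target ticket
            pooled[name] = w
        return pooled

    # Toy demo (hypothetical tensors): pool a target ticket with an adjacent IMP ticket.
    model = nn.Linear(8, 4)
    target_sd = copy.deepcopy(model.state_dict())
    neighbor_sd = {k: v + 0.01 * torch.randn_like(v) for k, v in target_sd.items()}
    mask = {"weight": (target_sd["weight"].abs() > 0.1).float()}  # toy magnitude-pruning mask
    model.load_state_dict(interpolate_tickets(target_sd, neighbor_sd, alpha=0.5, mask=mask))

Pooling over a whole set of IMP tickets would repeat this step, treating the running interpolation as the new target; the exact coefficient search used by Lottery Pools is not reproduced here.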
Related papers
- Data-Efficient Double-Win Lottery Tickets from Robust Pre-training [129.85939347733387]
We introduce Double-Win Lottery Tickets, in which a subnetwork from a pre-trained model can be independently transferred to diverse downstream tasks.
We find that robust pre-training tends to craft sparser double-win lottery tickets with superior performance over the standard counterparts.
arXiv Detail & Related papers (2022-06-09T20:52:50Z) - Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask
Training [55.43088293183165]
Recent studies show that pre-trained language models (PLMs) like BERT contain matching subnetworks that have similar transfer learning performance as the original PLM.
In this paper, we find that the BERT subnetworks have even more potential than these studies have shown.
We train binary masks over model weights on the pre-training tasks, with the aim of preserving the universal transferability of the subnetwork.
arXiv Detail & Related papers (2022-04-24T08:42:47Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining model capacity.
In this work, we regard the winning ticket from LTH as the subnetwork which is in trainable condition and its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our Dual Lottery Ticket Hypothesis (DLTH).
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Sparsity Winning Twice: Better Robust Generalization from More Efficient
Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find that both methods yield a win-win: substantially shrinking the robust generalization gap and alleviating robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
arXiv Detail & Related papers (2022-02-20T15:52:08Z) - PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition [78.67749936030219]
Prune-Adjust-Re-Prune (PARP) discovers and finetunes subnetworks for much better ASR performance.
Experiments on low-resource English and multi-lingual ASR show that sparse subnetworks exist in pre-trained speech SSL models.
arXiv Detail & Related papers (2021-06-10T17:32:25Z) - The Elastic Lottery Ticket Hypothesis [106.79387235014379]
The Lottery Ticket Hypothesis raises keen attention to identifying sparse trainable subnetworks, or winning tickets.
The most effective method to identify such winning tickets is still Iterative Magnitude-based Pruning.
We propose a variety of strategies to tweak the winning tickets found from different networks of the same model family.
arXiv Detail & Related papers (2021-03-30T17:53:45Z) - Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural
Networks by Pruning A Randomly Weighted Network [13.193734014710582]
We propose an algorithm for finding multi-prize tickets (MPTs) and test it by performing a series of experiments on CIFAR-10 and ImageNet datasets.
Our MPTs-1/32 not only set new binary weight network state-of-the-art (SOTA) Top-1 accuracy -- 94.8% on CIFAR-10 and 74.03% on ImageNet -- but also outperform their full-precision counterparts by 1.78% and 0.76%, respectively.
arXiv Detail & Related papers (2021-03-17T00:31:24Z)