Lottery Jackpots Exist in Pre-trained Models
- URL: http://arxiv.org/abs/2104.08700v7
- Date: Sat, 2 Sep 2023 05:09:41 GMT
- Title: Lottery Jackpots Exist in Pre-trained Models
- Authors: Yuxin Zhang, Mingbao Lin, Yunshan Zhong, Fei Chao, Rongrong Ji
- Abstract summary: We show that high-performing and sparse sub-networks without the involvement of weight training, termed "lottery jackpots", exist in pre-trained models with unexpanded width.
We propose a novel short restriction method to restrict mask changes that may negatively impact the training loss.
- Score: 69.17690253938211
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Network pruning is an effective approach to reduce network complexity with
acceptable performance compromise. Existing studies achieve the sparsity of
neural networks via time-consuming weight training or complex searching on
networks with expanded width, which greatly limits the applications of network
pruning. In this paper, we show that high-performing and sparse sub-networks
without the involvement of weight training, termed "lottery jackpots", exist in
pre-trained models with unexpanded width. Furthermore, we improve the
efficiency of searching for lottery jackpots from two perspectives. Firstly, we
observe that the sparse masks derived from many existing pruning criteria have
a high overlap with the searched mask of our lottery jackpot; among these,
magnitude-based pruning yields the mask most similar to ours. Consequently, our
searched lottery jackpot removes 90% of the weights in ResNet-50 while easily
obtaining more than 70% top-1 accuracy on ImageNet using only 5 searching
epochs. In line with this insight, we initialize our sparse mask using
magnitude-based pruning, resulting in at least a 3x cost reduction in the
lottery jackpot search while achieving comparable or even better performance.
Secondly, we conduct an in-depth analysis of the searching process
for lottery jackpots. Our theoretical result suggests that the decrease in
training loss during weight searching can be disturbed by the dependency
between weights in modern networks. To mitigate this, we propose a novel short
restriction method that restricts mask changes which may negatively impact the
training loss. Our code is available at
https://github.com/zyxxmu/lottery-jackpots.
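
The search described in the abstract (frozen pre-trained weights, a learnable per-weight score, a magnitude-based mask initialization) can be pictured with a minimal PyTorch sketch. This is not the authors' implementation (see the linked repository for that): the class names, the straight-through top-k trick, and the hyper-parameters below are illustrative assumptions, and the paper's short restriction on mask changes is omitted.

```python
# Minimal, illustrative sketch of mask search on a frozen pre-trained layer.
# Assumptions (not from the paper's code): a straight-through top-k mask over
# learnable scores, with scores initialized from weight magnitudes.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMask(torch.autograd.Function):
    """Binary top-k mask over scores; gradients pass straight through."""

    @staticmethod
    def forward(ctx, scores, k):
        mask = torch.zeros_like(scores)
        idx = torch.topk(scores.flatten(), k).indices
        mask.view(-1)[idx] = 1.0
        return mask

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: the gradient w.r.t. the mask is handed
        # to the scores unchanged; k receives no gradient.
        return grad_output, None


class MaskedLinear(nn.Module):
    """A linear layer whose pre-trained weights stay frozen; only the mask
    (parameterized by per-weight scores) is searched."""

    def __init__(self, pretrained: nn.Linear, sparsity: float = 0.9):
        super().__init__()
        self.weight = nn.Parameter(pretrained.weight.detach().clone(),
                                   requires_grad=False)
        self.bias = nn.Parameter(pretrained.bias.detach().clone(),
                                 requires_grad=False)
        # Magnitude-based initialization: high-magnitude weights start inside
        # the mask, mirroring the overlap observation in the abstract.
        self.scores = nn.Parameter(self.weight.abs().clone())
        self.k = max(1, int(self.weight.numel() * (1.0 - sparsity)))

    def forward(self, x):
        mask = TopKMask.apply(self.scores, self.k)
        return F.linear(x, self.weight * mask, self.bias)


# Usage sketch: only the scores are optimized, so no weight training occurs.
layer = MaskedLinear(nn.Linear(512, 10), sparsity=0.9)
optimizer = torch.optim.SGD([layer.scores], lr=0.1)
x, y = torch.randn(8, 512), torch.randint(0, 10, (8,))
loss = F.cross_entropy(layer(x), y)
loss.backward()
optimizer.step()
```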
Related papers
- Can We Find Strong Lottery Tickets in Generative Models? [24.405555822170896]
We find strong lottery tickets in generative models that achieve good generative performance without any weight update.
To the best of our knowledge, we are the first to show the existence of strong lottery tickets in generative models and to provide an algorithm for finding them.
arXiv Detail & Related papers (2022-12-16T07:20:28Z)
- Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z)
- Dual Lottery Ticket Hypothesis [71.95937879869334]
The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining its capacity.
In this work, we regard the winning ticket from LTH as a subnetwork in a trainable condition and take its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our Dual Lottery Ticket Hypothesis (DLTH).
arXiv Detail & Related papers (2022-03-08T18:06:26Z)
- You are caught stealing my winning lottery ticket! Making a lottery ticket claim its ownership [87.13642800792077]
The lottery ticket hypothesis (LTH) has emerged as a promising framework for leveraging a special sparse subnetwork.
The main resource bottleneck of LTH, however, is the extraordinary cost of finding the sparse mask of the winning ticket.
Our setting adds a new dimension to the recently soaring interest in protecting deep models against intellectual property infringement.
arXiv Detail & Related papers (2021-10-30T03:38:38Z)
- Good Students Play Big Lottery Better [84.6111281091602]
The lottery ticket hypothesis suggests that a dense neural network contains a sparse sub-network that can match the test accuracy of the original dense net.
Recent studies demonstrate that a sparse sub-network can still be obtained by using a rewinding technique.
This paper proposes a new, simpler, and yet powerful technique for re-training the sub-network, called the "Knowledge Distillation ticket" (KD ticket).
arXiv Detail & Related papers (2021-01-08T23:33:53Z)
- Greedy Optimization Provably Wins the Lottery: Logarithmic Number of Winning Tickets is Enough [19.19644194006565]
We show how much we can prune a neural network given a specified tolerance for accuracy drop.
The proposed method guarantees that the discrepancy between the pruned network and the original network decays at an exponentially fast rate.
Empirically, our method improves on prior art in pruning various network architectures, including ResNet and MobilenetV2/V3, on ImageNet.
arXiv Detail & Related papers (2020-10-29T22:06:31Z)
- ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
arXiv Detail & Related papers (2020-06-28T23:09:27Z)