Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets
- URL: http://arxiv.org/abs/2202.04736v1
- Date: Wed, 9 Feb 2022 21:33:51 GMT
- Title: Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets
- Authors: Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wang, Zhangyang Wang
- Abstract summary: Lottery ticket hypothesis (LTH) has shown that dense models contain highly sparse subnetworks (i.e., winning tickets) that can be trained in isolation to match full accuracy.
In this paper, we demonstrate the first positive result that a structurally sparse winning ticket can be effectively found in general.
Specifically, we first "re-fill" pruned elements back in some channels deemed to be important, and then "re-group" non-zero elements to create flexible group-wise structural patterns.
- Score: 127.56361320894861
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The lottery ticket hypothesis (LTH) has shown that dense models contain
highly sparse subnetworks (i.e., winning tickets) that can be trained in
isolation to match full accuracy. Despite many exciting efforts being made,
there is one "commonsense" seldom challenged: a winning ticket is found by
iterative magnitude pruning (IMP) and hence the resultant pruned subnetworks
have only unstructured sparsity. That gap limits the appeal of winning tickets
in practice, since the highly irregular sparse patterns are challenging to
accelerate on hardware. Meanwhile, directly substituting structured pruning for
unstructured pruning in IMP damages performance more severely and is usually
unable to locate winning tickets.
In this paper, we demonstrate the first positive result that a structurally
sparse winning ticket can be effectively found in general. The core idea is to
append "post-processing techniques" after each round of (unstructured) IMP, to
enforce the formation of structural sparsity. Specifically, we first "re-fill"
pruned elements back in some channels deemed to be important, and then
"re-group" non-zero elements to create flexible group-wise structural patterns.
Both our identified channel- and group-wise structural subnetworks win the
lottery, with substantial inference speedups readily supported by existing
hardware. Extensive experiments, conducted on diverse datasets across multiple
network backbones, consistently validate our proposal, showing that the
hardware acceleration roadblock of LTH is now removed. Specifically, the
structural winning tickets obtain up to {64.93%, 64.84%, 64.84%} running time
savings at {36% ~ 80%, 74%, 58%} sparsity on {CIFAR, Tiny-ImageNet, ImageNet},
while maintaining comparable accuracy. Code is available at
https://github.com/VITA-Group/Structure-LTH.
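As a concrete illustration of the post-processing described above, the snippet below is a minimal sketch of the "re-fill" step, which converts an unstructured IMP mask into a channel-wise structured one. The channel-scoring heuristic, the function name refill_channels, and all shapes are illustrative assumptions rather than the authors' released implementation; the subsequent "re-group" step, which rearranges the surviving weights into dense group-wise blocks, is omitted here.

# Hypothetical sketch of the "re-fill" post-processing applied after one round
# of unstructured IMP. Assumptions (not from the paper's code): channels are
# scored by the magnitude of their surviving weights, and the top-scoring
# channels are restored to fully dense while the rest are pruned entirely.
import torch

def refill_channels(weight: torch.Tensor, imp_mask: torch.Tensor,
                    keep_ratio: float) -> torch.Tensor:
    """Turn an unstructured IMP mask into a channel-wise structured mask.

    weight:     conv kernel, shape (out_ch, in_ch, kH, kW)
    imp_mask:   binary mask from unstructured IMP, same shape (1 = kept)
    keep_ratio: fraction of output channels to keep fully dense
    """
    out_ch = weight.shape[0]
    # Score each output channel by the total magnitude of its surviving weights.
    channel_score = (weight * imp_mask).abs().reshape(out_ch, -1).sum(dim=1)
    # Select the top-scoring channels ...
    n_keep = max(1, int(round(keep_ratio * out_ch)))
    keep_idx = torch.topk(channel_score, n_keep).indices
    # ... and "re-fill" them: every weight in a kept channel becomes non-zero
    # again, while all remaining channels are removed entirely.
    structured_mask = torch.zeros_like(imp_mask)
    structured_mask[keep_idx] = 1.0
    return structured_mask

# Usage sketch: keep roughly the same overall sparsity as the IMP mask.
w = torch.randn(64, 32, 3, 3)
imp_mask = (torch.rand_like(w) > 0.7).float()   # stand-in for a real IMP mask
structured = refill_channels(w, imp_mask, keep_ratio=imp_mask.mean().item())

At matched sparsity, the resulting mask is channel-wise regular, which is the property the abstract points to for inference speedups on existing hardware.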
Related papers
- Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery
Tickets from Large Models [106.19385911520652]
Lottery Ticket Hypothesis (LTH) and its variants have been exploited to prune large pre-trained models, generating sparse subnetworks.
LTH is enormously inhibited by the repetitive full training and pruning routine of iterative magnitude pruning (IMP).
We propose Instant Soup Pruning (ISP) to generate lottery-ticket-quality subnetworks at a fraction of the IMP cost.
arXiv Detail & Related papers (2023-06-18T03:09:52Z) - When Layers Play the Lottery, all Tickets Win at Initialization [0.0]
Pruning is a technique for reducing the computational cost of deep networks.
In this work, we propose to discover winning tickets when the pruning process removes layers.
Our winning tickets notably speed up the training phase and reduce carbon emissions by up to 51%.
arXiv Detail & Related papers (2023-01-25T21:21:15Z) - COLT: Cyclic Overlapping Lottery Tickets for Faster Pruning of
Convolutional Neural Networks [5.956029437413275]
This research aims to generate winning lottery tickets from a set of lottery tickets that can achieve similar accuracy to the original unpruned network.
We introduce a novel winning ticket called Cyclic Overlapping Lottery Ticket (COLT) by data splitting and cyclic retraining of the pruned network from scratch.
arXiv Detail & Related papers (2022-12-24T16:38:59Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
Lottery Ticket Hypothesis (LTH) provides a novel view to investigate sparse network training and maintain its capacity.
In this work, we regard the winning ticket from LTH as the subnetwork which is in trainable condition and its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our DLTH.
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Plant 'n' Seek: Can You Find the Winning Ticket? [6.85316573653194]
Lottery ticket hypothesis has sparked the rapid development of pruning algorithms that perform structure learning.
We hand-craft extremely sparse network topologies, plant them in large neural networks, and evaluate state-of-the-art lottery ticket pruning methods.
arXiv Detail & Related papers (2021-11-22T12:32:25Z) - Efficient Lottery Ticket Finding: Less Data is More [87.13642800792077]
Lottery ticket hypothesis (LTH) reveals the existence of winning tickets (sparse but critical subnetworks) for dense networks.
Finding winning tickets requires burdensome computations in the train-prune-retrain process.
This paper explores a new perspective on finding lottery tickets more efficiently, by doing so only with a specially selected subset of data.
arXiv Detail & Related papers (2021-06-06T19:58:17Z) - The Elastic Lottery Ticket Hypothesis [106.79387235014379]
Lottery Ticket Hypothesis raises keen attention to identifying sparse trainable subnetworks, or winning tickets.
The most effective method to identify such winning tickets is still Iterative Magnitude-based Pruning.
We propose a variety of strategies to tweak the winning tickets found from different networks of the same model family.
arXiv Detail & Related papers (2021-03-30T17:53:45Z) - Good Students Play Big Lottery Better [84.6111281091602]
Lottery ticket hypothesis suggests that a dense neural network contains a sparse sub-network that can match the test accuracy of the original dense net.
Recent studies demonstrate that a sparse sub-network can still be obtained by using a rewinding technique.
This paper proposes a new, simpler and yet powerful technique for re-training the sub-network, called "Knowledge Distillation ticket" (KD ticket).
arXiv Detail & Related papers (2021-01-08T23:33:53Z) - Bespoke vs. Pr\^et-\`a-Porter Lottery Tickets: Exploiting Mask
Similarity for Trainable Sub-Network Finding [0.913755431537592]
Lottery Tickets are sparse sub-networks within over-parametrized networks.
We propose a consensus-based method for generating refined lottery tickets.
We successfully train these sub-networks to performance comparable to that of ordinary lottery tickets.
arXiv Detail & Related papers (2020-07-06T22:48:35Z)