LOFT: Finding Lottery Tickets through Filter-wise Training
- URL: http://arxiv.org/abs/2210.16169v1
- Date: Fri, 28 Oct 2022 14:43:42 GMT
- Title: LOFT: Finding Lottery Tickets through Filter-wise Training
- Authors: Qihan Wang, Chen Dun, Fangshuo Liao, Chris Jermaine, Anastasios
Kyrillidis
- Abstract summary: We show how one can efficiently identify the emergence of such winning tickets, and use this observation to design efficient pretraining algorithms.
We present the \emph{LOttery ticket through Filter-wise Training} algorithm, dubbed as \textsc{LoFT}.
Experiments show that \textsc{LoFT} $i)$ preserves and finds good lottery tickets, while $ii)$ it achieves non-trivial computation and communication savings.
- Score: 15.06694204377327
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work on the Lottery Ticket Hypothesis (LTH) shows that there exist
``\textit{winning tickets}'' in large neural networks. These tickets represent
``sparse'' versions of the full model that can be trained independently to
achieve comparable accuracy with respect to the full model. However, finding
the winning tickets requires one to \emph{pretrain} the large model for at
least a number of epochs, which can be a burdensome task, especially when the
original neural network gets larger.
In this paper, we explore how one can efficiently identify the emergence of
such winning tickets, and use this observation to design efficient pretraining
algorithms. For clarity of exposition, our focus is on convolutional neural
networks (CNNs). To identify good filters, we propose a novel filter distance
metric that well-represents the model convergence. As our theory dictates, our
filter analysis behaves consistently with recent findings of neural network
learning dynamics. Motivated by these observations, we present the
\emph{LOttery ticket through Filter-wise Training} algorithm, dubbed as
\textsc{LoFT}. \textsc{LoFT} is a model-parallel pretraining algorithm that
partitions convolutional layers by filters to train them independently in a
distributed setting, resulting in reduced memory and communication costs during
pretraining. Experiments show that \textsc{LoFT} $i)$ preserves and finds good
lottery tickets, while $ii)$ it achieves non-trivial computation and
communication savings, and maintains comparable or even better accuracy than
other pretraining methods.
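The abstract points to two concrete ingredients: a filter distance metric used to detect when good filters (and hence winning tickets) have emerged, and a model-parallel scheme that partitions a convolutional layer's filters across workers so they can be trained independently. The NumPy sketch below is only a minimal illustration of that idea under stated assumptions: the specific distance (a normalized L2 between filter snapshots), the round-robin partition via `np.array_split`, and the merge rule are hypothetical stand-ins, not the paper's exact definitions.

```python
import numpy as np

# Hedged sketch: the abstract does not spell out LoFT's exact filter distance
# metric or merge rule, so the metric below (normalized L2 between flattened
# filter snapshots) and the partition/merge scheme are illustrative assumptions.

def filter_distance(f_curr: np.ndarray, f_prev: np.ndarray) -> float:
    """Distance between one filter at two training snapshots.

    A small value suggests the filter has (nearly) converged; a LoFT-style
    analysis tracks such distances to detect when good filters emerge.
    """
    a, b = f_curr.ravel(), f_prev.ravel()
    return float(np.linalg.norm(a - b) / (np.linalg.norm(a) + 1e-12))


def partition_filters(conv_weight: np.ndarray, num_workers: int):
    """Split a conv layer's filters (axis 0: out_channels) across workers.

    Each worker receives a disjoint subset of filters and can train them
    independently, which is where the memory/communication savings come from.
    """
    out_channels = conv_weight.shape[0]
    index_groups = np.array_split(np.arange(out_channels), num_workers)
    return [(ids, conv_weight[ids]) for ids in index_groups]


def merge_filters(full_shape, worker_updates):
    """Reassemble the full conv weight from the per-worker filter subsets."""
    merged = np.zeros(full_shape, dtype=np.float32)
    for ids, sub_w in worker_updates:
        merged[ids] = sub_w
    return merged


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy conv layer: 8 output filters, 3 input channels, 3x3 kernels.
    w = rng.normal(size=(8, 3, 3, 3)).astype(np.float32)

    parts = partition_filters(w, num_workers=4)
    # Stand-in for independent per-worker training: nudge each filter subset.
    updated = [(ids, sub + 0.01 * rng.normal(size=sub.shape)) for ids, sub in parts]
    w_new = merge_filters(w.shape, updated)

    # Track per-filter distances between the two snapshots.
    dists = [filter_distance(w_new[i], w[i]) for i in range(w.shape[0])]
    print("per-filter distances:", np.round(dists, 4))
```

In a distributed run the per-worker updates would happen on separate devices, so only filter subsets (rather than full-model gradients) need to be exchanged and periodically merged, which is the source of the communication savings claimed in the abstract.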
Related papers
- Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning [14.792099973449794]
We propose an algorithm to align the training dynamics of the sparse network with that of the dense one.
We show how the usually neglected data-dependent component in the NTK's spectrum can be taken into account.
Path eXclusion (PX) is able to find lottery tickets even at high sparsity levels.
arXiv Detail & Related papers (2024-06-03T22:19:42Z) - No Free Prune: Information-Theoretic Barriers to Pruning at Initialization [8.125999058340998]
We show the Law of Robustness of arXiv:2105.12806 extends to sparse networks, with the usual parameter count replaced by $p_\text{eff}$.
Experiments on neural networks confirm that information gained during training may indeed affect model capacity.
arXiv Detail & Related papers (2024-02-02T01:13:16Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
Lottery Ticket Hypothesis (LTH) provides a novel view to investigate sparse network training and maintain its capacity.
In this work, we regard the winning ticket from LTH as a subnetwork that can be trained in isolation, and take its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our DLTH.
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - A quantum algorithm for training wide and deep classical neural networks [72.2614468437919]
We show that conditions amenable to classical trainability via gradient descent coincide with those necessary for efficiently solving quantum linear systems.
We numerically demonstrate that the MNIST image dataset satisfies such conditions.
We provide empirical evidence for $O(\log n)$ training of a convolutional neural network with pooling.
arXiv Detail & Related papers (2021-07-19T23:41:03Z) - FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training
with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel efficient ensemble methods with dynamic sparsity, which yield in one shot many diverse and accurate tickets "for free" during the sparse training process.
arXiv Detail & Related papers (2021-06-28T10:48:20Z) - The Elastic Lottery Ticket Hypothesis [106.79387235014379]
The Lottery Ticket Hypothesis has drawn keen attention to identifying sparse trainable subnetworks, or winning tickets.
The most effective method to identify such winning tickets is still Iterative Magnitude-based Pruning.
We propose a variety of strategies to tweak the winning tickets found from different networks of the same model family.
arXiv Detail & Related papers (2021-03-30T17:53:45Z) - Lottery Ticket Implies Accuracy Degradation, Is It a Desirable
Phenomenon? [43.47794674403988]
In deep model compression, the recent finding "Lottery Ticket Hypothesis" (LTH) (Frankle & Carbin) pointed out that there could exist a winning ticket.
We investigate the underlying condition and rationale behind the winning property, and find that the underlying reason is largely attributed to the correlation between initialized weights and final-trained weights.
We propose the "pruning & fine-tuning" method that consistently outperforms lottery ticket sparse training.
arXiv Detail & Related papers (2021-02-19T14:49:46Z) - Good Students Play Big Lottery Better [84.6111281091602]
Lottery ticket hypothesis suggests that a dense neural network contains a sparse sub-network that can match the test accuracy of the original dense net.
Recent studies demonstrate that a sparse sub-network can still be obtained by using a rewinding technique.
This paper proposes a new, simpler and yet powerful technique for re-training the sub-network, called "Knowledge Distillation ticket" (KD ticket)
arXiv Detail & Related papers (2021-01-08T23:33:53Z)