The Lottery Ticket Hypothesis for Object Recognition
- URL: http://arxiv.org/abs/2012.04643v2
- Date: Mon, 19 Apr 2021 17:59:57 GMT
- Title: The Lottery Ticket Hypothesis for Object Recognition
- Authors: Sharath Girish, Shishira R. Maiya, Kamal Gupta, Hao Chen, Larry Davis,
Abhinav Shrivastava
- Abstract summary: The Lottery Ticket Hypothesis states that deep networks trained on large datasets contain smaller subnetworks that achieve performance on par with the dense networks.
We provide guidance on how to find lottery tickets with up to 80% overall sparsity on different sub-tasks without incurring any drop in performance.
- Score: 39.186511997089575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recognition tasks, such as object recognition and keypoint estimation, have
seen widespread adoption in recent years. Most state-of-the-art methods for
these tasks use deep networks that are computationally expensive and have huge
memory footprints. This makes it exceedingly difficult to deploy these systems
on low power embedded devices. Hence, the importance of decreasing the storage
requirements and the amount of computation in such models is paramount. The
recently proposed Lottery Ticket Hypothesis (LTH) states that deep neural
networks trained on large datasets contain smaller subnetworks that achieve
performance on par with the dense networks. In this work, we perform the first
empirical study investigating LTH for model pruning in the context of object
detection, instance segmentation, and keypoint estimation. Our studies reveal
that lottery tickets obtained from ImageNet pretraining do not transfer well to
the downstream tasks. We provide guidance on how to find lottery tickets with
up to 80% overall sparsity on different sub-tasks without incurring any drop in
performance. Finally, we analyse the behavior of trained tickets with
respect to various task attributes such as object size, frequency, and
difficulty of detection.
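As background on how such tickets are typically found, the following is a minimal sketch of iterative magnitude pruning with weight rewinding, the procedure commonly used in lottery-ticket studies; the training and evaluation callbacks, the number of rounds, and the per-round pruning fraction are illustrative assumptions rather than the authors' implementation.

```python
import copy
import torch.nn as nn
import torch.nn.utils.prune as prune


def find_lottery_ticket(model, train_fn, eval_fn, rounds=7, prune_frac=0.2):
    """Iteratively train, globally prune the smallest-magnitude weights,
    and rewind the surviving weights to their original initialization."""
    init_state = copy.deepcopy(model.state_dict())  # weights to rewind to

    for r in range(rounds):
        train_fn(model)          # train the current (possibly already sparse) network
        score = eval_fn(model)   # e.g. detection AP on a validation set

        # Prune prune_frac of the *remaining* conv/linear weights by magnitude,
        # pooled globally across all layers.
        params = [(m, "weight") for m in model.modules()
                  if isinstance(m, (nn.Conv2d, nn.Linear))]
        prune.global_unstructured(params,
                                  pruning_method=prune.L1Unstructured,
                                  amount=prune_frac)

        # Rewind the unpruned weights to their initial values; the pruning
        # masks are kept, so sparsity carries over to the next round.
        for name, module in model.named_modules():
            if hasattr(module, "weight_orig"):
                key = f"{name}.weight" if name else "weight"
                module.weight_orig.data.copy_(init_state[key])

        print(f"round {r}: score before this round's pruning = {score:.3f}")
    return model
```

With a 20% per-round fraction, seven prune-and-rewind rounds remove roughly 1 - 0.8^7 ≈ 79% of the weights, which is the sparsity regime discussed in the abstract.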
Related papers
- Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning [14.792099973449794]
We propose an algorithm to align the training dynamics of the sparse network with that of the dense one.
We show how the usually neglected data-dependent component in the NTK's spectrum can be taken into account.
Path eXclusion (PX) is able to find lottery tickets even at high sparsity levels.
arXiv Detail & Related papers (2024-06-03T22:19:42Z)
- Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z)
- Quantifying lottery tickets under label noise: accuracy, calibration, and complexity [6.232071870655069]
Pruning deep neural networks is a widely used strategy to alleviate the computational burden in machine learning.
We use the sparse double descent approach to identify univocally and characterise pruned models associated with classification tasks.
arXiv Detail & Related papers (2023-06-21T11:35:59Z)
- LOFT: Finding Lottery Tickets through Filter-wise Training [15.06694204377327]
We show how one can efficiently identify the emergence of such winning tickets, and use this observation to design efficient pretraining algorithms.
We present the LOttery ticket through Filter-wise Training algorithm, dubbed LoFT.
Experiments show that LoFT (i) preserves and finds good lottery tickets, while (ii) achieving non-trivial computation and communication savings; a generic sketch of filter-wise pruning appears below this list.
arXiv Detail & Related papers (2022-10-28T14:43:42Z)
- Dual Lottery Ticket Hypothesis [71.95937879869334]
Lottery Ticket Hypothesis (LTH) provides a novel view to investigate sparse network training and maintain its capacity.
In this work, we regard the winning ticket from LTH as the subnetwork which is in trainable condition and its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our DLTH.
arXiv Detail & Related papers (2022-03-08T18:06:26Z)
- Few-Shot Keypoint Detection as Task Adaptation via Latent Embeddings [17.04471874483516]
Existing approaches either compute dense keypoint embeddings in a single forward pass, or allocate their full capacity to a sparse set of points.
In this paper we explore a middle ground based on the observation that the number of relevant points at a given time is typically small.
Our main contribution is a novel architecture, inspired by few-shot task adaptation, which allows a sparse-style network to condition on a keypoint embedding.
arXiv Detail & Related papers (2021-12-09T13:25:42Z)
- FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel efficient ensemble methods with dynamic sparsity, which yield in one shot many diverse and accurate tickets "for free" during the sparse training process.
arXiv Detail & Related papers (2021-06-28T10:48:20Z)
- Playing Lottery Tickets with Vision and Language [62.6420670250559]
Large-scale transformer-based pre-training has revolutionized vision-and-language (V+L) research.
In parallel, work on the lottery ticket hypothesis has shown that deep neural networks contain small matching subnetworks that can achieve performance on par with or even better than the dense networks when trained in isolation.
We use UNITER, one of the best-performing V+L models, as the testbed, and consolidate 7 representative V+L tasks for experiments.
arXiv Detail & Related papers (2021-04-23T22:24:33Z)
- Towards Practical Lottery Ticket Hypothesis for Adversarial Training [78.30684998080346]
We show there exists a subset of the aforementioned sub-networks that converge significantly faster during the training process.
As a practical application of our findings, we demonstrate that such sub-networks can help in cutting down the total time of adversarial training.
arXiv Detail & Related papers (2020-03-06T03:11:52Z)
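For context on the filter-level granularity that LOFT (above) operates at, here is a minimal sketch of plain filter-wise (structured) magnitude pruning in PyTorch. It is a generic illustration rather than the LoFT algorithm, and the toy model and the 50% pruning fraction are assumptions made for the example.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune


def prune_filters(model: nn.Module, frac: float = 0.3) -> nn.Module:
    """Zero out the fraction `frac` of convolutional filters with the smallest L1 norm."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            # n=1 selects the L1 norm; dim=0 removes whole output filters
            # rather than individual weights (structured pruning).
            prune.ln_structured(module, name="weight", amount=frac, n=1, dim=0)
    return model


if __name__ == "__main__":
    # Toy two-layer conv net, assumed purely for demonstration.
    net = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
    prune_filters(net, frac=0.5)
    # Count the filters in the first conv layer whose weights are now all zero.
    zeroed = (net[0].weight.abs().sum(dim=(1, 2, 3)) == 0).sum().item()
    print(f"zeroed filters in first conv layer: {zeroed}/16")
```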
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.