When Layers Play the Lottery, all Tickets Win at Initialization
- URL: http://arxiv.org/abs/2301.10835v2
- Date: Tue, 19 Mar 2024 14:08:25 GMT
- Title: When Layers Play the Lottery, all Tickets Win at Initialization
- Authors: Artur Jordao, George Correa de Araujo, Helena de Almeida Maia, Helio Pedrini
- Abstract summary: Pruning is a technique for reducing the computational cost of deep networks.
In this work, we propose to discover winning tickets when the pruning process removes layers.
Our winning tickets notably speed up the training phase and reduce carbon emissions by up to 51%.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pruning is a standard technique for reducing the computational cost of deep networks. Many advances in pruning leverage concepts from the Lottery Ticket Hypothesis (LTH). LTH reveals that inside a trained dense network there exist sparse subnetworks (tickets) able to achieve similar accuracy (i.e., win the lottery, hence winning tickets). Pruning at initialization focuses on finding winning tickets without training a dense network. Studies on these concepts share the trend that subnetworks come from weight or filter pruning. In this work, we investigate LTH and pruning at initialization through the lens of layer pruning. First, we confirm the existence of winning tickets when the pruning process removes layers. Leveraging this observation, we propose to discover these winning tickets at initialization, eliminating the need for heavy computational resources to train the initial (over-parameterized) dense network. Extensive experiments show that our winning tickets notably speed up the training phase and reduce carbon emissions by up to 51%, an important step towards democratization and green Artificial Intelligence. Beyond computational benefits, our winning tickets exhibit robustness against adversarial and out-of-distribution examples. Finally, we show that our subnetworks easily win the lottery at initialization, while tickets from filter removal (the standard structured LTH) hardly become winning tickets.
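The abstract does not spell out its layer-importance criterion, so the following is only a minimal toy sketch of layer pruning at initialization, using mean absolute weight magnitude per layer as a hypothetical scoring proxy (the `layer_scores` and `prune_layers_at_init` names are illustrative, not the paper's API):

```python
import random

def layer_scores(layers):
    """Score each layer by the mean absolute value of its weights.
    (A hypothetical proxy; the paper's actual criterion may differ.)"""
    return [sum(abs(w) for w in layer) / len(layer) for layer in layers]

def prune_layers_at_init(layers, keep_ratio):
    """Keep the top-scoring fraction of layers, preserving their order."""
    scores = layer_scores(layers)
    n_keep = max(1, round(len(layers) * keep_ratio))
    # pick the highest-scoring layer indices, then restore network order
    keep = sorted(sorted(range(len(layers)), key=lambda i: -scores[i])[:n_keep])
    return [layers[i] for i in keep]

random.seed(0)
net = [[random.gauss(0, 1) for _ in range(8)] for _ in range(6)]  # 6 toy "layers"
ticket = prune_layers_at_init(net, keep_ratio=0.5)
print(len(ticket))  # half of the layers survive before any training happens
```

Because the scoring runs on initial weights only, the subnetwork (ticket) is selected before the dense network is ever trained, which is the source of the training speed-up the abstract reports.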
Related papers
- Dual Lottery Ticket Hypothesis [71.95937879869334]
Lottery Ticket Hypothesis (LTH) provides a novel view to investigate sparse network training and maintain its capacity.
In this work, we regard the winning ticket from LTH as the subnetwork which is in trainable condition and its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our DLTH.
arXiv Detail & Related papers (2022-03-08T18:06:26Z)
- Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets [127.56361320894861]
The lottery ticket hypothesis (LTH) has shown that dense models contain highly sparse subnetworks (i.e., winning tickets) that can be trained in isolation to match full accuracy.
In this paper, we demonstrate the first positive result that a structurally sparse winning ticket can be effectively found in general.
Specifically, we first "re-fill" pruned elements back in some channels deemed to be important, and then "re-group" non-zero elements to create flexible group-wise structural patterns.
arXiv Detail & Related papers (2022-02-09T21:33:51Z)
- Juvenile state hypothesis: What we can learn from lottery ticket hypothesis researches? [1.701869491238765]
The original lottery ticket hypothesis performs pruning and weight resetting after training converges.
We propose a strategy that combines the idea of neural network structure search with a pruning algorithm to alleviate this problem.
arXiv Detail & Related papers (2021-09-08T18:22:00Z)
- FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel efficient ensemble methods with dynamic sparsity, which yield in one shot many diverse and accurate tickets "for free" during the sparse training process.
arXiv Detail & Related papers (2021-06-28T10:48:20Z)
- Towards Understanding Iterative Magnitude Pruning: Why Lottery Tickets Win [20.97456178983006]
The lottery ticket hypothesis states that sparse subnetworks exist in randomly initialized dense networks that can be trained to the same accuracy as the dense network they reside in.
We show that by using a training method that is stable with respect to linear mode connectivity, large networks can also be entirely rewound to initialization.
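Iterative magnitude pruning with rewinding, the procedure this entry studies, alternates training, pruning the lowest-magnitude weights, and resetting the survivors to their initial values. A toy sketch, with a stand-in `fake_train` in place of real SGD training (all names here are illustrative):

```python
import random

def magnitude_mask(weights, mask, prune_frac):
    """Zero out the lowest-magnitude fraction of still-active weights."""
    active = [i for i, m in enumerate(mask) if m]
    active.sort(key=lambda i: abs(weights[i]))
    new_mask = list(mask)
    for i in active[:int(len(active) * prune_frac)]:
        new_mask[i] = 0
    return new_mask

def imp(init_weights, train_step, rounds=3, prune_frac=0.2):
    """Iterative magnitude pruning with rewinding to initialization."""
    mask = [1] * len(init_weights)
    for _ in range(rounds):
        # each round "trains" the masked network from the rewound init weights
        trained = train_step([w * m for w, m in zip(init_weights, mask)])
        mask = magnitude_mask(trained, mask, prune_frac)
    return mask  # the ticket: this mask applied to init_weights

random.seed(1)
w0 = [random.gauss(0, 1) for _ in range(10)]
fake_train = lambda ws: [w * 1.1 for w in ws]  # stand-in for actual training
mask = imp(w0, fake_train)
print(sum(mask))  # number of surviving weights after 3 pruning rounds
```

The key detail is the rewinding step inside the loop: pruning decisions use trained magnitudes, but the surviving weights are always reset to `init_weights`, which is what makes the resulting sparse subnetwork a ticket drawn at initialization.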
arXiv Detail & Related papers (2021-06-13T10:06:06Z)
- The Elastic Lottery Ticket Hypothesis [106.79387235014379]
The Lottery Ticket Hypothesis has drawn keen attention to identifying sparse trainable subnetworks, or winning tickets.
The most effective method to identify such winning tickets is still Iterative Magnitude-based Pruning.
We propose a variety of strategies to tweak the winning tickets found from different networks of the same model family.
arXiv Detail & Related papers (2021-03-30T17:53:45Z)
- Good Students Play Big Lottery Better [84.6111281091602]
The lottery ticket hypothesis suggests that a dense neural network contains a sparse sub-network that can match the test accuracy of the original dense net.
Recent studies demonstrate that a sparse sub-network can still be obtained by using a rewinding technique.
This paper proposes a new, simpler and yet powerful technique for re-training the sub-network, called the "Knowledge Distillation ticket" (KD ticket).
arXiv Detail & Related papers (2021-01-08T23:33:53Z)
- Winning Lottery Tickets in Deep Generative Models [64.79920299421255]
We show the existence of winning tickets in deep generative models such as GANs and VAEs.
We also demonstrate the transferability of winning tickets across different generative models.
arXiv Detail & Related papers (2020-10-05T21:45:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.