The EarlyBird Gets the WORM: Heuristically Accelerating EarlyBird Convergence
- URL: http://arxiv.org/abs/2406.11872v1
- Date: Fri, 31 May 2024 05:13:02 GMT
- Title: The EarlyBird Gets the WORM: Heuristically Accelerating EarlyBird Convergence
- Authors: Adithya Vasudev
- Abstract summary: The Early Bird hypothesis proposes an efficient algorithm to find winning lottery tickets in convolutional neural networks.
We propose WORM, a method that exploits static groups by truncating their gradients, forcing the model to rely on other neurons.
Experiments show WORM identifies winning tickets faster during training and uses fewer FLOPs, despite the additional computational overhead.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The Lottery Ticket hypothesis proposes that ideal sparse subnetworks, called lottery tickets, exist in the untrained dense network. The Early Bird hypothesis proposes an efficient algorithm to find these winning lottery tickets in convolutional neural networks, using the novel concept of distance between subnetworks to detect convergence in the subnetworks of a model. However, this approach overlooks unchanging groups of unimportant neurons near the end of the search. We propose WORM, a method that exploits these static groups by truncating their gradients, forcing the model to rely on other neurons. Experiments show WORM achieves faster ticket identification during training and uses fewer FLOPs, despite the additional computational overhead. Additionally, WORM-pruned models lose less accuracy during pruning and recover accuracy faster, improving the robustness of the model. Furthermore, WORM also generalizes the Early Bird hypothesis reasonably well to larger models such as transformers, displaying its flexibility to adapt to various architectures.
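The abstract describes two mechanisms: an Early Bird-style convergence check based on the distance between successive candidate ticket masks, and WORM's truncation of gradients for static, unimportant neuron groups. The sketch below shows how these pieces could fit together in PyTorch, assuming channel importance is read from BatchNorm scaling factors as in the original Early Bird setup; the helper names (`channel_mask`, `mask_distance`, `truncate_static_gradients`, `early_bird_worm_step`) and the thresholds (`prune_ratio`, `eps`, `window`) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def channel_mask(model: nn.Module, prune_ratio: float) -> torch.Tensor:
    """Binary keep-mask over all affine BatchNorm channels, ranked by |gamma|."""
    gammas = torch.cat([m.weight.detach().abs().flatten().cpu()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d) and m.affine])
    k = max(int(prune_ratio * gammas.numel()), 1)
    threshold = torch.kthvalue(gammas, k).values
    return (gammas > threshold).float()          # 1 = keep, 0 = prune candidate

def mask_distance(m1: torch.Tensor, m2: torch.Tensor) -> float:
    """Normalized Hamming distance between two candidate ticket masks."""
    return (m1 != m2).float().mean().item()

def truncate_static_gradients(model: nn.Module, static_pruned: torch.Tensor) -> None:
    """Zero the gradients of channels that have stayed pruned across recent
    masks, so the remaining channels absorb the learning signal (the WORM idea
    as described in the abstract; the exact criterion here is a guess)."""
    offset = 0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d) and m.affine:
            n = m.weight.numel()
            keep = 1.0 - static_pruned[offset:offset + n].to(m.weight.device)
            if m.weight.grad is not None:
                m.weight.grad.mul_(keep)
            if m.bias.grad is not None:
                m.bias.grad.mul_(keep)
            offset += n

def early_bird_worm_step(model, optimizer, loss, mask_history,
                         prune_ratio=0.5, eps=0.1, window=5):
    """One training step with an Early Bird convergence check and WORM-style
    gradient truncation. Returns True once the ticket mask has stabilized."""
    optimizer.zero_grad()
    loss.backward()

    mask = channel_mask(model, prune_ratio)
    if mask_history:
        # Channels whose status has not flipped over the recent window are
        # treated as a static group; the pruned ones get their gradients cut.
        static = torch.ones_like(mask)
        for past in mask_history[-window:]:
            static *= (past == mask).float()
        truncate_static_gradients(model, static * (1.0 - mask))

    optimizer.step()
    mask_history.append(mask)

    return len(mask_history) > window and all(
        mask_distance(mask, past) < eps for past in mask_history[-window - 1:-1])
```

In this reading, convergence is declared once the normalized mask distance stays below `eps` for a full window, matching the abstract's description of detecting subnetwork convergence; the truncation criterion and threshold values would need tuning against the paper's actual settings.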
Related papers
- Discovering Physics-Informed Neural Networks Model for Solving Partial Differential Equations through Evolutionary Computation [5.8407437499182935]
This article proposes an evolutionary computation method aimed at discovering the PINNs model with higher approximation accuracy and faster convergence rate.
In experiments, the performance of models found through Bayesian optimization, random search, and evolutionary search is compared in solving the Klein-Gordon, Burgers, and Lamé equations.
arXiv Detail & Related papers (2024-05-18T07:32:02Z) - Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction [52.63663547523033]
Late interaction, the simplest form of multi-vector retrieval, is also helpful to neural rerankers that only use the [CLS] vector to compute the similarity score.
We show that the finding is consistent across different model sizes and first-stage retrievers of diverse natures.
arXiv Detail & Related papers (2023-02-13T18:42:17Z) - Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z) - Look beyond labels: Incorporating functional summary information in Bayesian neural networks [11.874130244353253]
We present a simple approach to incorporate summary information about the predicted probability.
The available summary information is incorporated as augmented data and modeled with a Dirichlet process.
We show how the method can inform the model about task difficulty or class imbalance.
arXiv Detail & Related papers (2022-07-04T07:06:45Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining its capacity.
In this work, we regard the winning ticket from LTH as the subnetwork which is in trainable condition and its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our Dual Lottery Ticket Hypothesis (DLTH).
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - Non-Gradient Manifold Neural Network [79.44066256794187]
A deep neural network (DNN) generally takes thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z) - A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z) - Research on a New Convolutional Neural Network Model Combined with Random Edges Adding [10.519799195357209]
A random edge-adding algorithm is proposed to improve the performance of convolutional neural network models.
The simulation results show that recognition accuracy and training convergence speed are greatly improved in models reconstructed with random edge adding.
arXiv Detail & Related papers (2020-03-17T16:17:55Z) - PushNet: Efficient and Adaptive Neural Message Passing [1.9121961872220468]
Message passing neural networks have recently evolved into a state-of-the-art approach to representation learning on graphs.
Existing methods perform synchronous message passing along all edges in multiple subsequent rounds.
We consider a novel asynchronous message passing approach where information is pushed only along the most relevant edges until convergence.
arXiv Detail & Related papers (2020-03-04T18:15:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.