Data-dependent Pruning to find the Winning Lottery Ticket
- URL: http://arxiv.org/abs/2006.14350v1
- Date: Thu, 25 Jun 2020 12:48:34 GMT
- Title: Data-dependent Pruning to find the Winning Lottery Ticket
- Authors: Dániel Lévai and Zsolt Zombori
- Abstract summary: The Lottery Ticket Hypothesis postulates that a freshly initialized neural network contains a small subnetwork that can be trained in isolation to achieve performance similar to the full network.
We conclude that incorporating a data dependent component into the pruning criterion consistently improves the performance of existing pruning algorithms.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Lottery Ticket Hypothesis postulates that a freshly initialized neural
network contains a small subnetwork that can be trained in isolation to achieve
similar performance as the full network. Our paper examines several
alternatives to search for such subnetworks. We conclude that incorporating a
data dependent component into the pruning criterion in the form of the gradient
of the training loss -- as done in the SNIP method -- consistently improves the
performance of existing pruning algorithms.
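To make the criterion concrete, below is a minimal sketch, assuming a PyTorch classifier, of a SNIP-style data-dependent score: each weight is ranked by the magnitude of the weight times its training-loss gradient on a single data batch, and only the top fraction is kept. The function name, model, loss, and keep ratio are illustrative assumptions, not code from the paper.

```python
# Minimal sketch of a SNIP-style, data-dependent pruning score (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

def snip_style_masks(model, inputs, targets, keep_ratio=0.1):
    """Score each weight by |w * dL/dw| on one training batch and keep the top fraction."""
    model.zero_grad()
    loss = F.cross_entropy(model(inputs), targets)  # training loss supplies the data-dependent signal
    loss.backward()

    # |w * grad| measures the loss sensitivity to removing the connection.
    scores = {name: (p * p.grad).detach().abs()
              for name, p in model.named_parameters() if p.grad is not None and p.dim() > 1}
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = torch.topk(flat, k).values.min()
    return {name: (s >= threshold).float() for name, s in scores.items()}

# Example usage on a tiny model and a random batch:
if __name__ == "__main__":
    net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
    x, y = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
    masks = snip_style_masks(net, x, y, keep_ratio=0.1)
    # Apply the masks (and keep them fixed) before normal training.
    with torch.no_grad():
        for name, p in net.named_parameters():
            if name in masks:
                p.mul_(masks[name])
```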
Related papers
- Sparsest Models Elude Pruning: An Exposé of Pruning's Current Capabilities [4.842973374883628]
Pruning has emerged as a promising approach for compressing large-scale models, yet its effectiveness in recovering the sparsest of models has not yet been explored.
We conducted an extensive series of 485,838 experiments, applying a range of state-of-the-art pruning algorithms to a synthetic dataset we created, named the Cubist Spiral.
Our findings reveal a significant gap in performance compared to ideal sparse networks, which we identified through a novel search algorithm.
arXiv Detail & Related papers (2024-07-04T17:33:15Z)
- Cross-Inferential Networks for Source-free Unsupervised Domain Adaptation [17.718392065388503]
We propose to explore a new method called cross-inferential networks (CIN).
Our main idea is that, when we adapt the network model to predict the sample labels from encoded features, we use these prediction results to construct new training samples with derived labels.
Our experimental results on benchmark datasets demonstrate that our proposed CIN approach can significantly improve the performance of source-free UDA.
arXiv Detail & Related papers (2023-06-29T14:04:24Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- Learning from Data with Noisy Labels Using Temporal Self-Ensemble [11.245833546360386]
Deep neural networks (DNNs) have an enormous capacity to memorize noisy labels.
Current state-of-the-art methods present a co-training scheme that trains dual networks using samples associated with small losses.
We propose a simple yet effective robust training scheme that operates by training only a single network.
arXiv Detail & Related papers (2022-07-21T08:16:31Z)
- Effective Out-of-Distribution Detection in Classifier Based on PEDCC-Loss [5.614122064282257]
We propose an effective algorithm for detecting out-of-distribution examples utilizing PEDCC-Loss.
We mathematically analyze the nature of the confidence score output by the PEDCC (Predefined Evenly-Distribution Class Centroids) classifier.
We then construct a more effective scoring function to distinguish in-distribution (ID) from out-of-distribution (OOD) examples.
arXiv Detail & Related papers (2022-04-10T11:47:29Z)
- Dual Lottery Ticket Hypothesis [71.95937879869334]
Lottery Ticket Hypothesis (LTH) provides a novel view to investigate sparse network training and maintain its capacity.
In this work, we regard the winning ticket from LTH as the subnetwork which is in trainable condition and its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our Dual Lottery Ticket Hypothesis (DLTH).
arXiv Detail & Related papers (2022-03-08T18:06:26Z)
- Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that networks trained by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z)
- Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot [55.37967301483917]
Conventional wisdom of pruning algorithms suggests that pruning methods exploit information from training data to find good subnetworks.
In this paper, we conduct sanity checks for the above beliefs on several recent unstructured pruning methods.
We propose a series of simple data-independent prune ratios for each layer, and randomly prune each layer accordingly to get a subnetwork; a minimal illustrative sketch of this data-independent baseline appears after this list.
arXiv Detail & Related papers (2020-09-22T17:36:17Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
- Towards Practical Lottery Ticket Hypothesis for Adversarial Training [78.30684998080346]
We show there exists a subset of such winning-ticket sub-networks that converges significantly faster during the training process.
As a practical application of our findings, we demonstrate that such sub-networks can help in cutting down the total time of adversarial training.
arXiv Detail & Related papers (2020-03-06T03:11:52Z)
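As referenced in the Sanity-Checking entry above, here is a minimal sketch, under the same PyTorch assumptions as the earlier snippet, of the contrasting data-independent baseline: a fixed prune ratio per layer with positions chosen at random, so no training data enters the criterion. Layer names and ratios are illustrative, not values from that paper.

```python
# Minimal sketch of data-independent, layer-wise random pruning (illustrative assumptions).
import torch
import torch.nn as nn

def random_layer_masks(model, prune_ratios):
    """Zero a fixed fraction of each listed layer's weights at random (no data used)."""
    masks = {}
    for name, p in model.named_parameters():
        if name not in prune_ratios:
            continue
        n_prune = int(prune_ratios[name] * p.numel())
        mask = torch.ones(p.numel())
        mask[torch.randperm(p.numel())[:n_prune]] = 0.0  # drop random positions
        masks[name] = mask.view_as(p)
    return masks

# Example: prune 50% of the first weight matrix and 80% of the second.
if __name__ == "__main__":
    net = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
    masks = random_layer_masks(net, {"0.weight": 0.5, "2.weight": 0.8})
    with torch.no_grad():
        for name, p in net.named_parameters():
            if name in masks:
                p.mul_(masks[name])
```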
This list is automatically generated from the titles and abstracts of the papers in this site.