Playing the Lottery With Concave Regularizers for Sparse Trainable Neural Networks
- URL: http://arxiv.org/abs/2501.11135v1
- Date: Sun, 19 Jan 2025 18:05:13 GMT
- Title: Playing the Lottery With Concave Regularizers for Sparse Trainable Neural Networks
- Authors: Giulia Fracastoro, Sophie M. Fosson, Andrea Migliorati, Giuseppe C. Calafiore
- Abstract summary: We propose a novel class of methods to play the lottery.
The key point is the use of concave regularization to promote the sparsity of a relaxed binary mask.
We show that the proposed method can improve the performance of state-of-the-art algorithms.
- Score: 10.48836159692231
- Abstract: The design of sparse neural networks, i.e., of networks with a reduced number of parameters, has been attracting increasing research attention in the last few years. The use of sparse models may significantly reduce the computational and storage footprint in the inference phase. In this context, the lottery ticket hypothesis (LTH) constitutes a breakthrough result that addresses the performance not only of the inference phase but also of the training phase. It states that it is possible to extract effective sparse subnetworks, called winning tickets, that can be trained in isolation. The development of effective methods to play the lottery, i.e., to find winning tickets, is still an open problem. In this article, we propose a novel class of methods to play the lottery. The key point is the use of concave regularization to promote the sparsity of a relaxed binary mask, which represents the network topology. We theoretically analyze the effectiveness of the proposed method in the convex framework. Then, we present extensive numerical tests on various datasets and architectures, showing that the proposed method can improve the performance of state-of-the-art algorithms.
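For intuition, here is a minimal sketch of the idea described in the abstract: a continuous mask gates each weight, and a concave (log-type) penalty on the mask entries pushes most of them toward zero, after which the mask is thresholded into a binary topology. The sigmoid relaxation, the log-sum penalty, and all hyperparameters below are illustrative assumptions, not necessarily the exact formulation used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    """Linear layer whose weights are gated by a relaxed (continuous) binary mask."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Mask scores, relaxed to (0, 1) via a sigmoid: values near 1 keep a weight, near 0 prune it.
        self.mask_scores = nn.Parameter(torch.zeros(out_features, in_features))

    def mask(self):
        return torch.sigmoid(self.mask_scores)

    def forward(self, x):
        return F.linear(x, self.weight * self.mask(), self.bias)

def concave_penalty(mask, eps=1e-2):
    # Log-sum penalty: concave in each mask entry, so it drives entries toward 0
    # more aggressively than an L1 term of the same weight.
    return torch.log(mask + eps).sum()

# Usage sketch: task loss plus a (hypothetical) regularization weight on the mask penalty.
layer = MaskedLinear(784, 10)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = F.cross_entropy(layer(x), y) + 1e-3 * concave_penalty(layer.mask())
loss.backward()
opt.step()

# After training, threshold the relaxed mask to obtain a binary topology (the candidate ticket).
binary_mask = (layer.mask() > 0.5).float()
```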
Related papers
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP).
Unlike FF, our framework directly outputs label distributions at each cascaded block and does not require the generation of additional negative samples.
In our framework, each block can be trained independently, so it can easily be deployed on parallel acceleration systems.
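A rough sketch of block-wise local training in that spirit: each block gets its own classification head and optimizer, gradients are stopped between blocks, and every block predicts the label distribution directly. The architecture, losses, and optimizers here are placeholder assumptions, not the authors' exact CaFo recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Two blocks, each with its own label head; each is trained with its own local loss.
blocks = nn.ModuleList([
    nn.Sequential(nn.Linear(784, 256), nn.ReLU()),
    nn.Sequential(nn.Linear(256, 128), nn.ReLU()),
])
heads = nn.ModuleList([nn.Linear(256, 10), nn.Linear(128, 10)])
opts = [torch.optim.SGD(list(b.parameters()) + list(h.parameters()), lr=0.1)
        for b, h in zip(blocks, heads)]

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
h = x
for block, head, opt in zip(blocks, heads, opts):
    h = block(h.detach())               # detach: no gradient flows into earlier blocks
    loss = F.cross_entropy(head(h), y)  # each block outputs a label distribution directly
    opt.zero_grad()
    loss.backward()
    opt.step()
```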
arXiv Detail & Related papers (2023-03-17T02:01:11Z) - LOFT: Finding Lottery Tickets through Filter-wise Training [15.06694204377327]
We show how one can efficiently identify the emergence of such winning tickets, and use this observation to design efficient pretraining algorithms.
We present the LOttery ticket through Filter-wise Training algorithm, dubbed LoFT.
Experiments show that LoFT (i) preserves and finds good lottery tickets, while (ii) achieving non-trivial computation and communication savings.
arXiv Detail & Related papers (2022-10-28T14:43:42Z) - Why Random Pruning Is All We Need to Start Sparse [7.648170881733381]
Random masks define surprisingly effective sparse neural network models.
We show that sparser networks can compete with dense architectures and state-of-the-art lottery ticket pruning algorithms.
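As a point of reference, a random mask at a fixed sparsity can be generated as below; the function name and the 90% sparsity level are illustrative, not taken from the paper.

```python
import torch

def random_mask(shape, sparsity, generator=None):
    """Return a 0/1 mask of the given shape with ~`sparsity` fraction of entries
    zeroed, chosen uniformly at random."""
    numel = int(torch.Size(shape).numel())
    n_keep = numel - int(round(sparsity * numel))
    keep_idx = torch.randperm(numel, generator=generator)[:n_keep]
    mask = torch.zeros(numel)
    mask[keep_idx] = 1.0
    return mask.reshape(shape)

# Example: prune 90% of a layer's weights at initialization and train only the survivors.
w = torch.randn(256, 128)
m = random_mask(w.shape, sparsity=0.9)
sparse_w = w * m            # roughly 10% of entries remain nonzero
```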
arXiv Detail & Related papers (2022-10-05T17:34:04Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
The Lottery Ticket Hypothesis (LTH) provides a novel view to investigate sparse network training and maintain its capacity.
In this work, we regard the winning ticket from LTH as a subnetwork that is in a trainable condition, and use its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate the Dual Lottery Ticket Hypothesis (DLTH).
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Plant 'n' Seek: Can You Find the Winning Ticket? [6.85316573653194]
The lottery ticket hypothesis has sparked the rapid development of pruning algorithms that perform structure learning.
We hand-craft extremely sparse network topologies, plant them in large neural networks, and evaluate state-of-the-art lottery ticket pruning methods.
arXiv Detail & Related papers (2021-11-22T12:32:25Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - Juvenile state hypothesis: What we can learn from lottery ticket
hypothesis researches? [1.701869491238765]
The original lottery ticket hypothesis performs pruning and weight resetting after training converges.
We propose a strategy that combines the idea of neural network structure search with a pruning algorithm to alleviate this problem.
arXiv Detail & Related papers (2021-09-08T18:22:00Z) - FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training
with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel efficient ensemble methods with dynamic sparsity, which, in one shot, yield many diverse and accurate tickets "for free" during the sparse training process.
arXiv Detail & Related papers (2021-06-28T10:48:20Z) - ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm is a hybrid between single-shot network pruning methods and Lottery-Ticket-style approaches.
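For context, one common form of iterative mask discovery is LTH-style iterative magnitude pruning, sketched below; this is a generic sketch under that assumption, not necessarily ESPN's procedure, and the round count is arbitrary.

```python
import torch

def iterative_magnitude_mask(weight, target_sparsity=0.95, rounds=5):
    """Build a pruning mask over several rounds by repeatedly removing the
    smallest-magnitude surviving weights (generic iterative magnitude pruning)."""
    mask = torch.ones_like(weight)
    # Fraction of the *surviving* weights to prune each round so that the
    # compounded sparsity reaches target_sparsity after `rounds` rounds.
    per_round = 1.0 - (1.0 - target_sparsity) ** (1.0 / rounds)
    for _ in range(rounds):
        surviving = weight[mask.bool()].abs()
        k = int(per_round * surviving.numel())
        if k == 0:
            break
        threshold = surviving.kthvalue(k).values
        mask[(weight.abs() <= threshold) & mask.bool()] = 0.0
        # In a full pipeline the network would be retrained (or rewound) between rounds.
    return mask
```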
arXiv Detail & Related papers (2020-06-28T23:09:27Z) - Fitting the Search Space of Weight-sharing NAS with Graph Convolutional
Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)