Related papers: Robust Pruning at Initialization

Robust Pruning at Initialization

URL: http://arxiv.org/abs/2002.08797v5
Date: Wed, 19 May 2021 22:43:36 GMT
Title: Robust Pruning at Initialization
Authors: Soufiane Hayou, Jean-Francois Ton, Arnaud Doucet, Yee Whye Teh
Abstract summary: A growing need for smaller, energy-efficient, neural networks to be able to use machine learning applications on devices with limited computational resources. For Deep NNs, such procedures remain unsatisfactory as the resulting pruned networks can be difficult to train and, for instance, they do not prevent one layer from being fully pruned.
Score: 61.30574156442608
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Overparameterized Neural Networks (NN) display state-of-the-art performance. However, there is a growing need for smaller, energy-efficient, neural networks tobe able to use machine learning applications on devices with limited computational resources. A popular approach consists of using pruning techniques. While these techniques have traditionally focused on pruning pre-trained NN (LeCun et al.,1990; Hassibi et al., 1993), recent work by Lee et al. (2018) has shown promising results when pruning at initialization. However, for Deep NNs, such procedures remain unsatisfactory as the resulting pruned networks can be difficult to train and, for instance, they do not prevent one layer from being fully pruned. In this paper, we provide a comprehensive theoretical analysis of Magnitude and Gradient based pruning at initialization and training of sparse architectures. This allows us to propose novel principled approaches which we validate experimentally on a variety of NN architectures.

Related papers

Forward Only Learning for Orthogonal Neural Networks of any Depth [42.47993597828749]
Backpropagation is still the de facto algorithm used today to train neural networks.<n> PEPITA and forward-only frameworks have proposed promising alternatives, but they failed to scale up to a handful of hidden layers.<n>We introduce FOTON that bridges the gap with the backpropagation algorithm.
arXiv Detail & Related papers (2025-12-19T10:03:34Z)
Diluting Restricted Boltzmann Machines [0.0]
This paper investigates whether simpler, sparser networks can maintain strong performance under extreme pruning conditions.<n>Inspired by the Lottery Ticket Hypothesis, we demonstrate that RBMs can achieve high-quality generative performance even when up to 80% of the connections are pruned before training.<n>We identify a sharp transition above which the generative quality degrades abruptly when pruning disrupts a minimal core of essential connections.
arXiv Detail & Related papers (2025-11-01T18:03:58Z)
Structure-Preserving Network Compression Via Low-Rank Induced Training Through Linear Layers Composition [11.399520888150468]
We present a theoretically-justified technique termed Low-Rank Induced Training (LoRITa) LoRITa promotes low-rankness through the composition of linear layers and compresses by using singular value truncation. We demonstrate the effectiveness of our approach using MNIST on Fully Connected Networks, CIFAR10 on Vision Transformers, and CIFAR10/100 and ImageNet on Convolutional Neural Networks.
arXiv Detail & Related papers (2024-05-06T00:58:23Z)
The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely Cascaded Forward (CaFo) algorithm, which does not rely on BP optimization as that in FF. Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples. In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
Theoretical Characterization of How Neural Network Pruning Affects its Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization. It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero. More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z)
Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime. We show that the neural networks possess a different limiting kernel which we call textitbias-generalized NTK We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
Exact Gradient Computation for Spiking Neural Networks Through Forward Propagation [39.33537954568678]
Spiking neural networks (SNN) have emerged as alternatives to traditional neural networks. We propose a novel training algorithm, called emphforward propagation (FP), that computes exact gradients for SNN.
arXiv Detail & Related papers (2022-10-18T20:28:21Z)
Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network. We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint. Generalized from the sample-wise analysis into the real batch setting, NIO is able to automatically look for a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z)
An Operator Theoretic Perspective on Pruning Deep Neural Networks [2.624902795082451]
We make use of recent advances in dynamical systems theory to define a new class of theoretically motivated pruning algorithms. We show that these algorithms can be equivalent to magnitude and gradient based pruning, unifying these seemingly disparate methods.
arXiv Detail & Related papers (2021-10-28T02:33:50Z)
Neural Network Pruning Through Constrained Reinforcement Learning [3.2880869992413246]
We propose a general methodology for pruning neural networks. Our proposed methodology can prune neural networks to respect pre-defined computational budgets. We prove the effectiveness of our approach via comparison with state-of-the-art methods on standard image classification datasets.
arXiv Detail & Related papers (2021-10-16T11:57:38Z)
A Framework for Neural Network Pruning Using Gibbs Distributions [34.0576955010317]
Gibbs pruning is a novel framework for expressing and designing neural network pruning methods. It can train and prune a network simultaneously in such a way that the learned weights and pruning mask are well-adapted for each other. We achieve a new state-of-the-art result for pruning ResNet-56 with the CIFAR-10 dataset.
arXiv Detail & Related papers (2020-06-08T23:04:53Z)
Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights. Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.