Trainability Preserving Neural Structured Pruning
- URL: http://arxiv.org/abs/2207.12534v1
- Date: Mon, 25 Jul 2022 21:15:47 GMT
- Title: Trainability Preserving Neural Structured Pruning
- Authors: Huan Wang and Yun Fu
- Abstract summary: We present trainability preserving pruning (TPP), a regularization-based structured pruning method that can effectively maintain trainability during sparsification.
TPP can compete with the ground-truth dynamical isometry recovery method on linear networks.
It delivers encouraging performance in comparison to many top-performing filter pruning methods.
- Score: 64.65659982877891
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several recent works empirically find that the finetuning learning rate
is critical to the final performance in neural network structured pruning. Further
research finds that the network trainability broken by pruning is responsible for
this, calling for an urgent need to recover trainability before finetuning. Existing
attempts propose to exploit weight orthogonalization to achieve dynamical
isometry for improved trainability. However, they only work for linear MLP
networks. How to develop a filter pruning method that maintains or recovers
trainability and is scalable to modern deep networks remains elusive. In this
paper, we present trainability preserving pruning (TPP), a regularization-based
structured pruning method that can effectively maintain trainability during
sparsification. Specifically, TPP regularizes the Gram matrix of the convolutional
kernels so as to decorrelate the pruned filters from the kept filters. Besides
the convolutional layers, we also propose to regularize the BN parameters to
better preserve trainability. Empirically, TPP can compete with the
ground-truth dynamical isometry recovery method on linear MLP networks. On
non-linear networks (ResNet56/VGG19, CIFAR datasets), it outperforms the other
counterpart solutions by a large margin. Moreover, TPP can also work
effectively with modern deep networks (ResNets) on ImageNet, delivering
encouraging performance in comparison to many top-performing filter pruning
methods. To the best of our knowledge, this is the first approach that effectively
maintains trainability during pruning for large-scale deep neural networks.
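A minimal PyTorch-style sketch of the idea described in the abstract, assuming a binary keep/prune mask over output filters; the exact penalty used in the paper may differ, and the function and argument names are illustrative:
```python
import torch
import torch.nn as nn

def tpp_regularizer(conv: nn.Conv2d, bn: nn.BatchNorm2d, pruned: torch.Tensor, lam: float = 1e-3):
    """Sketch of a TPP-style penalty (illustrative, not the paper's exact loss).

    `pruned` is a boolean vector over output filters marking the ones scheduled
    for removal. Following the abstract: push the Gram-matrix entries that couple
    pruned filters with kept filters toward zero (de-correlation), and shrink the
    BN parameters of the pruned channels.
    """
    w = conv.weight.flatten(1)                  # (C_out, C_in*k*k)
    gram = w @ w.t()                            # Gram matrix of the filters
    # entries where exactly one of the two filters is pruned
    cross = pruned[:, None] ^ pruned[None, :]
    decorrelation = (gram[cross] ** 2).sum()
    # shrink BN scale/shift of pruned channels (assumed form of the BN term)
    bn_term = (bn.weight[pruned] ** 2).sum() + (bn.bias[pruned] ** 2).sum()
    return lam * (decorrelation + bn_term)
```
In use, this penalty would simply be added to the task loss during sparsification, so the pruned filters are gradually decoupled from the kept ones before they are removed.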
Related papers
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights by a small amount proportional to their magnitude on-the-fly.
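A rough sketch of the soft-shrinkage step described above, assuming a simple quantile threshold and a fixed shrink factor (both are illustrative choices, not the paper's schedule):
```python
import torch

@torch.no_grad()
def soft_shrink_step(weight: torch.Tensor, prune_ratio: float = 0.5, shrink: float = 0.02):
    """Illustrative soft-shrinkage step (not the exact ISS-P procedure).

    Weights whose magnitude falls below the `prune_ratio` quantile are not
    zeroed outright; they are shrunk by a small fraction of their own
    magnitude, so the sparse structure can still change at later iterations.
    """
    magnitude = weight.abs()
    threshold = torch.quantile(magnitude.flatten(), prune_ratio)
    unimportant = magnitude < threshold
    weight[unimportant] -= shrink * weight[unimportant]   # move toward zero, proportionally
    return unimportant  # the current (soft) sparse structure
```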
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Learning a Consensus Sub-Network with Polarization Regularization and One Pass Training [3.2214522506924093]
Pruning schemes create extra overhead, either through iterative training and fine-tuning for static pruning or through repeated computation of a dynamic pruning graph.
We propose a new parameter pruning strategy for learning a lighter-weight sub-network that minimizes the energy cost while maintaining comparable performance to the fully parameterised network on given downstream tasks.
Our results on CIFAR-10 and CIFAR-100 suggest that our scheme can remove 50% of connections in deep networks with less than 1% reduction in classification accuracy.
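As a rough illustration of how a polarization-style penalty can expose a sub-network in a single training pass, the sketch below applies one common form of such a regularizer to per-channel scale factors (e.g., BN gammas); the paper's exact formulation may differ:
```python
import torch

def polarization_penalty(gammas: torch.Tensor, t: float = 1.0):
    """One common form of a polarization regularizer on channel scale factors
    (illustrative; the paper's exact penalty may differ).

    The first term shrinks all scale factors; the second rewards deviation from
    the mean, so the factors split ("polarize") into a near-zero group (prunable
    channels) and a clearly non-zero group (the kept sub-network).
    """
    return t * gammas.abs().sum() - (gammas - gammas.mean()).abs().sum()
```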
arXiv Detail & Related papers (2023-02-17T09:37:17Z)
- Boosting Pruned Networks with Linear Over-parameterization [8.796518772724955]
Structured pruning compresses neural networks by reducing channels (filters) for fast inference and low footprint at run-time.
To restore accuracy after pruning, fine-tuning is usually applied to pruned networks.
We propose a novel method that first linearly over-parameterizes the compact layers in pruned networks to enlarge the number of fine-tuning parameters.
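A minimal sketch of the general idea, assuming a fully connected layer for simplicity: the compact layer is expanded into a product of two linear maps for fine-tuning (more trainable parameters, same function class) and merged back into a single layer afterwards. The class and method names are illustrative:
```python
import torch
import torch.nn as nn

class OverParamLinear(nn.Module):
    """Linearly over-parameterize a compact d_in -> d_out layer as
    d_in -> d_hidden -> d_out with no non-linearity in between (illustrative)."""
    def __init__(self, d_in: int, d_out: int, expand: int = 4):
        super().__init__()
        d_hidden = expand * d_out
        self.a = nn.Linear(d_in, d_hidden, bias=False)
        self.b = nn.Linear(d_hidden, d_out, bias=True)

    def forward(self, x):
        return self.b(self.a(x))

    @torch.no_grad()
    def merge(self) -> nn.Linear:
        """Collapse back to a single linear layer after fine-tuning."""
        merged = nn.Linear(self.a.in_features, self.b.out_features, bias=True)
        merged.weight.copy_(self.b.weight @ self.a.weight)   # (d_out, d_in)
        merged.bias.copy_(self.b.bias)
        return merged
```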
arXiv Detail & Related papers (2022-04-25T05:30:26Z)
- Interspace Pruning: Using Adaptive Filter Representations to Improve Training of Sparse CNNs [69.3939291118954]
Unstructured pruning is well suited to reduce the memory footprint of convolutional neural networks (CNNs).
Standard unstructured pruning (SP) reduces the memory footprint of CNNs by setting filter elements to zero.
We introduce interspace pruning (IP), a general tool to improve existing pruning methods.
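A speculative sketch of the adaptive-representation idea as read from the title and summary: filters are expressed in a small learnable basis and the sparsity mask is applied to the basis coefficients rather than to raw filter elements. This is an assumption about the general idea, not the paper's actual implementation:
```python
import torch
import torch.nn as nn

class BasisConv2d(nn.Module):
    """Illustrative sketch: express each k x k filter as a linear combination of a
    small learnable basis and apply the sparsity mask to the combination
    coefficients instead of to raw filter elements."""
    def __init__(self, c_in: int, c_out: int, k: int = 3, n_basis: int = 6):
        super().__init__()
        self.basis = nn.Parameter(torch.randn(n_basis, k, k))         # shared, adaptive basis
        self.coeff = nn.Parameter(torch.randn(c_out, c_in, n_basis))  # prunable coefficients
        self.register_buffer("mask", torch.ones_like(self.coeff))
        self.k = k

    def forward(self, x):
        coeff = self.coeff * self.mask                                 # sparsity in the basis
        weight = torch.einsum("oib,bkl->oikl", coeff, self.basis)      # rebuild spatial filters
        return nn.functional.conv2d(x, weight, padding=self.k // 2)
```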
arXiv Detail & Related papers (2022-03-15T11:50:45Z)
- Back to Basics: Efficient Network Compression via IMP [22.586474627159287]
Iterative Magnitude Pruning (IMP) is one of the most established approaches for network pruning.
It is often argued that IMP reaches suboptimal states because it does not incorporate sparsification into the training phase.
We find that IMP with SLR for retraining can outperform state-of-the-art pruning-during-training approaches.
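For reference, a minimal sketch of one round of plain global magnitude pruning, the core of the IMP baseline the summary refers to; the retraining schedule (including the SLR schedule mentioned above) is not reproduced here:
```python
import torch

@torch.no_grad()
def magnitude_prune(model: torch.nn.Module, sparsity: float):
    """One round of global magnitude pruning: zero the smallest-magnitude
    fraction `sparsity` of all weights and return the binary masks, so that
    retraining can keep the pruned weights at zero."""
    all_w = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    k = max(1, int(sparsity * all_w.numel()))
    threshold = all_w.kthvalue(k).values
    masks = {}
    for name, p in model.named_parameters():
        masks[name] = (p.abs() > threshold).float()
        p.mul_(masks[name])
    return masks

# IMP (sketch): alternate rounds of training and magnitude_prune(), applying the
# returned masks to the weights (or gradients) after every optimizer step.
```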
arXiv Detail & Related papers (2021-11-01T11:23:44Z)
- Edge Rewiring Goes Neural: Boosting Network Resilience via Policy Gradient [62.660451283548724]
ResiNet is a reinforcement learning framework to discover resilient network topologies against various disasters and attacks.
We show that ResiNet achieves a near-optimal resilience gain on multiple graphs while balancing utility, outperforming existing approaches by a large margin.
arXiv Detail & Related papers (2021-10-18T06:14:28Z)
- Feature Flow Regularization: Improving Structured Sparsity in Deep Neural Networks [12.541769091896624]
Pruning is a model compression method that removes redundant parameters in deep neural networks (DNNs).
We propose a simple and effective regularization strategy from a new perspective of the evolution of features, which we call feature flow regularization (FFR).
Experiments with VGGNets, ResNets on CIFAR-10/100, and Tiny ImageNet datasets demonstrate that FFR can significantly improve both unstructured and structured sparsity.
arXiv Detail & Related papers (2021-06-05T15:00:50Z)
- Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z)
- Pruning Filters while Training for Efficiently Optimizing Deep Learning Networks [6.269700080380206]
Pruning techniques have been proposed that remove less significant weights in deep networks.
We propose a dynamic pruning-while-training procedure, wherein we prune filters of a deep network during training itself.
Results indicate that pruning while training yields a compressed network with almost no accuracy loss after pruning 50% of the filters.
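A hedged sketch of such a pruning-while-training step, assuming an L1-norm filter ranking applied periodically during training; the paper's exact criterion and schedule may differ:
```python
import torch
import torch.nn as nn

@torch.no_grad()
def prune_filters_inplace(model: nn.Module, ratio: float = 0.5):
    """During training (e.g., every few epochs), zero the output filters with the
    smallest L1 norms in every conv layer. Illustrative sketch only."""
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            norms = m.weight.detach().abs().sum(dim=(1, 2, 3))   # one L1 norm per filter
            n_prune = int(ratio * norms.numel())
            if n_prune == 0:
                continue
            drop = norms.argsort()[:n_prune]                      # weakest filters
            m.weight[drop] = 0.0
            if m.bias is not None:
                m.bias[drop] = 0.0
```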
arXiv Detail & Related papers (2020-03-05T18:05:17Z) - Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)