Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive
Meta-Pruning
- URL: http://arxiv.org/abs/2006.12139v1
- Date: Mon, 22 Jun 2020 10:57:43 GMT
- Title: Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive
Meta-Pruning
- Authors: Minyoung Song, Jaehong Yoon, Eunho Yang, Sung Ju Hwang
- Abstract summary: A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
- Score: 83.59005356327103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As deep neural networks are growing in size and being increasingly deployed
to more resource-limited devices, there has been a recent surge of interest in
network pruning methods, which aim to remove less important weights or
activations of a given network. A common limitation of most existing pruning
techniques is that they require pre-training of the network at least once
before pruning, and thus the reductions in memory and computation are realized
only at inference time. However, reducing the training cost of
neural networks with rapid structural pruning may be beneficial either to
minimize monetary cost with cloud computing or to enable on-device learning on
a resource-limited device. Recently introduced random-weight pruning approaches
can eliminate the need for pretraining, but they often achieve suboptimal
performance compared to conventional pruning techniques and do not allow for
faster training since they perform unstructured pruning. To overcome their
limitations, we propose Set-based Task-Adaptive Meta Pruning (STAMP), which
task-adaptively prunes a network pretrained on a large reference dataset by
generating a pruning mask on it as a function of the target dataset. To ensure
maximum performance improvements on the target task, we meta-learn the mask
generator over different subsets of the reference dataset, such that it can
generalize well to any unseen datasets within a few gradient steps of training.
We validate STAMP against recent advanced pruning methods on benchmark
datasets, on which it not only obtains significantly improved compression rates
over the baselines at similar accuracy but also achieves orders of magnitude
faster training speed.
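The abstract does not come with code, but its core idea, a set-conditioned mask generator applied on top of a pretrained backbone, can be sketched roughly as below. This is a minimal illustration under assumptions, not the authors' implementation: the set encoder, the per-layer heads, and the 0.5 threshold are all hypothetical simplifications; in the paper the generator is additionally meta-learned over subsets of the reference dataset.

```python
# Minimal sketch (hypothetical, not the authors' code): a batch ("set") drawn
# from the target dataset is encoded, and the encoding is mapped to per-channel
# keep scores for each prunable layer of a pretrained network.
import torch
import torch.nn as nn

class SetMaskGenerator(nn.Module):
    def __init__(self, in_channels, channel_counts, embed_dim=128):
        super().__init__()
        # Permutation-invariant set encoder: per-example features, then mean pooling.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, embed_dim), nn.ReLU(),
        )
        # One head per prunable layer, producing one logit per output channel.
        self.heads = nn.ModuleList([nn.Linear(embed_dim, c) for c in channel_counts])

    def forward(self, x_set):
        z = self.encoder(x_set).mean(dim=0)   # set embedding (order-invariant)
        # Sigmoid scores in (0, 1); thresholding yields a structural mask that
        # removes whole channels, which is what enables faster training.
        return [torch.sigmoid(head(z)) for head in self.heads]

# Usage: channel masks for two hypothetical conv layers of a pretrained network.
gen = SetMaskGenerator(in_channels=3, channel_counts=[64, 128])
masks = [(m > 0.5).float() for m in gen(torch.randn(16, 3, 32, 32))]
print([int(m.sum()) for m in masks])  # number of channels kept per layer
```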
Related papers
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method, which optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
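As a rough illustration of the soft-shrinkage idea described above (not the ISS-P code itself), the snippet below scales down the lowest-magnitude fraction of weights at each step instead of hard-zeroing them; the pruning ratio and shrink factor are made-up example values.

```python
# Hedged sketch of soft shrinkage (illustrative only): at each call, the
# smallest-magnitude weights are multiplied by a shrink factor rather than
# being set to zero outright.
import torch

def soft_shrink_(weight: torch.Tensor, prune_ratio: float = 0.1, shrink: float = 0.5):
    flat = weight.abs().flatten()
    k = max(1, int(prune_ratio * flat.numel()))
    threshold = flat.kthvalue(k).values      # magnitude below which weights count as unimportant
    mask = weight.abs() <= threshold
    with torch.no_grad():
        weight[mask] *= shrink               # soft shrink instead of hard zeroing
    return mask

w = torch.randn(256, 256)
soft_shrink_(w, prune_ratio=0.2, shrink=0.5)
```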
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - Learning a Consensus Sub-Network with Polarization Regularization and
One Pass Training [3.2214522506924093]
Pruning schemes create extra overhead either by iterative training and fine-tuning for static pruning or by repeated computation of a dynamic pruning graph.
We propose a new parameter pruning strategy for learning a lighter-weight sub-network that minimizes the energy cost while maintaining comparable performance to the fully parameterised network on given downstream tasks.
Our results on CIFAR-10 and CIFAR-100 suggest that our scheme can remove 50% of connections in deep networks with less than 1% reduction in classification accuracy.
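Polarization-style regularisers of this kind are commonly applied to BatchNorm scale factors so that channels separate into a near-zero (prunable) group and a clearly non-zero group. The sketch below uses one widely cited formulation from the structured-pruning literature; it is an assumption for illustration, not necessarily the exact loss used in the paper above, and the coefficients are arbitrary.

```python
# Sketch of a polarization-style regulariser on BatchNorm scale factors
# (a common formulation; the paper's actual loss may differ).
import torch
import torch.nn as nn

def polarization_penalty(model: nn.Module, t: float = 1.2) -> torch.Tensor:
    penalty = torch.zeros(())
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            gamma = m.weight
            # Push |gamma| toward zero overall while pulling values away from
            # their mean, so channels polarize into "keep" and "prune" groups.
            penalty = penalty + t * gamma.abs().sum() - (gamma - gamma.mean()).abs().sum()
    return penalty

# Usage: add the penalty (scaled by a small coefficient) to the task loss.
net = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())
loss = net(torch.randn(2, 3, 32, 32)).mean() + 1e-4 * polarization_penalty(net)
loss.backward()
```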
arXiv Detail & Related papers (2023-02-17T09:37:17Z) - Trainability Preserving Neural Structured Pruning [64.65659982877891]
We present trainability preserving pruning (TPP), a regularization-based structured pruning method that can effectively maintain trainability during sparsification.
TPP can compete with the ground-truth dynamical isometry recovery method on linear networks.
It delivers encouraging performance in comparison to many top-performing filter pruning methods.
arXiv Detail & Related papers (2022-07-25T21:15:47Z) - Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z) - Back to Basics: Efficient Network Compression via IMP [22.586474627159287]
Iterative Magnitude Pruning (IMP) is one of the most established approaches for network pruning.
It is often argued that IMP reaches suboptimal states because it does not incorporate sparsification into the training phase.
We find that IMP with SLR for retraining can outperform state-of-the-art pruning-during-training approaches.
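For context, iterative magnitude pruning alternates training with removing the smallest-magnitude weights and then retraining. A bare-bones version is sketched below (unstructured pruning with a global threshold; `train_one_round` is a hypothetical helper standing in for ordinary training or retraining).

```python
# Bare-bones iterative magnitude pruning loop (illustrative sketch).
import torch
import torch.nn as nn

def global_magnitude_prune(model: nn.Module, ratio: float) -> None:
    weights = [m.weight for m in model.modules() if isinstance(m, (nn.Linear, nn.Conv2d))]
    all_mags = torch.cat([w.detach().abs().flatten() for w in weights])
    k = int(ratio * all_mags.numel())
    if k == 0:
        return
    threshold = all_mags.kthvalue(k).values   # global magnitude threshold
    with torch.no_grad():
        for w in weights:
            w[w.abs() <= threshold] = 0.0     # zero out the smallest weights

def iterative_magnitude_pruning(model, train_one_round, rounds=3, ratio_per_round=0.2):
    train_one_round(model)                    # initial (pre-)training
    for _ in range(rounds):
        global_magnitude_prune(model, ratio_per_round)
        train_one_round(model)                # retrain / fine-tune after each prune
```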
arXiv Detail & Related papers (2021-11-01T11:23:44Z) - When to Prune? A Policy towards Early Structural Pruning [27.91996628143805]
We propose a policy that prunes as early as possible during training without hurting performance.
Our method yields a $1.4\%$ top-1 accuracy boost over state-of-the-art pruning counterparts and cuts down GPU training cost by $2.4\times$.
arXiv Detail & Related papers (2021-10-22T18:39:22Z) - Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks;
specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
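Early-exit inference of this kind typically attaches lightweight heads at intermediate depths and stops as soon as one is confident enough. The sketch below is a generic version with made-up stages, heads, and confidence threshold, not the MESS framework itself.

```python
# Generic early-exit inference sketch (not the MESS framework): intermediate
# heads produce predictions, and computation stops once a head is confident.
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU()),
        ])
        self.exits = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(c, num_classes))
            for c in (16, 32, 64)
        ])

    @torch.no_grad()
    def forward(self, x, threshold=0.9):
        for stage, head in zip(self.stages, self.exits):
            x = stage(x)
            probs = head(x).softmax(dim=-1)
            if probs.max() >= threshold:      # confident enough: exit early
                return probs
        return probs                          # otherwise use the final exit

preds = EarlyExitNet()(torch.randn(1, 3, 32, 32))
```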
arXiv Detail & Related papers (2021-06-07T11:37:03Z) - Dense for the Price of Sparse: Improved Performance of Sparsely
Initialized Networks via a Subspace Offset [0.0]
We introduce a new 'DCT plus Sparse' layer architecture, which maintains information propagation and trainability even with as little as 0.01% trainable kernel parameters remaining.
Switching from standard sparse layers to DCT plus Sparse layers does not increase the storage footprint of a network and incurs only a small additional computational overhead.
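One way to read the 'DCT plus Sparse' idea is a layer whose weight is a fixed DCT transform plus a sparse trainable residual. The sketch below builds an orthonormal DCT-II basis by hand and keeps only a small trainable fraction of extra weights; it is an interpretation for illustration, with an arbitrary density, not the paper's implementation.

```python
# Illustrative 'fixed DCT basis + sparse trainable residual' linear layer
# (one reading of the idea above, not the paper's code).
import math
import torch
import torch.nn as nn

def dct_matrix(n: int) -> torch.Tensor:
    # Orthonormal DCT-II basis: rows are cosine basis vectors.
    k = torch.arange(n).unsqueeze(1).float()
    i = torch.arange(n).unsqueeze(0).float()
    basis = torch.cos(math.pi * (i + 0.5) * k / n) * math.sqrt(2.0 / n)
    basis[0] /= math.sqrt(2.0)
    return basis

class DCTPlusSparseLinear(nn.Module):
    def __init__(self, n: int, density: float = 0.01):
        super().__init__()
        self.register_buffer("dct", dct_matrix(n))                    # fixed, never trained
        self.register_buffer("mask", (torch.rand(n, n) < density).float())
        self.sparse = nn.Parameter(torch.zeros(n, n))                 # trainable residual

    def forward(self, x):
        weight = self.dct + self.sparse * self.mask                   # dense transform + sparse update
        return x @ weight.t()

y = DCTPlusSparseLinear(64)(torch.randn(8, 64))
```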
arXiv Detail & Related papers (2021-02-12T00:05:02Z) - Weight Pruning via Adaptive Sparsity Loss [31.978830843036658]
Pruning neural networks has regained interest in recent years as a means to compress state-of-the-art deep neural networks.
We propose a robust learning framework that efficiently prunes network parameters during training with minimal computational overhead.
arXiv Detail & Related papers (2020-06-04T10:55:16Z) - Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)