Basis Scaling and Double Pruning for Efficient Inference in
Network-Based Transfer Learning
- URL: http://arxiv.org/abs/2108.02893v2
- Date: Wed, 20 Dec 2023 19:32:36 GMT
- Title: Basis Scaling and Double Pruning for Efficient Inference in
Network-Based Transfer Learning
- Authors: Ken C. L. Wong, Satyananda Kashyap, Mehdi Moradi
- Abstract summary: We decompose a convolutional layer into two layers: a convolutional layer with the orthonormal basis vectors as the filters, and a "BasisScalingConv" layer which is responsible for rescaling the features.
We can achieve pruning ratios up to 74.6% for CIFAR-10 and 98.9% for MNIST in model parameters.
- Score: 1.3467579878240454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Network-based transfer learning allows the reuse of deep learning features
with limited data, but the resulting models can be unnecessarily large.
Although network pruning can improve inference efficiency, existing algorithms
usually require fine-tuning that may not be suitable for small datasets. In
this paper, using the singular value decomposition, we decompose a
convolutional layer into two layers: a convolutional layer with the orthonormal
basis vectors as the filters, and a "BasisScalingConv" layer which is
responsible for rescaling the features and transforming them back to the
original space. As the filters in each decomposed layer are linearly
independent, when using the proposed basis scaling factors with the Taylor
approximation of importance, pruning can be more effective and fine-tuning
individual weights is unnecessary. Furthermore, as the numbers of input and
output channels of the original convolutional layer remain unchanged after
basis pruning, it is applicable to virtually all architectures and can be
combined with existing pruning algorithms for double pruning to further
increase the pruning capability. When transferring knowledge from ImageNet
pre-trained models to different target domains, with less than 1% reduction in
classification accuracies, we can achieve pruning ratios up to 74.6% for
CIFAR-10 and 98.9% for MNIST in model parameters.
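The decomposition described above maps naturally onto two stacked convolutions. Below is a minimal PyTorch sketch, assuming the flattened kernel is factorized as W ≈ U S Vᵀ, with Vᵀ used as the orthonormal basis filters and U S folded into a 1x1 "BasisScalingConv" that carries the trainable scaling factors; the layer shapes, bias handling, and the Taylor-style importance helper are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch, assuming groups=1 convolutions and standard PyTorch layers;
# the class name "BasisScalingConv" follows the paper's terminology, but the
# exact shapes, bias handling, and pruning bookkeeping are illustrative.
import torch
import torch.nn as nn


class BasisScalingConv(nn.Module):
    """1x1 convolution built from U*S: rescales each basis response by a
    trainable factor and maps the result back to the original output channels."""

    def __init__(self, us, bias=None):
        super().__init__()
        out_channels, rank = us.shape
        self.scale = nn.Parameter(torch.ones(rank))  # basis scaling factors
        self.proj = nn.Conv2d(rank, out_channels, kernel_size=1, bias=bias is not None)
        with torch.no_grad():
            self.proj.weight.copy_(us.view(out_channels, rank, 1, 1))
            if bias is not None:
                self.proj.bias.copy_(bias)

    def forward(self, x):
        return self.proj(x * self.scale.view(1, -1, 1, 1))


def decompose_conv(conv):
    """Split a Conv2d into (orthonormal-basis conv) -> (BasisScalingConv)."""
    out_c, in_c, kh, kw = conv.weight.shape
    w = conv.weight.detach().reshape(out_c, -1)           # [out_c, in_c*kh*kw]
    u, s, vh = torch.linalg.svd(w, full_matrices=False)   # w = u @ diag(s) @ vh
    rank = s.numel()

    basis = nn.Conv2d(in_c, rank, (kh, kw), stride=conv.stride,
                      padding=conv.padding, dilation=conv.dilation, bias=False)
    with torch.no_grad():
        basis.weight.copy_(vh.reshape(rank, in_c, kh, kw))  # orthonormal filters

    bias = conv.bias.detach() if conv.bias is not None else None
    return nn.Sequential(basis, BasisScalingConv(u * s, bias))


def taylor_importance(bsc):
    """First-order Taylor importance of each basis vector, |s * dL/ds|
    (call after loss.backward() so that bsc.scale.grad is populated)."""
    return (bsc.scale * bsc.scale.grad).abs()
```

A basis-pruning pass would then drop the basis vectors with the smallest importance scores, removing the corresponding rows of Vᵀ and columns of U S; because the input and output channel counts of the original layer are preserved, the surrounding architecture needs no modification, which is what allows the combination with existing channel-pruning algorithms for double pruning.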
Related papers
- SGLP: A Similarity Guided Fast Layer Partition Pruning for Compressing Large Deep Models [19.479746878680707]
Layer pruning is a potent approach to reduce network size and improve computational efficiency.
We propose Similarity Guided fast Layer Partition pruning (SGLP) for compressing large deep models.
Our method outperforms the state-of-the-art methods in both accuracy and computational efficiency.
arXiv Detail & Related papers (2024-10-14T04:01:08Z) - Layer-wise Linear Mode Connectivity [52.6945036534469]
Averaging neural network parameters is an intuitive method for fusing the knowledge of two independent models.
It is most prominently used in federated learning.
We analyse the performance of the models that result from averaging single layers, or groups of layers.
arXiv Detail & Related papers (2023-07-13T09:39:10Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights on-the-fly by a small amount proportional to their magnitudes.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - Boosting Pruned Networks with Linear Over-parameterization [8.796518772724955]
Structured pruning compresses neural networks by reducing channels (filters) for fast inference and low footprint at run-time.
To restore accuracy after pruning, fine-tuning is usually applied to pruned networks.
We propose a novel method that first linearly over-parameterizes the compact layers in pruned networks to enlarge the number of fine-tuning parameters.
arXiv Detail & Related papers (2022-04-25T05:30:26Z) - End-to-End Sensitivity-Based Filter Pruning [49.61707925611295]
We present a sensitivity-based filter pruning algorithm (SbF-Pruner) to learn the importance scores of filters of each layer end-to-end.
Our method learns the scores from the filter weights, enabling it to account for the correlations between the filters of each layer.
arXiv Detail & Related papers (2022-04-15T10:21:05Z) - Compact representations of convolutional neural networks via weight
pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z) - Pruning Neural Networks with Interpolative Decompositions [5.377278489623063]
We introduce a principled approach to neural network pruning that casts the problem as a structured low-rank matrix approximation.
We demonstrate how to prune a neural network by first building a set of primitives to prune a single fully connected or convolution layer.
We achieve an accuracy of 93.62 $\pm$ 0.36% using VGG-16 on CIFAR-10, with a 51% FLOPs reduction.
arXiv Detail & Related papers (2021-07-30T20:13:49Z) - Layer Pruning via Fusible Residual Convolutional Block for Deep Neural
Networks [15.64167076052513]
Layer pruning yields lower inference time and runtime memory usage when the same FLOPs and number of parameters are pruned.
We propose a simple layer pruning method using a residual convolutional block (ResConv).
Our pruning method achieves excellent compression and acceleration performance over the state-of-the-art methods on different datasets.
arXiv Detail & Related papers (2020-11-29T12:51:16Z) - Layer-adaptive sparsity for the Magnitude-based Pruning [88.37510230946478]
We propose a novel importance score for global pruning, coined layer-adaptive magnitude-based pruning (LAMP) score.
LAMP consistently outperforms popular existing schemes for layerwise sparsity selection; a minimal sketch of the LAMP score is given after this list.
arXiv Detail & Related papers (2020-10-15T09:14:02Z) - Dual-constrained Deep Semi-Supervised Coupled Factorization Network with
Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z)
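For the layer-adaptive magnitude-based pruning (LAMP) entry above, the following is a minimal sketch of the score, assuming the commonly cited form in which each squared weight is normalized by the total squared magnitude of the weights in the same layer whose magnitude is at least as large; this is an illustrative reading of the paper, not the authors' implementation, and the helper names are hypothetical.

```python
# Illustrative LAMP-style scoring: per-layer normalized squared magnitudes,
# followed by a global threshold that keeps the highest-scoring weights.
import torch


def lamp_scores(weight):
    """Return per-weight LAMP-style scores with the same shape as `weight`."""
    flat = weight.detach().flatten().pow(2)
    order = torch.argsort(flat)                       # ascending by squared magnitude
    sorted_sq = flat[order]
    # Suffix sums: for each position, the total squared magnitude of all
    # weights in this layer that are at least as large as this one.
    suffix = torch.flip(torch.cumsum(torch.flip(sorted_sq, [0]), 0), [0])
    scores_sorted = sorted_sq / suffix
    scores = torch.empty_like(flat)
    scores[order] = scores_sorted                     # undo the sort
    return scores.view_as(weight)


def global_prune_mask(weights, sparsity=0.5):
    """Keep the (1 - sparsity) fraction of weights with the largest scores."""
    score_list = [lamp_scores(w) for w in weights]
    all_scores = torch.cat([s.flatten() for s in score_list])
    k = int(sparsity * all_scores.numel())
    if k == 0:
        return [torch.ones_like(s, dtype=torch.bool) for s in score_list]
    threshold = torch.kthvalue(all_scores, k).values
    return [s > threshold for s in score_list]
```

Because the normalization is per layer while the threshold is global, the resulting sparsity adapts to each layer automatically, which is the layer-adaptive behaviour the entry refers to.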