Improving Network Slimming with Nonconvex Regularization
- URL: http://arxiv.org/abs/2010.01242v4
- Date: Wed, 18 Aug 2021 23:51:15 GMT
- Title: Improving Network Slimming with Nonconvex Regularization
- Authors: Kevin Bui, Fredrick Park, Shuai Zhang, Yingyong Qi, Jack Xin
- Abstract summary: Convolutional neural networks (CNNs) have become powerful models for various computer vision tasks.
Most state-of-the-art CNNs cannot be deployed directly on edge devices.
Network slimming, a straightforward approach to compressing CNNs, is improved here with nonconvex regularization.
- Score: 8.017631543721684
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Convolutional neural networks (CNNs) have developed to become powerful models
for various computer vision tasks ranging from object detection to semantic
segmentation. However, most of the state-of-the-art CNNs cannot be deployed
directly on edge devices such as smartphones and drones, which need low latency
under limited power and memory bandwidth. One popular, straightforward approach
to compressing CNNs is network slimming, which imposes $\ell_1$ regularization
on the channel-associated scaling factors via the batch normalization layers
during training. Network slimming thereby identifies insignificant channels
that can be pruned for inference. In this paper, we propose replacing the
$\ell_1$ penalty with an alternative nonconvex, sparsity-inducing penalty in
order to yield a more compressed and/or accurate CNN architecture. We
investigate $\ell_p (0 < p < 1)$, transformed $\ell_1$ (T$\ell_1$), minimax
concave penalty (MCP), and smoothly clipped absolute deviation (SCAD) due to
their recent successes and popularity in solving sparse optimization problems,
such as compressed sensing and variable selection. We demonstrate the
effectiveness of network slimming with nonconvex penalties on three neural
network architectures -- VGG-19, DenseNet-40, and ResNet-164 -- on standard
image classification datasets. Based on the numerical experiments, T$\ell_1$
preserves model accuracy against channel pruning; $\ell_{1/2}$ and $\ell_{3/4}$ yield
more compressed models with accuracies after retraining similar to those of $\ell_1$;
and MCP and SCAD provide more accurate models after retraining with compression
similar to $\ell_1$'s. Network slimming with T$\ell_1$ regularization also
outperforms the latest Bayesian modification of network slimming in compressing
a CNN architecture in terms of memory storage while preserving its model
accuracy after channel pruning.
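As a concrete illustration of the regularization step described in the abstract, the sketch below applies a nonconvex penalty to the batch-normalization scaling factors during training and then thresholds them by magnitude for channel pruning. This is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: the transformed $\ell_1$ penalty $P_a(x) = (a+1)|x| / (a+|x|)$ is used as the example, and the regularization weight `lam`, the shape parameter `a`, and the pruning ratio are illustrative placeholders rather than values from the paper.

```python
import torch
import torch.nn as nn

def tl1_subgradient(gamma: torch.Tensor, a: float = 1.0) -> torch.Tensor:
    """Elementwise subgradient of the transformed-l1 penalty
    P_a(x) = (a + 1)|x| / (a + |x|):
    dP_a/dx = a (a + 1) / (a + |x|)^2 * sign(x)."""
    return a * (a + 1) / (a + gamma.abs()).pow(2) * torch.sign(gamma)

def add_slimming_penalty_grad(model: nn.Module, lam: float = 1e-4, a: float = 1.0) -> None:
    """Call after loss.backward(): adds the sparsity-penalty subgradient to the
    gradient of every batch-norm scaling factor (the per-channel gamma)."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d) and m.weight.grad is not None:
            m.weight.grad.add_(lam * tl1_subgradient(m.weight.detach(), a))

def bn_channel_masks(model: nn.Module, prune_ratio: float = 0.5) -> dict:
    """Global magnitude pruning of channels: keep a channel if its |gamma|
    exceeds the prune_ratio quantile taken over all BatchNorm2d layers."""
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    thresh = torch.quantile(gammas, prune_ratio)
    return {name: (m.weight.detach().abs() > thresh)
            for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)}

# Usage inside a training loop (sketch):
#   loss.backward()
#   add_slimming_penalty_grad(model, lam=1e-4, a=1.0)
#   optimizer.step()
# After training, bn_channel_masks(model, 0.5) marks the channels to keep.
```

Swapping in $\ell_p$, MCP, or SCAD amounts to replacing `tl1_subgradient` with the corresponding (sub)gradient; the surrounding training and pruning loop stays the same.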
Related papers
- Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH).
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
arXiv Detail & Related papers (2023-09-29T13:09:40Z) - A Proximal Algorithm for Network Slimming [2.8148957592979427]
Network slimming, a popular channel pruning method for convolutional neural networks (CNNs), uses subgradient descent to train CNNs toward sparsity.
We develop an alternative algorithm called proximal NS to train CNNs towards sparse, accurate structures.
Our experiments demonstrate that after one round of training, proximal NS yields a CNN with competitive accuracy and compression.
arXiv Detail & Related papers (2023-07-02T23:34:12Z) - Attention-based Feature Compression for CNN Inference Offloading in Edge
Computing [93.67044879636093]
This paper studies the computational offloading of CNN inference in device-edge co-inference systems.
We propose a novel autoencoder-based CNN architecture (AECNN) for effective feature extraction at end-device.
Experiments show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss.
arXiv Detail & Related papers (2022-11-24T18:10:01Z) - Improved techniques for deterministic l2 robustness [63.34032156196848]
Training convolutional neural networks (CNNs) with a strict 1-Lipschitz constraint under the $l_2$ norm is useful for adversarial robustness, interpretable gradients and stable training.
We introduce a procedure to certify robustness of 1-Lipschitz CNNs by replacing the last linear layer with a 1-hidden-layer MLP.
We significantly advance the state-of-the-art for standard and provable robust accuracies on CIFAR-10 and CIFAR-100.
arXiv Detail & Related papers (2022-11-15T19:10:12Z) - CrAM: A Compression-Aware Minimizer [103.29159003723815]
We propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way.
CrAM produces dense models that can be more accurate than the standard SGD/Adam-based baselines, but which are stable under weight pruning.
CrAM can produce sparse models which perform well for transfer learning, and it also works for semi-structured 2:4 pruning patterns supported by GPU hardware.
arXiv Detail & Related papers (2022-07-28T16:13:28Z) - Structured Pruning is All You Need for Pruning CNNs at Initialization [38.88730369884401]
Pruning is a popular technique for reducing the model size and computational cost of convolutional neural networks (CNNs).
We propose PreCropping, a structured hardware-efficient model compression scheme.
Compared to weight pruning, the proposed scheme is regular and dense in both storage and computation without sacrificing accuracy.
arXiv Detail & Related papers (2022-03-04T19:54:31Z) - Tied & Reduced RNN-T Decoder [0.0]
We study ways to make the RNN-T decoder (prediction network + joint network) smaller and faster without degradation in recognition performance.
Our prediction network performs a simple weighted averaging of the input embeddings, and shares its embedding matrix weights with the joint network's output layer.
This simple design, when used in conjunction with additional Edit-based Minimum Bayes Risk (EMBR) training, reduces the RNN-T Decoder from 23M parameters to just 2M, without affecting word-error rate (WER).
arXiv Detail & Related papers (2021-09-15T18:19:16Z) - Tensor Reordering for CNN Compression [7.228285747845778]
We show how parameter redundancy in Convolutional Neural Network (CNN) filters can be effectively reduced by pruning in spectral domain.
Our approach is applied to pretrained CNNs and we show that minor additional fine-tuning allows our method to recover the original model performance.
arXiv Detail & Related papers (2020-10-22T23:45:34Z) - Large Norms of CNN Layers Do Not Hurt Adversarial Robustness [11.930096161524407]
Lipschitz properties of convolutional neural networks (CNNs) are widely considered to be related to adversarial robustness.
We propose a novel regularization method termed norm decay, which can effectively reduce the norms of convolutional layers and fully-connected layers.
Experiments show that norm-regularization methods, including norm decay, weight decay, and singular value clipping, can improve generalization of CNNs.
arXiv Detail & Related papers (2020-09-17T17:33:50Z) - Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace conventional ReLU with Bounded ReLU, after finding that the performance decline of integer-arithmetic networks is due to activation quantization.
Our integer networks achieve equivalent performance as the corresponding FPN networks, but have only 1/4 memory cost and run 2x faster on modern GPU.
arXiv Detail & Related papers (2020-06-21T08:23:03Z) - Approximation and Non-parametric Estimation of ResNet-type Convolutional
Neural Networks [52.972605601174955]
We show a ResNet-type CNN can attain the minimax optimal error rates in important function classes.
We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes.
arXiv Detail & Related papers (2019-03-24T19:42:39Z)