A Novel Architecture Slimming Method for Network Pruning and Knowledge
Distillation
- URL: http://arxiv.org/abs/2202.10461v1
- Date: Mon, 21 Feb 2022 12:45:51 GMT
- Title: A Novel Architecture Slimming Method for Network Pruning and Knowledge
Distillation
- Authors: Dongqi Wang and Shengyu Zhang and Zhipeng Di and Xin Lin and Weihua
Zhou and Fei Wu
- Abstract summary: We propose an architecture slimming method that automates the layer configuration process.
We show that our method yields significant performance gains over baselines after pruning and distillation.
Surprisingly, we find that the resulting layer-wise compression rates correspond to the layer sensitivities found by existing works.
- Score: 30.39128740788747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Network pruning and knowledge distillation are two widely known
model compression methods that efficiently reduce computation cost and model
size. A common problem in both pruning and distillation is determining the
compressed architecture, i.e., the exact number of filters per layer and the
layer configuration, so as to preserve most of the original model's capacity.
Despite the great advances in existing works, determining an excellent
architecture still requires human intervention or extensive experimentation.
In this paper, we propose an architecture slimming method that automates the
layer configuration process. We start from the perspective that the capacity
of an over-parameterized model can be largely preserved by finding, for each
layer, the minimum number of filters that preserves the maximum parameter
variance, resulting in a thin architecture. We formulate the determination of
the compressed architecture as a one-step orthogonal linear transformation
and integrate principal component analysis (PCA), in which the variances of
the filters in the first several projections are maximized. We demonstrate
the rationality of our analysis and the effectiveness of the proposed method
through extensive experiments. In particular, we show that under the same
overall compression rate, the compressed architecture determined by our
method yields significant performance gains over baselines after pruning and
distillation. Surprisingly, we find that the resulting layer-wise compression
rates correspond to the layer sensitivities found by existing works through
extensive experimentation.
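A minimal sketch of such PCA-based width selection, as I read the abstract (my own construction, not the authors' released code; the filter-flattening scheme and the variance threshold var_keep are assumptions):

    import numpy as np

    def slimmed_width(weight: np.ndarray, var_keep: float = 0.95) -> int:
        """Smallest number of principal components retaining var_keep
        of the variance across a layer's filters."""
        # Treat each filter as one observation (row): (out_c, in_c*k*k).
        flat = weight.reshape(weight.shape[0], -1)
        flat = flat - flat.mean(axis=0, keepdims=True)  # center for PCA
        s = np.linalg.svd(flat, compute_uv=False)       # singular values
        ratio = np.cumsum(s**2) / np.sum(s**2)          # cumulative explained variance
        return min(int(np.searchsorted(ratio, var_keep)) + 1, len(ratio))

    # Hypothetical usage on a random 64-filter conv layer:
    w = np.random.default_rng(0).normal(size=(64, 32, 3, 3))
    print(slimmed_width(w))  # suggested width for the slimmed layer

The returned count would then serve as the layer's width for pruning, or as the corresponding student layer's width for distillation.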
Related papers
- Language Models as Zero-shot Lossless Gradient Compressors: Towards
General Neural Parameter Prior Models [66.1595537904019]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
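For context, arithmetic coding driven by a model's next-symbol probabilities approaches the model's cross-entropy; a standard information-theoretic bound (my framing, not a claim from the paper) is

    \[
    \ell(x) \;\le\; \sum_{t=1}^{T} -\log_2 p_{\mathrm{LM}}(x_t \mid x_{<t}) \;+\; 2 \text{ bits},
    \]

so the better the LLM predicts the serialized parameter or gradient stream, the shorter the lossless code.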
arXiv Detail & Related papers (2024-09-26T13:38:33Z) - Generalized Nested Latent Variable Models for Lossy Coding applied to Wind Turbine Scenarios [14.48369551534582]
A learning-based approach seeks to optimize the trade-off between compression rate and reconstructed image quality.
A successful technique consists in introducing a deep hyperprior that operates within a 2-level nested latent variable model.
This paper extends this concept by designing a generalized L-level nested generative model with a Markov chain structure.
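In such a hierarchy (a hedged reading of the abstract; notation mine), the joint distribution would factor along a Markov chain of latents:

    \[
    p(x, z_{1:L}) \;=\; p(x \mid z_1)\, p(z_L) \prod_{l=1}^{L-1} p(z_l \mid z_{l+1}),
    \]

with L = 2 recovering the familiar two-level hyperprior model.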
arXiv Detail & Related papers (2024-06-10T11:00:26Z) - Effective Layer Pruning Through Similarity Metric Perspective [0.0]
Deep neural networks have been the predominant paradigm in machine learning for solving cognitive tasks.
Pruning structures from these models is a straightforward approach to reducing network complexity.
Layer pruning often hurts the network predictive ability (i.e., accuracy) at high compression rates.
This work introduces an effective layer-pruning strategy that meets all underlying properties pursued by pruning methods.
arXiv Detail & Related papers (2024-05-27T11:54:51Z) - Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives.
We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis.
We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z) - Make Deep Networks Shallow Again [6.647569337929869]
The concept of residual connections was a breakthrough for training deep networks.
A stack of residual layers can be expressed as an expansion of terms similar to a Taylor expansion.
In other words, a sequential deep architecture is replaced by a parallel shallow one.
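To make the claimed expansion concrete (my notation; the identity is exact for linear blocks and a formal "unraveled" view otherwise):

    \[
    (\mathrm{id} + F_n) \circ \cdots \circ (\mathrm{id} + F_1)
    \;=\; \sum_{k=0}^{n} \;\sum_{1 \le i_1 < \cdots < i_k \le n}
    F_{i_k} \circ \cdots \circ F_{i_1},
    \]

whose low-order terms, x + \sum_i F_i(x), form the parallel shallow surrogate.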
arXiv Detail & Related papers (2023-09-15T14:18:21Z) - STD-NET: Search of Image Steganalytic Deep-learning Architecture via
Hierarchical Tensor Decomposition [40.997546601209145]
STD-NET is an unsupervised deep-learning architecture search approach via hierarchical tensor decomposition for image steganalysis.
Our proposed strategy is more efficient and can remove more redundancy compared with previous steganalytic network compression methods.
arXiv Detail & Related papers (2022-06-12T03:46:08Z) - Low-rank Tensor Decomposition for Compression of Convolutional Neural
Networks Using Funnel Regularization [1.8579693774597708]
We propose a model reduction method to compress the pre-trained networks using low-rank tensor decomposition.
A new regularization method, called funnel function, is proposed to suppress the unimportant factors during the compression.
For ResNet18 on ImageNet2012, our reduced model reaches more than a two times speedup in terms of GMACs with merely a 0.7% Top-1 accuracy drop.
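A generic low-rank split of a convolution via truncated SVD (a sketch of the general technique, not the paper's funnel-regularized procedure; the shapes and the rank are assumptions):

    import numpy as np

    def low_rank_split(weight: np.ndarray, rank: int):
        """Split an (out_c, in_c, k, k) conv into a (rank, in_c, k, k)
        spatial conv followed by an (out_c, rank, 1, 1) pointwise conv."""
        out_c, in_c, kh, kw = weight.shape
        mat = weight.reshape(out_c, -1)                  # (out_c, in_c*k*k)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        spatial = vt[:rank].reshape(rank, in_c, kh, kw)  # first conv
        pointwise = (u[:, :rank] * s[:rank]).reshape(out_c, rank, 1, 1)
        return spatial, pointwise

    w = np.random.default_rng(0).normal(size=(64, 32, 3, 3))
    a, b = low_rank_split(w, rank=16)
    # Parameters drop from 64*32*9 = 18432 to 16*32*9 + 64*16 = 5632.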
arXiv Detail & Related papers (2021-12-07T13:41:51Z) - iDARTS: Differentiable Architecture Search with Stochastic Implicit
Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
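The implicit-function-theorem hypergradient underlying this line of work (the standard bilevel form; iDARTS's specific estimator may differ):

    \[
    \frac{d\mathcal{L}_{\mathrm{val}}}{d\alpha}
    = \nabla_{\alpha}\mathcal{L}_{\mathrm{val}}
    - \nabla^{2}_{\alpha w}\mathcal{L}_{\mathrm{train}}
      \left[\nabla^{2}_{w w}\mathcal{L}_{\mathrm{train}}\right]^{-1}
      \nabla_{w}\mathcal{L}_{\mathrm{val}},
    \]

evaluated at weights w^*(\alpha) satisfying \nabla_w \mathcal{L}_{\mathrm{train}} = 0; the inverse Hessian-vector product is typically approximated, e.g. with a truncated Neumann series.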
arXiv Detail & Related papers (2021-06-21T00:44:11Z) - Efficient Micro-Structured Weight Unification and Pruning for Neural
Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially on resource-limited devices.
Previous unstructured or structured weight pruning methods rarely translate into real inference acceleration.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z) - Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z) - Learning End-to-End Lossy Image Compression: A Benchmark [90.35363142246806]
We first conduct a comprehensive literature survey of learned image compression methods.
We describe milestones in cutting-edge learned image-compression methods, review a broad range of existing works, and provide insights into their historical development routes.
By introducing a coarse-to-fine hyperprior model for entropy estimation and signal reconstruction, we achieve improved rate-distortion performance.
arXiv Detail & Related papers (2020-02-10T13:13:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.