Efficient Micro-Structured Weight Unification and Pruning for Neural
Network Compression
- URL: http://arxiv.org/abs/2106.08301v2
- Date: Wed, 16 Jun 2021 16:43:08 GMT
- Title: Efficient Micro-Structured Weight Unification and Pruning for Neural
Network Compression
- Authors: Sheng Lin, Wei Jiang, Wei Wang, Kaidi Xu, Yanzhi Wang, Shan Liu and
Songnan Li
- Abstract summary: Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices.
Previous unstructured or structured weight pruning methods can hardly truly accelerate inference.
We propose a generalized weight unification framework at a hardware compatible micro-structured level to achieve high amount of compression and acceleration.
- Score: 56.83861738731913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compressing Deep Neural Network (DNN) models to alleviate the storage and
computation requirements is essential for practical applications, especially
for resource limited devices. Although capable of reducing a reasonable amount
of model parameters, previous unstructured or structured weight pruning methods
can hardly truly accelerate inference, either due to the poor hardware
compatibility of the unstructured sparsity or due to the low sparse rate of the
structurally pruned network. Aiming at reducing both storage and computation,
as well as preserving the original task performance, we propose a generalized
weight unification framework at a hardware compatible micro-structured level to
achieve high amount of compression and acceleration. Weight coefficients of a
selected micro-structured block are unified to reduce the storage and
computation of the block without changing the neuron connections, which turns
to a micro-structured pruning special case when all unified coefficients are
set to zero, where neuron connections (hence storage and computation) are
completely removed. In addition, we developed an effective training framework
based on the alternating direction method of multipliers (ADMM), which converts
our complex constrained optimization into separately solvable subproblems.
Through iteratively optimizing the subproblems, the desired micro-structure can
be ensured with high compression ratio and low performance degradation. We
extensively evaluated our method using a variety of benchmark models and
datasets for different applications. Experimental results demonstrate
state-of-the-art performance.
Related papers
- Adaptive Error-Bounded Hierarchical Matrices for Efficient Neural Network Compression [0.0]
This paper introduces a dynamic, error-bounded hierarchical matrix (H-matrix) compression method tailored for Physics-Informed Neural Networks (PINNs)
The proposed approach reduces the computational complexity and memory demands of large-scale physics-based models while preserving the essential properties of the Neural Tangent Kernel (NTK)
Empirical results demonstrate that this technique outperforms traditional compression methods, such as Singular Value Decomposition (SVD), pruning, and quantization, by maintaining high accuracy and improving generalization capabilities.
arXiv Detail & Related papers (2024-09-11T05:55:51Z) - Resource Management for Low-latency Cooperative Fine-tuning of Foundation Models at the Network Edge [35.40849522296486]
Large-scale foundation models (FoMos) can perform human-like intelligence.
FoMos need to be adapted to specialized downstream tasks through fine-tuning techniques.
We advocate multi-device cooperation within the device-edge cooperative fine-tuning paradigm.
arXiv Detail & Related papers (2024-07-13T12:47:14Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution iteration to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly network at each and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - Learning k-Level Structured Sparse Neural Networks Using Group Envelope Regularization [4.0554893636822]
We introduce a novel approach to deploy large-scale Deep Neural Networks on constrained resources.
The method speeds up inference time and aims to reduce memory demand and power consumption.
arXiv Detail & Related papers (2022-12-25T15:40:05Z) - STN: Scalable Tensorizing Networks via Structure-Aware Training and
Adaptive Compression [10.067082377396586]
We propose Scalableizing Networks (STN), which adaptively adjust the model size and decomposition structure without retraining.
STN is compatible with arbitrary network architectures and achieves higher compression performance and flexibility over other tensorizing versions.
arXiv Detail & Related papers (2022-05-30T15:50:48Z) - Compact representations of convolutional neural networks via weight
pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitive as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z) - Dynamic Probabilistic Pruning: A general framework for
hardware-constrained pruning at different granularities [80.06422693778141]
We propose a flexible new pruning mechanism that facilitates pruning at different granularities (weights, kernels, filters/feature maps)
We refer to this algorithm as Dynamic Probabilistic Pruning (DPP)
We show that DPP achieves competitive compression rates and classification accuracy when pruning common deep learning models trained on different benchmark datasets for image classification.
arXiv Detail & Related papers (2021-05-26T17:01:52Z) - Compact CNN Structure Learning by Knowledge Distillation [34.36242082055978]
We propose a framework that leverages knowledge distillation along with customizable block-wise optimization to learn a lightweight CNN structure.
Our method results in a state of the art network compression while being capable of achieving better inference accuracy.
In particular, for the already compact network MobileNet_v2, our method offers up to 2x and 5.2x better model compression.
arXiv Detail & Related papers (2021-04-19T10:34:22Z) - Structured Convolutions for Efficient Neural Network Design [65.36569572213027]
We tackle model efficiency by exploiting redundancy in the textitimplicit structure of the building blocks of convolutional neural networks.
We show how this decomposition can be applied to 2D and 3D kernels as well as the fully-connected layers.
arXiv Detail & Related papers (2020-08-06T04:38:38Z) - Structured Sparsification with Joint Optimization of Group Convolution
and Channel Shuffle [117.95823660228537]
We propose a novel structured sparsification method for efficient network compression.
The proposed method automatically induces structured sparsity on the convolutional weights.
We also address the problem of inter-group communication with a learnable channel shuffle mechanism.
arXiv Detail & Related papers (2020-02-19T12:03:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.