Dirichlet Pruning for Neural Network Compression
- URL: http://arxiv.org/abs/2011.05985v3
- Date: Mon, 8 Mar 2021 23:37:45 GMT
- Title: Dirichlet Pruning for Neural Network Compression
- Authors: Kamil Adamczewski, Mijung Park
- Abstract summary: We introduce Dirichlet pruning, a novel technique to transform a large neural network model into a compressed one.
We perform extensive experiments on larger architectures such as VGG and ResNet.
Our method achieves state-of-the-art compression performance and provides interpretable features as a by-product.
- Score: 10.77469946354744
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Dirichlet pruning, a novel post-processing technique to
transform a large neural network model into a compressed one. Dirichlet pruning
is a form of structured pruning that places a Dirichlet distribution over each
convolutional layer's channels (or each fully-connected layer's neurons) and
estimates the parameters of the distribution over these units using variational
inference. The learned distribution allows us to remove unimportant units,
resulting in a compact architecture containing only the features crucial for
the task at hand. The number of newly introduced Dirichlet parameters is only
linear in the number of channels, which allows for rapid training, requiring as
little as one epoch to converge. We perform extensive experiments, in
particular on larger architectures such as VGG and ResNet (45% and 58%
compression rates, respectively), where our method achieves state-of-the-art
compression performance and provides interpretable features as a by-product.
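To make the mechanism concrete, the following is a minimal sketch of the idea, not the authors' implementation: a gate holds one learnable Dirichlet concentration per channel, scales the layer's output by the sampled (or expected) simplex weights, and channels whose learned weight stays small are pruned. The paper fits these concentrations with variational inference; this toy version would simply train them jointly with the task loss, and all names below are hypothetical.

```python
# Hedged sketch of Dirichlet pruning: a simplex-weight gate over channels.
import torch
import torch.nn as nn

class DirichletChannelGate(nn.Module):
    def __init__(self, num_channels: int):
        super().__init__()
        # log-concentrations, one per channel (linear in the channel count)
        self.log_alpha = nn.Parameter(torch.zeros(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        alpha = self.log_alpha.exp()
        if self.training:
            # reparameterized sample from Dirichlet(alpha)
            w = torch.distributions.Dirichlet(alpha).rsample()
        else:
            w = alpha / alpha.sum()                  # Dirichlet mean
        # rescale by the channel count so average magnitude stays ~1
        return x * w.view(1, -1, 1, 1) * len(w)

    def keep_mask(self, threshold: float) -> torch.Tensor:
        w = self.log_alpha.exp()
        w = w / w.sum()
        return w > threshold                         # channels worth keeping

# usage: gate the 16 output channels of one conv layer
conv = nn.Conv2d(3, 16, 3, padding=1)
gate = DirichletChannelGate(16)
x = torch.randn(2, 3, 32, 32)
y = gate(conv(x))
print(y.shape, gate.keep_mask(1.0 / 32).sum().item(), "channels kept")
```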
Related papers
- Group channel pruning and spatial attention distilling for object detection [2.8675002818821542]
We introduce a three-stage model compression method: dynamic sparse training, group channel pruning, and spatial attention distilling.
Our method reduces the parameters of the model by 64.7% and the computation by 34.9%. (A sketch of the spatial attention distilling loss follows below.)
arXiv Detail & Related papers (2023-06-02T13:26:23Z)
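The spatial attention distilling stage mentioned above can be pictured with a short sketch in the style of attention transfer, matching per-location feature energy between teacher and student; the paper's exact loss may differ.

```python
# Hedged sketch of a spatial attention distilling loss.
import torch
import torch.nn.functional as F

def spatial_attention(feat: torch.Tensor) -> torch.Tensor:
    # collapse channels into per-location energy, then L2-normalize per sample
    att = feat.pow(2).mean(dim=1).flatten(1)      # (B, H*W)
    return F.normalize(att, dim=1)

def attention_distill_loss(student_feat, teacher_feat):
    return (spatial_attention(student_feat)
            - spatial_attention(teacher_feat)).pow(2).mean()

# usage with dummy feature maps of matching spatial size
s = torch.randn(4, 32, 14, 14)   # pruned student features
t = torch.randn(4, 64, 14, 14)   # teacher features (channel counts may differ)
print(attention_distill_loss(s, t).item())
```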
- Approximating Continuous Convolutions for Deep Network Compression [11.566258236184964]
We present ApproxConv, a novel method for compressing the layers of a convolutional neural network.
We show that our method is able to compress existing deep network models by half whilst losing only 1.86% accuracy.
arXiv Detail & Related papers (2022-10-17T11:41:26Z)
- OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization [32.60139548889592]
We propose One-shot Pruning-Quantization (OPQ), a novel compression method.
OPQ analytically solves the compression allocation using only the pre-trained weight parameters.
It employs a unified channel-wise quantization scheme that enforces all channels of each layer to share a common codebook. (A toy sketch of this shared-codebook idea follows below.)
arXiv Detail & Related papers (2022-05-23T09:05:25Z)
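As a toy illustration of OPQ's shared-codebook idea (the analytic allocation step is not reproduced, and this is not the paper's code), a few Lloyd iterations can fit one codebook per layer that all of its channels reuse:

```python
# Toy per-layer shared-codebook quantization: one codebook for all channels.
import torch

def quantize_layer(weight: torch.Tensor, num_codes: int = 16, iters: int = 10):
    flat = weight.flatten()
    # initialize the shared codebook with evenly spaced quantiles
    codes = torch.quantile(flat, torch.linspace(0, 1, num_codes))
    for _ in range(iters):                          # a few Lloyd iterations
        assign = (flat[:, None] - codes[None, :]).abs().argmin(dim=1)
        for k in range(num_codes):
            sel = flat[assign == k]
            if sel.numel():
                codes[k] = sel.mean()
    assign = (flat[:, None] - codes[None, :]).abs().argmin(dim=1)
    return codes[assign].view_as(weight), codes

w = torch.randn(64, 32, 3, 3)                       # one conv layer's weights
wq, codebook = quantize_layer(w)
print((w - wq).abs().mean().item(), codebook.numel(), "codes shared by all channels")
```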
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully-connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline. (An illustrative prune-quantize-store sketch follows below.)
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
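An illustrative sketch in the spirit of that entry, pruning small weights, quantizing the survivors to 8-bit indices, and storing values plus positions compactly; the paper's actual source-coding format is not reproduced here:

```python
# Toy prune + quantize + compact-store pipeline (illustrative only).
import numpy as np

def compress(weights: np.ndarray, keep_ratio: float = 0.1, levels: int = 256):
    flat = weights.ravel()
    k = max(1, int(keep_ratio * flat.size))
    idx = np.argsort(np.abs(flat))[-k:]             # positions of surviving weights
    vals = flat[idx]
    lo, hi = vals.min(), vals.max()
    # quantize survivors to 8-bit indices on a uniform grid
    q = np.round((vals - lo) / (hi - lo) * (levels - 1)).astype(np.uint8)
    return idx.astype(np.uint32), q, (lo, hi)       # compact representation

idx, q, (lo, hi) = compress(np.random.randn(512, 512))
dense_bytes = 512 * 512 * 4                         # float32 baseline
compact_bytes = idx.nbytes + q.nbytes + 8
print(f"{compact_bytes / dense_bytes:.1%} of dense float32 storage")
```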
- Group Fisher Pruning for Practical Network Compression [58.25776612812883]
We present a general channel pruning approach that can be applied to various complicated structures.
We derive a unified metric based on Fisher information to evaluate the importance of a single channel and coupled channels.
Our method can be used to prune any structure, including those with coupled channels. (A sketch of a Fisher-style importance score follows below.)
arXiv Detail & Related papers (2021-08-02T08:21:44Z)
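A hedged sketch of a Fisher-style channel importance score, omitting the paper's grouped and coupled-channel accounting: insert a per-channel mask fixed at one, backpropagate the task loss, and square the mask gradients:

```python
# Fisher-style channel importance via gradients w.r.t. a channel mask.
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, 3, padding=1)
mask = torch.ones(1, 8, 1, 1, requires_grad=True)   # per-channel mask at 1

x = torch.randn(16, 3, 32, 32)
loss = (conv(x) * mask).mean()                      # stand-in for a task loss
loss.backward()

importance = 0.5 * mask.grad.flatten() ** 2         # Fisher approximation
print(importance.argsort())                         # prune the smallest first
```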
- Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially on resource-limited devices.
Previous unstructured or structured weight pruning methods rarely yield real inference acceleration.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration. (A toy illustration of block-level weight unification follows below.)
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
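Reading "weight unification" literally, a toy guess at the flavor of the idea (not the paper's algorithm) is to force the weights inside each small block to share a single magnitude, which is friendlier to hardware than irregular sparsity:

```python
# Toy block-level weight unification: one shared magnitude per micro block.
import torch

def unify_blocks(w: torch.Tensor, block: int = 4) -> torch.Tensor:
    rows, cols = w.shape
    out = w.clone()
    for i in range(0, rows, block):
        for j in range(0, cols, block):
            blk = out[i:i + block, j:j + block]
            # keep each weight's sign, unify magnitudes to the block mean
            blk.copy_(blk.sign() * blk.abs().mean())
    return out

w = torch.randn(8, 8)
print(unify_blocks(w))
```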
- Dynamic Probabilistic Pruning: A general framework for hardware-constrained pruning at different granularities [80.06422693778141]
We propose a flexible new pruning mechanism that facilitates pruning at different granularities (weights, kernels, filters/feature maps).
We refer to this algorithm as Dynamic Probabilistic Pruning (DPP).
We show that DPP achieves competitive compression rates and classification accuracy when pruning common deep learning models trained on different benchmark datasets for image classification. (A sketch of masking at these three granularities follows below.)
arXiv Detail & Related papers (2021-05-26T17:01:52Z)
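To make the three granularities concrete, here is a sketch that uses plain magnitude top-k as a stand-in for DPP's learned probabilistic masks:

```python
# Masking a conv weight at weight, kernel, and filter granularity.
import torch

def topk_mask(scores: torch.Tensor, keep: int) -> torch.Tensor:
    mask = torch.zeros_like(scores)
    mask.view(-1)[scores.view(-1).topk(keep).indices] = 1.0
    return mask

w = torch.randn(16, 8, 3, 3)                        # conv weight (out, in, kH, kW)
weight_mask = topk_mask(w.abs(), keep=w.numel() // 2)
kernel_mask = topk_mask(w.abs().sum(dim=(2, 3)), keep=64)    # per (out, in) kernel
filter_mask = topk_mask(w.abs().sum(dim=(1, 2, 3)), keep=8)  # per output filter

pruned = w * weight_mask                            # e.g. weight-level pruning
print(weight_mask.mean(), kernel_mask.shape, filter_mask.shape)
```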
- Unfolding Neural Networks for Compressive Multichannel Blind Deconvolution [71.29848468762789]
We propose a learned-structured unfolding neural network for the problem of compressive sparse multichannel blind deconvolution.
In this problem, each channel's measurements are given as the convolution of a common source signal with a sparse filter (illustrated in the sketch below).
We demonstrate that our method is superior to classical structured compressive sparse multichannel blind-deconvolution methods in terms of accuracy and speed of sparse filter recovery.
arXiv Detail & Related papers (2020-10-22T02:34:33Z)
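The measurement model from that entry is easy to state in code: every channel observes the same source convolved with its own sparse filter. The unfolded recovery network itself is not sketched here.

```python
# Multichannel measurement model: y_i = source * h_i with sparse h_i.
import numpy as np

rng = np.random.default_rng(0)
source = rng.standard_normal(128)                   # common source signal

channels = []
for _ in range(4):
    h = np.zeros(16)
    h[rng.choice(16, size=3, replace=False)] = rng.standard_normal(3)  # sparse filter
    channels.append(np.convolve(source, h))         # y_i = source * h_i
print(np.stack(channels).shape)                     # (4, 143) measurements
```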
- Channel Compression: Rethinking Information Redundancy among Channels in CNN Architecture [3.3018563701013988]
Research on efficient convolutional neural networks (CNNs) aims at removing feature redundancy by decomposing or optimizing the convolutional calculation.
In this work, feature redundancy is assumed to exist among channels in CNN architectures, which provides some leeway to boost calculation efficiency.
A novel convolutional construction named compact convolution is proposed, combining advances in spatial convolution, channel grouping, and pooling operations. (The channel grouping ingredient is sketched below.)
arXiv Detail & Related papers (2020-07-02T10:58:54Z)
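Taken in isolation, the channel grouping ingredient already divides the weight (and multiply) count by the number of groups; a minimal sketch, not the paper's full compact convolution:

```python
# Grouped convolution: channels split into independent groups.
import torch.nn as nn

dense = nn.Conv2d(64, 64, 3, padding=1)               # all channels interact
grouped = nn.Conv2d(64, 64, 3, padding=1, groups=4)   # 4 independent groups

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense), count(grouped))                   # grouped uses ~4x fewer weights
```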
- Embedding Propagation: Smoother Manifold for Few-Shot Classification [131.81692677836202]
We propose to use embedding propagation as an unsupervised non-parametric regularizer for manifold smoothing in few-shot classification.
We empirically show that embedding propagation yields a smoother embedding manifold.
We show that embedding propagation consistently improves the accuracy of the models in multiple semi-supervised learning scenarios by up to 16 percentage points. (A sketch of the smoothing step follows below.)
arXiv Detail & Related papers (2020-03-09T13:51:09Z)
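A hedged sketch of that manifold smoothing, assuming a label-propagation-style operator over a similarity graph (the paper's exact propagation matrix may differ in its details):

```python
# Smooth embeddings by propagating them over a similarity graph.
import torch

def propagate(z: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    dist = torch.cdist(z, z) ** 2
    a = torch.exp(-dist / dist.mean())              # similarity graph
    a.fill_diagonal_(0)
    d = a.sum(dim=1).rsqrt()
    p = d[:, None] * a * d[None, :]                 # normalized adjacency
    n = z.shape[0]
    # closed-form propagation: (I - alpha * P)^{-1} z
    return torch.linalg.solve(torch.eye(n) - alpha * p, z)

z = torch.randn(20, 64)                             # 20 embeddings
print(propagate(z).shape)
```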
This list is automatically generated from the titles and abstracts of the papers on this site.