Lite it fly: An All-Deformable-Butterfly Network
- URL: http://arxiv.org/abs/2311.08125v1
- Date: Tue, 14 Nov 2023 12:41:22 GMT
- Title: Lite it fly: An All-Deformable-Butterfly Network
- Authors: Rui Lin, Jason Chun Lok Li, Jiajun Zhou, Binxiao Huang, Jie Ran and
Ngai Wong
- Abstract summary: Most deep neural networks (DNNs) consist fundamentally of convolutional and/or fully connected layers.
The recently proposed deformable butterfly (DeBut) decomposes the filter matrix into generalized, butterfly-like factors.
This work reveals an intimate link between DeBut and a systematic hierarchy of depthwise and pointwise convolutions.
- Score: 7.8460795568982435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most deep neural networks (DNNs) consist fundamentally of convolutional
and/or fully connected layers, wherein the linear transform can be cast as the
product between a filter matrix and a data matrix obtained by arranging feature
tensors into columns. The recently proposed deformable butterfly (DeBut)
decomposes the filter matrix into generalized, butterfly-like factors, thus
achieving network compression orthogonal to the traditional approaches of pruning or
low-rank decomposition. This work reveals an intimate link between DeBut and a
systematic hierarchy of depthwise and pointwise convolutions, which explains
the empirically good performance of DeBut layers. By developing an automated
DeBut chain generator, we show for the first time the viability of homogenizing
a DNN into all DeBut layers, thus achieving an extreme sparsity and
compression. Various examples and hardware benchmarks verify the advantages of
All-DeBut networks. In particular, we show it is possible to compress a
PointNet to under 5% of its parameters with less than a 5% accuracy drop, a record
not achievable by other compression schemes.
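As a concrete illustration of the two ideas in the abstract, the NumPy sketch below first takes the im2col view (the convolution becomes a filter matrix times a matrix of unrolled patches) and then swaps the dense filter matrix for a short chain of sparse, block-structured factors with the same outer shape. The factor shapes and the plain block-diagonal structure are illustrative assumptions only; DeBut's generalized butterfly factors and its automated chain generator are more flexible than this sketch.

```python
import numpy as np

def block_diag_factor(num_blocks, block_rows, block_cols, rng):
    """A sparse, block-diagonal factor: only the diagonal blocks hold parameters."""
    factor = np.zeros((num_blocks * block_rows, num_blocks * block_cols))
    for b in range(num_blocks):
        factor[b * block_rows:(b + 1) * block_rows,
               b * block_cols:(b + 1) * block_cols] = rng.standard_normal((block_rows, block_cols))
    return factor

rng = np.random.default_rng(0)

# im2col view of a 3x3 convolution: 64 output channels, 32 input channels.
c_out, c_in, k = 64, 32, 3
dense_filter = rng.standard_normal((c_out, c_in * k * k))       # 64 x 288

# Illustrative butterfly-like chain whose product matches the 64 x 288 filter matrix.
# These shapes are assumptions for this sketch, not the output of DeBut's chain generator.
f1 = block_diag_factor(num_blocks=8,  block_rows=8, block_cols=16, rng=rng)  # 64 x 128
f2 = block_diag_factor(num_blocks=16, block_rows=8, block_cols=18, rng=rng)  # 128 x 288
chain_filter = f1 @ f2                                           # 64 x 288

dense_params = dense_filter.size
chain_params = sum(np.count_nonzero(f) for f in (f1, f2))
print(f"dense: {dense_params} params, chain: {chain_params} params "
      f"({chain_params / dense_params:.1%} of dense)")

# A data matrix of unrolled patches (10 spatial positions), as in the abstract's im2col view.
patches = rng.standard_normal((c_in * k * k, 10))
out_dense = dense_filter @ patches                               # what the original layer computes
out_chain = chain_filter @ patches                               # what the factorized layer computes
print(out_dense.shape, out_chain.shape)                          # both (64, 10)
```

Loosely, a block-structured factor mixes channels only within groups (a grouped/depthwise flavour), while a factor whose blocks span all channels acts like pointwise 1x1 mixing; this is, informally, the depthwise-pointwise hierarchy the abstract refers to.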
Related papers
- Layer-Specific Optimization: Sensitivity Based Convolution Layers Basis Search [0.0]
We propose a new way of applying matrix decomposition to the weights of convolutional layers.
The essence of the method is to train only a subset of the convolutions (the basis convolutions) and to represent the rest as linear combinations of the basis ones (a minimal sketch of this idea appears after this list).
Experiments on models from the ResNet family and the CIFAR-10 dataset demonstrate that basis convolutions can not only reduce the size of the model but also accelerate the forward and backward passes of the network.
arXiv Detail & Related papers (2024-08-12T09:24:48Z)
- ButterflyFlow: Building Invertible Layers with Butterfly Matrices [80.83142511616262]
We propose a new family of invertible linear layers based on butterfly layers.
Based on our invertible butterfly layers, we construct a new class of normalizing flow models called ButterflyFlow.
arXiv Detail & Related papers (2022-09-28T01:58:18Z)
- Dynamic Probabilistic Pruning: A general framework for hardware-constrained pruning at different granularities [80.06422693778141]
We propose a flexible new pruning mechanism that facilitates pruning at different granularities (weights, kernels, filters/feature maps).
We refer to this algorithm as Dynamic Probabilistic Pruning (DPP).
We show that DPP achieves competitive compression rates and classification accuracy when pruning common deep learning models trained on different benchmark datasets for image classification.
arXiv Detail & Related papers (2021-05-26T17:01:52Z)
- ReduNet: A White-box Deep Network from the Principle of Maximizing Rate Reduction [32.489371527159236]
This work attempts to provide a plausible theoretical framework for interpreting modern deep (convolutional) networks from the principles of data compression and discriminative representation.
We show that for high-dimensional multi-class data, the optimal linear discriminative representation maximizes the coding rate difference between the whole dataset and the average of all the subsets (this rate-reduction objective is written out after this list).
We show that the basic iterative gradient ascent scheme for optimizing the rate reduction objective naturally leads to a multi-layer deep network, named ReduNet, that shares common characteristics of modern deep networks.
arXiv Detail & Related papers (2021-05-21T16:29:57Z)
- A Deeper Look into Convolutions via Pruning [9.89901717499058]
Modern architectures contain a very small number of fully-connected layers, often at the end, after multiple layers of convolutions.
Although this strategy already reduces the number of parameters, most of the convolutions can be eliminated as well, without suffering any loss in recognition performance.
In this work, we use eigenvalue-based matrix characteristics, in addition to the classical weight-based importance criterion for pruning, to shed light on the internal mechanisms of a widely used family of CNNs.
arXiv Detail & Related papers (2021-02-04T18:55:03Z)
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks [70.0243910593064]
Key to the success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function (a small numerical check of this invariance appears after this list).
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
arXiv Detail & Related papers (2020-10-29T15:47:26Z)
- Unfolding Neural Networks for Compressive Multichannel Blind Deconvolution [71.29848468762789]
We propose a learned-structured unfolding neural network for the problem of compressive sparse multichannel blind-deconvolution.
In this problem, each channel's measurements are given as the convolution of a common source signal with a sparse filter.
We demonstrate that our method is superior to classical structured compressive sparse multichannel blind-deconvolution methods in terms of accuracy and speed of sparse filter recovery.
arXiv Detail & Related papers (2020-10-22T02:34:33Z)
- Sparse Linear Networks with a Fixed Butterfly Structure: Theory and Practice [4.3400407844814985]
We propose to replace a dense linear layer in any neural network by an architecture based on the butterfly network.
In a collection of experiments, including supervised prediction on both NLP and vision data, we show that this replacement produces results that match, and at times outperform, existing well-known architectures.
arXiv Detail & Related papers (2020-07-17T09:45:03Z)
- DHP: Differentiable Meta Pruning via HyperNetworks [158.69345612783198]
This paper introduces a differentiable pruning method via hypernetworks for automatic network pruning.
Latent vectors control the output channels of the convolutional layers in the backbone network and act as a handle for the pruning of the layers.
Experiments are conducted on various networks for image classification, single image super-resolution, and denoising.
arXiv Detail & Related papers (2020-03-30T17:59:18Z)
- Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression [145.04742985050808]
We analyze two popular network compression techniques, i.e. filter pruning and low-rank decomposition, in a unified sense.
By changing the way the sparsity regularization is enforced, filter pruning and low-rank decomposition can be derived accordingly.
Our approach proves its potential as it compares favorably to the state-of-the-art on several benchmarks.
arXiv Detail & Related papers (2020-03-19T17:57:26Z)
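To make a few of the entries above more concrete, here are some minimal, hedged sketches. First, the basis-convolution idea from the "Layer-Specific Optimization" entry: only a small set of basis filters (plus mixing coefficients) is trained, and every other filter is reconstructed as a linear combination of that basis. The shapes and the einsum bookkeeping below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Suppose a layer needs 64 filters of shape (in_channels, k, k) = (16, 3, 3).
num_filters, num_basis, filt_shape = 64, 8, (16, 3, 3)

# Trainable parameters: a small set of basis filters plus mixing coefficients.
basis = rng.standard_normal((num_basis, *filt_shape))       # 8 basis convolutions
coeffs = rng.standard_normal((num_filters, num_basis))      # one coefficient row per filter

# Every filter in the layer is a linear combination of the basis filters.
filters = np.einsum("fb,bchw->fchw", coeffs, basis)         # (64, 16, 3, 3)

full_params = filters.size                                  # 64 * 16 * 3 * 3 = 9216
kept_params = basis.size + coeffs.size                      # 8 * 16 * 3 * 3 + 64 * 8 = 1664
print(filters.shape, f"{kept_params}/{full_params} parameters trained")
```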
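Next, the coding-rate quantities behind the ReduNet entry, written out as they commonly appear in the MCR²/ReduNet line of work; the notation is reconstructed from that literature and may differ in minor details from the cited paper. With features $\mathbf{Z} \in \mathbb{R}^{d \times m}$ and diagonal membership matrices $\boldsymbol{\Pi}_1, \dots, \boldsymbol{\Pi}_k$ encoding the class partition:

```latex
% Coding rate of the whole feature set (up to distortion \epsilon)
R(\mathbf{Z}, \epsilon) \;=\; \tfrac{1}{2}\,\log\det\!\Big(\mathbf{I} + \tfrac{d}{m\,\epsilon^{2}}\,\mathbf{Z}\mathbf{Z}^{\top}\Big)

% Average coding rate of the class-wise subsets
R_c(\mathbf{Z}, \epsilon \mid \boldsymbol{\Pi}) \;=\; \sum_{j=1}^{k}
  \frac{\operatorname{tr}(\boldsymbol{\Pi}_j)}{2m}\,
  \log\det\!\Big(\mathbf{I} + \tfrac{d}{\operatorname{tr}(\boldsymbol{\Pi}_j)\,\epsilon^{2}}\,
  \mathbf{Z}\boldsymbol{\Pi}_j\mathbf{Z}^{\top}\Big)

% Rate reduction: the difference that ReduNet's layers ascend on
\Delta R(\mathbf{Z}, \boldsymbol{\Pi}, \epsilon) \;=\; R(\mathbf{Z}, \epsilon) - R_c(\mathbf{Z}, \epsilon \mid \boldsymbol{\Pi})
```

Unrolling gradient ascent on $\Delta R$ is the "iterative gradient ascent scheme" that the summary says yields the multi-layer ReduNet.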
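Finally, a small numerical check of the observation in the "Permute, Quantize, and Fine-tune" entry: permuting the hidden channels of one layer and the next layer's inputs with the same permutation leaves the composed function unchanged (an elementwise nonlinearity such as ReLU commutes with a channel permutation, so it does not break the argument). The layer sizes here are arbitrary and biases are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)
relu = lambda x: np.maximum(x, 0.0)

w1 = rng.standard_normal((128, 64))     # layer 1: 64 -> 128
w2 = rng.standard_normal((32, 128))     # layer 2: 128 -> 32
x = rng.standard_normal((64,))

perm = rng.permutation(128)             # reorder the 128 hidden channels
w1_p = w1[perm, :]                      # permute layer 1's output rows
w2_p = w2[:, perm]                      # permute layer 2's input columns the same way

y = w2 @ relu(w1 @ x)
y_p = w2_p @ relu(w1_p @ x)
print(np.allclose(y, y_p))              # True: both networks express the same function
```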