MST-compression: Compressing and Accelerating Binary Neural Networks
with Minimum Spanning Tree
- URL: http://arxiv.org/abs/2308.13735v1
- Date: Sat, 26 Aug 2023 02:42:12 GMT
- Title: MST-compression: Compressing and Accelerating Binary Neural Networks
with Minimum Spanning Tree
- Authors: Quang Hieu Vo, Linh-Tam Tran, Sung-Ho Bae, Lok-Won Kim and Choong Seon
Hong
- Abstract summary: Binary neural networks (BNNs) have been widely adopted to reduce the computational cost and memory storage on edge-computing devices.
However, as neural networks become wider/deeper to improve accuracy and meet practical requirements, the computational burden remains a significant challenge even on the binary version.
This paper proposes a novel method called Minimum Spanning Tree (MST) compression that learns to compress and accelerate BNNs.
- Score: 21.15961593182111
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Binary neural networks (BNNs) have been widely adopted to reduce the
computational cost and memory storage on edge-computing devices by using
one-bit representation for activations and weights. However, as neural networks
become wider/deeper to improve accuracy and meet practical requirements, the
computational burden remains a significant challenge even on the binary
version. To address these issues, this paper proposes a novel method called
Minimum Spanning Tree (MST) compression that learns to compress and accelerate
BNNs. The proposed architecture leverages an observation from previous works
that an output channel in a binary convolution can be computed using another
output channel and XNOR operations with weights that differ from the weights of
the reused channel. We first construct a fully connected graph with vertices
corresponding to output channels, where the distance between two vertices is
the number of different values between the weight sets used for these outputs.
Then, the MST of the graph with the minimum depth is proposed to reorder output
calculations, aiming to reduce computational cost and latency. Moreover, we
propose a new learning algorithm to reduce the total MST distance during
training. Experimental results on benchmark models demonstrate that our method
achieves significant compression ratios with negligible accuracy drops, making
it a promising approach for resource-constrained edge-computing devices.
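The graph construction described in the abstract can be sketched in a few lines (an illustrative reconstruction, not the paper's implementation; the depth-constrained MST variant and the distance-reducing training algorithm are omitted): vertices are output channels, edge weights are the number of differing binary weights between two channels, and Prim's algorithm yields the spanning tree along which each channel can be computed from its parent.

```python
import numpy as np

def hamming(w_a, w_b):
    # Number of differing weight values between two channels' weight sets.
    return int(np.sum(w_a != w_b))

def mst_prim(weights):
    """Prim's algorithm on the fully connected graph whose vertices are
    output channels and whose edge weights are Hamming distances."""
    n = len(weights)
    in_tree = [True] + [False] * (n - 1)
    best = [hamming(weights[0], w) for w in weights]  # cheapest edge into tree
    parent = [0] * n
    edges = []
    for _ in range(n - 1):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = hamming(weights[u], weights[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges  # (reused channel, new channel, #differing weights)

# Toy example: 4 output channels, each with 8 binary (+/-1) weights.
rng = np.random.default_rng(0)
chans = [rng.choice([-1, 1], size=8) for _ in range(4)]
tree = mst_prim(chans)
total = sum(d for _, _, d in tree)
```

Each tree edge says the child channel's output can be derived from its parent's by correcting only the differing weight positions, so a small `total` means fewer XNOR/popcount operations overall.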
Related papers
- Quality Scalable Quantization Methodology for Deep Learning on Edge [0.20718016474717196]
Deep learning architectures involve heavy computation, and the bulk of the computational energy is consumed by the convolution operations in convolutional neural networks.
The proposed work reduces the energy consumption and size of CNNs for machine learning on ubiquitous edge-computing devices.
Experiments on LeNet and ConvNets show an increase of up to 6% in zeros and memory savings of up to 82.4919% while keeping accuracy near the state of the art.
arXiv Detail & Related papers (2024-07-15T22:00:29Z)
- A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate Compression for Split DNN Computing [5.3221129103999125]
Split computing has emerged as a recent paradigm for implementation of DNN-based AI workloads.
We present an approach that addresses the challenge of optimizing the rate-accuracy-complexity trade-off.
Our approach is remarkably lightweight during both training and inference, highly effective, and achieves excellent rate-distortion performance.
arXiv Detail & Related papers (2022-08-24T15:02:11Z)
- Low-bit Shift Network for End-to-End Spoken Language Understanding [7.851607739211987]
We propose the use of power-of-two quantization, which quantizes continuous parameters into low-bit power-of-two values.
This reduces computational complexity by eliminating expensive multiplication operations and by using low-bit weights.
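As an illustration of the idea (a generic sketch of power-of-two quantization, not this paper's implementation; the exponent range is an assumed, illustrative choice): each continuous weight is snapped to the nearest signed power of two, so a multiply by a quantized weight reduces to a bit shift of the activation's fixed-point representation.

```python
import math

def quantize_pow2(w, min_exp=-8, max_exp=0):
    """Quantize a continuous weight to the nearest signed power of two.
    The exponent is clamped to [min_exp, max_exp] (illustrative range)."""
    if w == 0.0:
        return 0.0
    sign = 1.0 if w > 0 else -1.0
    exp = round(math.log2(abs(w)))          # nearest exponent
    exp = max(min_exp, min(max_exp, exp))   # clamp to the allowed range
    return sign * (2.0 ** exp)

# A multiply x * q with q = ±2^e becomes a shift by |e| bits plus a sign flip.
print(quantize_pow2(0.3))   # -> 0.25
print(quantize_pow2(-0.7))  # -> -0.5
```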
arXiv Detail & Related papers (2022-07-15T14:34:22Z)
- Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
arXiv Detail & Related papers (2022-02-09T18:48:02Z)
- Instant Neural Graphics Primitives with a Multiresolution Hash Encoding [67.33850633281803]
We present a versatile new input encoding that permits the use of a smaller network without sacrificing quality.
A small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through gradient descent.
We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds.
arXiv Detail & Related papers (2022-01-16T07:22:47Z)
- Tied & Reduced RNN-T Decoder [0.0]
We study ways to make the RNN-T decoder (prediction network + joint network) smaller and faster without degradation in recognition performance.
Our prediction network performs a simple weighted averaging of the input embeddings, and shares its embedding matrix weights with the joint network's output layer.
This simple design, when used in conjunction with additional Edit-based Minimum Bayes Risk (EMBR) training, reduces the RNN-T decoder from 23M parameters to just 2M without affecting word-error rate (WER).
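The two tricks in that summary can be sketched in a few lines (an illustrative reconstruction, not the paper's code; the vocabulary size, embedding dimension, and averaging weights are made-up values): the prediction is a learned weighted average of the last few label embeddings, and the embedding matrix is reused as the output projection.

```python
import numpy as np

# Illustrative tied & reduced prediction network.
rng = np.random.default_rng(1)
vocab, dim, k = 10, 4, 2
E = rng.normal(size=(vocab, dim))   # embedding matrix, shared with the output layer
alpha = np.array([0.7, 0.3])        # learned averaging weights (illustrative values)

history = [3, 7]                    # last k predicted label ids
# Prediction network: weighted average of the history's embeddings.
pred = sum(a * E[t] for a, t in zip(alpha, history))
# Tied output projection: reuse E instead of a separate output matrix.
logits = E @ pred
```

Sharing `E` between input embedding and output projection is what removes most of the decoder's parameters in this style of design.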
arXiv Detail & Related papers (2021-09-15T18:19:16Z)
- Spike time displacement based error backpropagation in convolutional spiking neural networks [0.6193838300896449]
In this paper, we extend the STiDi-BP algorithm to employ it in deeper and convolutional architectures.
The evaluation results on the image classification task, based on two popular benchmarks, MNIST and Fashion-MNIST, confirm that this algorithm is applicable to deep SNNs.
We consider a convolutional SNN with two sets of weights: real-valued weights that are updated in the backward pass and their signs, binary weights, that are employed in the feedforward process.
arXiv Detail & Related papers (2021-08-31T05:18:59Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices.
Previous unstructured or structured weight pruning methods rarely deliver true inference acceleration.
We propose a generalized weight unification framework at a hardware compatible micro-structured level to achieve high amount of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments.
In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network.
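What N:M fine-grained sparsity means can be shown with a minimal sketch (illustrative of the constraint only, not of the paper's from-scratch training method): in every contiguous group of M weights, only the N largest-magnitude entries are kept.

```python
import numpy as np

def prune_n_m(weights, n=2, m=4):
    """Zero out all but the n largest-magnitude values in each group of m.
    Assumes the flattened weight count is a multiple of m (for brevity)."""
    flat = weights.reshape(-1, m).copy()
    # Indices of the (m - n) smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(flat), axis=1)[:, : m - n]
    np.put_along_axis(flat, drop, 0.0, axis=1)
    return flat.reshape(weights.shape)

w = np.array([0.9, -0.1, 0.05, -0.8, 0.2, 0.3, -0.01, 0.4])
sparse = prune_n_m(w, n=2, m=4)
# Each group of 4 keeps its 2 largest-magnitude weights:
# group 1 keeps 0.9 and -0.8; group 2 keeps 0.3 and 0.4.
```

The fixed N-out-of-M pattern is what makes this sparsity hardware-friendly: the accelerator knows every group has exactly N nonzeros.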
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
- Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace the conventional ReLU with a Bounded ReLU and find that the accuracy decline is due to activation quantization.
Our integer networks achieve equivalent performance as the corresponding FPN networks, but have only 1/4 memory cost and run 2x faster on modern GPU.
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.