Related papers: ECQ$^{\text{x}}$: Explainability-Driven Quantization for Low-Bit and Sparse DNNs

ECQ$^{\text{x}}$: Explainability-Driven Quantization for Low-Bit and Sparse DNNs

URL: http://arxiv.org/abs/2109.04236v1
Date: Thu, 9 Sep 2021 12:57:06 GMT
Title: ECQ$^{\text{x}}$: Explainability-Driven Quantization for Low-Bit and Sparse DNNs
Authors: Daniel Becking, Maximilian Dreyer, Wojciech Samek, Karsten M\"uller, Sebastian Lapuschkin
Abstract summary: We develop and describe a novel quantization paradigm for deep neural networks (DNNs) Our method leverages concepts of explainable AI (XAI) and concepts of information theory. The ultimate goal is to preserve the most relevant weights in quantization clusters of highest information content.
Score: 13.446502051609036
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The remarkable success of deep neural networks (DNNs) in various applications is accompanied by a significant increase in network parameters and arithmetic operations. Such increases in memory and computational demands make deep learning prohibitive for resource-constrained hardware platforms such as mobile devices. Recent efforts aim to reduce these overheads, while preserving model performance as much as possible, and include parameter reduction techniques, parameter quantization, and lossless compression techniques. In this chapter, we develop and describe a novel quantization paradigm for DNNs: Our method leverages concepts of explainable AI (XAI) and concepts of information theory: Instead of assigning weight values based on their distances to the quantization clusters, the assignment function additionally considers weight relevances obtained from Layer-wise Relevance Propagation (LRP) and the information content of the clusters (entropy optimization). The ultimate goal is to preserve the most relevant weights in quantization clusters of highest information content. Experimental results show that this novel Entropy-Constrained and XAI-adjusted Quantization (ECQ$^{\text{x}}$) method generates ultra low-precision (2-5 bit) and simultaneously sparse neural networks while maintaining or even improving model performance. Due to reduced parameter precision and high number of zero-elements, the rendered networks are highly compressible in terms of file size, up to $103\times$ compared to the full-precision unquantized DNN model. Our approach was evaluated on different types of models and datasets (including Google Speech Commands and CIFAR-10) and compared with previous work.

Related papers

Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning. Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce quantization error of weight. We develop an improved KL metric to determine optimal quantization scales for activation. For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
arXiv Detail & Related papers (2023-12-17T02:31:20Z)
Low Precision Quantization-aware Training in Spiking Neural Networks with Differentiable Quantization Function [0.5046831208137847]
This work aims to bridge the gap between recent progress in quantized neural networks and spiking neural networks. It presents an extensive study on the performance of the quantization function, represented as a linear combination of sigmoid functions. The presented quantization function demonstrates the state-of-the-art performance on four popular benchmarks.
arXiv Detail & Related papers (2023-05-30T09:42:05Z)
Low-bit Shift Network for End-to-End Spoken Language Understanding [7.851607739211987]
We propose the use of power-of-two quantization, which quantizes continuous parameters into low-bit power-of-two values. This reduces computational complexity by removing expensive multiplication operations and with the use of low-bit weights.
arXiv Detail & Related papers (2022-07-15T14:34:22Z)
BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation. Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration. This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks [1.131071436917293]
Quantizing parameters and operations to lower bit-precision offers substantial memory and energy savings for neural network inference. This paper proposes a new quantization approach for mixed precision convolutional neural networks (CNNs) targeting edge-computing.
arXiv Detail & Related papers (2022-06-15T18:11:37Z)
A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification [0.0]
A promising approach is quantization, in which the full-precision values are stored in low bit-width precision. We present a comprehensive survey of quantization concepts and methods, with a focus on image classification. We explain the replacement of floating-point operations with low-cost bitwise operations in a quantized DNN and the sensitivity of different layers in quantization.
arXiv Detail & Related papers (2022-05-14T15:08:32Z)
ClusterQ: Semantic Feature Distribution Alignment for Data-Free Quantization [111.12063632743013]
We propose a new and effective data-free quantization method termed ClusterQ. To obtain high inter-class separability of semantic features, we cluster and align the feature distribution statistics. We also incorporate the intra-class variance to solve class-wise mode collapse.
arXiv Detail & Related papers (2022-04-30T06:58:56Z)
Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM) Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
arXiv Detail & Related papers (2021-11-29T09:30:06Z)
Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization. We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitive as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to the industry because their extremely cheap calculation and storage overhead, but their performance is still worse than that of networks with full-precision parameters. Most of existing methods aim to enhance performance of QNNs especially binary neural networks by exploiting more effective training techniques. We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.