Quantization of Deep Neural Networks for Accumulator-constrained
Processors
- URL: http://arxiv.org/abs/2004.11783v1
- Date: Fri, 24 Apr 2020 14:47:14 GMT
- Title: Quantization of Deep Neural Networks for Accumulator-constrained
Processors
- Authors: Barry de Bruin, Zoran Zivkovic, Henk Corporaal
- Abstract summary: We introduce an Artificial Neural Network (ANN) quantization methodology for platforms without wide accumulation registers.
We formulate the quantization problem as a function of accumulator size, and maximize model accuracy by maximizing the bit widths of input data and weights.
We demonstrate that 16-bit accumulators achieve classification accuracy within 1% of the floating-point baselines on the CIFAR-10 and ILSVRC2012 image classification benchmarks.
- Score: 2.8489574654566674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce an Artificial Neural Network (ANN) quantization
methodology for platforms without wide accumulation registers. This enables
fixed-point model deployment on embedded compute platforms that are not
specifically designed for large kernel computations (i.e.
accumulator-constrained processors). We formulate the quantization problem as
a function of accumulator size, and aim to maximize model accuracy by
maximizing the bit widths of input data and weights. To reduce the number of
configurations to consider, only solutions that fully utilize the available
accumulator bits are tested. We demonstrate that 16-bit accumulators obtain
classification accuracy within 1% of the floating-point baselines on the
CIFAR-10 and ILSVRC2012 image classification benchmarks. Additionally, a
near-optimal 2x speedup is obtained on an ARM processor by exploiting 16-bit
accumulators for image classification on the All-CNN-C and AlexNet networks.
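
The configuration pruning described above can be made concrete with worst-case fixed-point arithmetic. Below is a minimal sketch, assuming unsigned activations, signed weights, and an illustrative 576-term dot product (a 3x3 kernel over 64 channels); it enumerates the (activation, weight) bit-width pairs that fit a given accumulator and cannot be widened further, i.e. the solutions that fully utilize the available accumulator bits. This is not the paper's exact procedure, only the underlying bound:

```python
def accumulator_bits_needed(a_bits: int, w_bits: int, n_terms: int) -> int:
    """Worst-case signed accumulator width for a dot product of n_terms
    products of unsigned a_bits activations and signed w_bits weights."""
    max_act = (1 << a_bits) - 1           # largest unsigned activation
    max_wgt = 1 << (w_bits - 1)           # largest signed weight magnitude
    worst = n_terms * max_act * max_wgt   # largest possible |sum|
    return worst.bit_length() + 1         # +1 for the sign bit

def feasible_configs(acc_bits: int, n_terms: int):
    """Yield (a_bits, w_bits) pairs that fit in acc_bits and fully
    utilize it: adding one bit to either operand would overflow."""
    for a in range(1, acc_bits):
        for w in range(1, acc_bits):
            if (accumulator_bits_needed(a, w, n_terms) <= acc_bits
                    and accumulator_bits_needed(a + 1, w, n_terms) > acc_bits
                    and accumulator_bits_needed(a, w + 1, n_terms) > acc_bits):
                yield a, w

# Example: a 3x3 convolution over 64 input channels (576-term dot product)
for a, w in feasible_configs(acc_bits=16, n_terms=576):
    print(f"activations: {a} bit, weights: {w} bit")
```

For a 16-bit accumulator this yields only a small frontier of bit-width pairs, which is what makes an exhaustive accuracy evaluation over the remaining configurations tractable.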
Related papers
- A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance [49.1574468325115]
Accumulator-aware quantization (A2Q) is a novel weight quantization method designed to train quantized neural networks (QNNs) to avoid overflow during inference.
A2Q introduces a unique formulation inspired by weight normalization that constrains the L1-norm of model weights according to accumulator bit width bounds.
We show A2Q can train QNNs for low-precision accumulators while maintaining model accuracy competitive with a floating-point baseline.
arXiv Detail & Related papers (2023-08-25T17:28:58Z)
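
The overflow guarantee in A2Q rests on a worst-case bound: with unsigned a-bit inputs, |sum_i w_i * x_i| <= ||w||_1 * (2^a - 1), so a signed P-bit accumulator cannot overflow if ||w||_1 <= (2^(P-1) - 1) / (2^a - 1). Below is a minimal sketch that enforces such a bound by rescaling one output channel's weights; A2Q itself folds the constraint into a weight-normalization parameterization trained end to end, so this function is only an illustration:

```python
import numpy as np

def a2q_style_clip(w: np.ndarray, acc_bits: int, act_bits: int) -> np.ndarray:
    """Rescale a weight vector so its L1 norm satisfies the worst-case
    no-overflow bound for one output channel. With unsigned act_bits
    inputs, |sum(w * x)| <= ||w||_1 * (2^a - 1), so overflow is
    impossible if ||w||_1 <= (2^(P-1) - 1) / (2^a - 1)."""
    bound = (2 ** (acc_bits - 1) - 1) / (2 ** act_bits - 1)
    l1 = np.abs(w).sum()
    return w if l1 <= bound else w * (bound / l1)

# Example: 16-bit accumulator, 8-bit unsigned activations
w = np.random.randn(512).astype(np.float32)
w_safe = a2q_style_clip(w, acc_bits=16, act_bits=8)
print(np.abs(w_safe).sum(), "<=", (2**15 - 1) / (2**8 - 1))
```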
- DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables [49.965024476651706]
DeepGEMM is a lookup table based approach for the execution of ultra low-precision convolutional neural networks on SIMD hardware.
Our implementation outperforms corresponding 8-bit integer kernels by up to 1.74x on x86 platforms.
arXiv Detail & Related papers (2023-04-18T15:13:10Z)
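
At ultra low precision the entire multiplication table fits in a handful of entries, which is the observation DeepGEMM builds on. Below is a scalar Python sketch assuming 2-bit signed operands; the actual kernels pack these lookups into SIMD shuffle instructions on x86, which this illustration does not attempt:

```python
import itertools
import numpy as np

# Precompute all products of 2-bit signed operands once; at this
# precision the table has only 16 entries, so multiplies in the
# inner loop become table lookups.
LEVELS = [-2, -1, 0, 1]  # 2-bit signed values
LUT = {(a, w): a * w for a, w in itertools.product(LEVELS, LEVELS)}

def lut_dot(acts, wgts):
    """Dot product of 2-bit vectors using lookups instead of multiplies."""
    return sum(LUT[(a, w)] for a, w in zip(acts, wgts))

acts = np.random.choice(LEVELS, size=64)
wgts = np.random.choice(LEVELS, size=64)
assert lut_dot(acts, wgts) == int(np.dot(acts, wgts))
```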
- Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance [68.8204255655161]
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference.
We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline.
arXiv Detail & Related papers (2023-01-31T02:46:57Z)
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on FPGA validate the great potential of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
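
The sub-bit idea follows from simple counting: a binary 3x3 kernel has 2^9 = 512 possible patterns, so restricting kernels to a codebook of K < 512 entries costs log2(K)/9 < 1 bit per weight. The sketch below uses a random codebook and nearest-codeword assignment purely for illustration; SNNs instead learn the codebook through their kernel-aware optimization framework:

```python
import numpy as np

def compress_binary_kernels(kernels: np.ndarray, codebook: np.ndarray):
    """Map each flattened 3x3 binary (+-1) kernel to the index of the
    nearest codeword (minimum Hamming distance). With K codewords each
    kernel costs log2(K) bits instead of 9, i.e. sub-bit per weight."""
    # Hamming distance between +-1 vectors: (9 - dot) / 2,
    # so maximizing the dot product minimizes the distance.
    dots = kernels @ codebook.T        # (n_kernels, K)
    return np.argmax(dots, axis=1)     # nearest codeword index

rng = np.random.default_rng(0)
kernels = rng.choice([-1, 1], size=(128, 9))  # 128 binary 3x3 kernels
codebook = rng.choice([-1, 1], size=(32, 9))  # K=32 -> 5/9 bit per weight
idx = compress_binary_kernels(kernels, codebook)
print(f"{np.log2(len(codebook)) / 9:.2f} bits per weight")  # ~0.56
```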
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
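
The decomposition can be checked numerically. Writing an m-bit operand as q = sum_i 2^i * b_i with b_i in {-1, +1} (which covers the odd integers in [-(2^m - 1), 2^m - 1]), a quantized dot product becomes a sum of scaled binary dot products, each of which maps onto XNOR/popcount kernels. A minimal sketch; the paper's exact encoding and branch structure may differ:

```python
import numpy as np

def decompose(q: np.ndarray, m: int):
    """Decompose odd integers in [-(2^m - 1), 2^m - 1] into m
    {-1, +1} planes such that q = sum_i 2^i * b_i."""
    v = (q + (1 << m) - 1) // 2                  # shift into [0, 2^m - 1]
    return [((v >> i) & 1) * 2 - 1 for i in range(m)]

m = 2
q_w = np.array([-3, -1, 1, 3, 3, -1])            # 2-bit {-1,+1}-coded weights
q_x = np.array([ 1,  3, 3, 1, -3, -1])           # and activations
w_planes = decompose(q_w, m)
x_planes = decompose(q_x, m)

# Dot product as a sum of scaled binary (XNOR-friendly) dot products
binary_sum = sum((1 << (i + j)) * int(bw @ bx)
                 for i, bw in enumerate(w_planes)
                 for j, bx in enumerate(x_planes))
assert binary_sum == int(q_w @ q_x)
```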
- On the quantization of recurrent neural networks [9.549757800469196]
Quantization of neural networks can be defined as approximating the high-precision computation of the canonical neural network formulation using lower-precision arithmetic.
We present an integer-only quantization strategy for Long Short-Term Memory (LSTM) neural network topologies.
arXiv Detail & Related papers (2021-01-14T04:25:08Z)
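
Integer-only LSTM inference needs integer-only gate nonlinearities. One common approach, which may or may not match this paper's strategy, is to precompute sigmoid and tanh as small fixed-point lookup tables over the quantized pre-activation range. A sketch with an assumed signed 8-bit input at scale 8/127:

```python
import numpy as np

def build_sigmoid_lut(in_bits=8, out_bits=8, in_scale=8 / 127):
    """Precompute an integer lookup table approximating sigmoid for
    signed in_bits inputs; outputs are unsigned out_bits fixed-point."""
    q_in = np.arange(-(1 << (in_bits - 1)), 1 << (in_bits - 1))
    y = 1.0 / (1.0 + np.exp(-q_in * in_scale))   # dequantize, apply sigmoid
    return np.round(y * ((1 << out_bits) - 1)).astype(np.int32)

LUT = build_sigmoid_lut()

def int_sigmoid(q: np.ndarray, in_bits: int = 8) -> np.ndarray:
    """Integer-only sigmoid: shift signed input into table index range."""
    return LUT[q + (1 << (in_bits - 1))]

q = np.array([-128, -64, 0, 64, 127])
print(int_sigmoid(q))   # monotone values from ~0 to 255 in Q0.8
```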
- Binarization Methods for Motor-Imagery Brain-Computer Interface Classification [18.722731794073756]
We propose methods for transforming real-valued weights to binary numbers for efficient inference.
By tuning the dimension of the binary embedding, we achieve almost the same accuracy in 4-class MI (at most 1.27% lower) compared to models with float16 weights.
Our method replaces the fully connected layer of CNNs with a binary augmented memory using bipolar random projection.
arXiv Detail & Related papers (2020-10-14T12:28:18Z)
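
A bipolar random projection is simple to sketch: a fixed random {-1, +1} matrix projects the feature vector and a sign function binarizes the result, with the embedding dimension acting as the accuracy/size knob mentioned above. All dimensions below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_embed(x: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Bipolar random projection: map a real feature vector to a
    +-1 embedding whose dimension is tunable."""
    return np.sign(R @ x).astype(np.int8)

d_embed, d_feat = 1024, 256          # embedding dimension is the knob
R = rng.choice([-1, 1], size=(d_embed, d_feat)).astype(np.int8)

x = rng.standard_normal(d_feat)
code = binary_embed(x, R)            # replaces the FC layer's float output
# Classification can then compare 'code' to stored class prototypes by
# Hamming distance instead of floating-point dot products.
```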
- Leveraging Automated Mixed-Low-Precision Quantization for tiny edge microcontrollers [76.30674794049293]
This paper presents an automated mixed-precision quantization flow based on the HAQ framework but tailored for the memory and computational characteristics of MCU devices.
Specifically, a Reinforcement Learning agent searches for the best uniform quantization levels, among 2, 4, 8 bits, of individual weight and activation tensors.
Given an MCU-class memory bound of 2MB for weight-only quantization, the compressed models produced by the mixed-precision engine are as accurate as the state-of-the-art solutions.
arXiv Detail & Related papers (2020-08-12T06:09:58Z)
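
The 2MB constraint is easy to make concrete. The sketch below replaces the paper's Reinforcement Learning agent with a simple greedy heuristic over the same {2, 4, 8}-bit choices, purely to show how a per-layer bit-width assignment is checked against an MCU weight budget; the parameter counts are hypothetical:

```python
def model_weight_bytes(layer_params, bits_per_layer):
    """Weight memory footprint (bytes) for a per-layer bit-width assignment."""
    return sum(n * b for n, b in zip(layer_params, bits_per_layer)) // 8

def greedy_fit(layer_params, budget_bytes=2 * 1024 * 1024, choices=(8, 4, 2)):
    """Start at the highest precision and lower the largest layers first
    until the weight budget is met (a greedy stand-in for the RL agent)."""
    bits = [choices[0]] * len(layer_params)
    order = sorted(range(len(layer_params)),
                   key=lambda i: layer_params[i], reverse=True)
    for target in choices[1:]:           # progressively lower precision
        for i in order:
            if model_weight_bytes(layer_params, bits) <= budget_bytes:
                return bits
            bits[i] = target
    return bits

# Hypothetical parameter counts for a 6-layer CNN
layers = [3_000_000, 2_000_000, 1_500_000, 800_000, 400_000, 100_000]
bits = greedy_fit(layers)
print(bits, model_weight_bytes(layers, bits) // 1024, "KiB")
```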