Ax-BxP: Approximate Blocked Computation for Precision-Reconfigurable
Deep Neural Network Acceleration
- URL: http://arxiv.org/abs/2011.13000v3
- Date: Thu, 28 Oct 2021 21:14:15 GMT
- Title: Ax-BxP: Approximate Blocked Computation for Precision-Reconfigurable
Deep Neural Network Acceleration
- Authors: Reena Elangovan, Shubham Jain, Anand Raghunathan
- Abstract summary: Precision scaling has emerged as a popular technique to optimize the compute and storage requirements of Deep Neural Networks (DNNs).
Efforts toward creating ultra-low-precision (sub-8-bit) DNNs suggest that the minimum precision required to achieve a given network-level accuracy varies considerably across networks.
Previous proposals such as bit-serial hardware incur high overheads, significantly diminishing the benefits of lower precision.
- Score: 3.7371886886933487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Precision scaling has emerged as a popular technique to optimize the compute
and storage requirements of Deep Neural Networks (DNNs). Efforts toward
creating ultra-low-precision (sub-8-bit) DNNs suggest that the minimum
precision required to achieve a given network-level accuracy varies
considerably across networks, and even across layers within a network,
requiring support for variable precision in DNN hardware. Previous proposals
such as bit-serial hardware incur high overheads, significantly diminishing the
benefits of lower precision. To efficiently support precision
re-configurability in DNN accelerators, we introduce an approximate computing
method wherein DNN computations are performed block-wise (a block is a group of
bits) and re-configurability is supported at the granularity of blocks. Results
of block-wise computations are composed in an approximate manner to enable
efficient re-configurability. We design a DNN accelerator that embodies
approximate blocked computation and propose a method to determine a suitable
approximation configuration for a given DNN. By varying the approximation
configurations across DNNs, we achieve 1.17x-1.73x and 1.02x-2.04x improvement
in system energy and performance respectively, over an 8-bit fixed-point (FxP8)
baseline, with negligible loss in classification accuracy. Further, by varying
the approximation configurations across layers and data-structures within DNNs,
we achieve 1.25x-2.42x and 1.07x-2.95x improvement in system energy and
performance respectively, with negligible accuracy loss.
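To make the scheme concrete, here is a minimal Python sketch of blocked multiplication with approximate composition, assuming unsigned 8-bit operands split into two 4-bit blocks; the number of retained partial products (`kept`) stands in for an approximation configuration and is purely illustrative, not the paper's exact Ax-BxP scheme.

```python
def to_blocks(x, num_blocks=2, block_bits=4):
    """Split an unsigned fixed-point value into bit-blocks, LSB block first."""
    mask = (1 << block_bits) - 1
    return [(x >> (i * block_bits)) & mask for i in range(num_blocks)]

def approx_blocked_mul(a, b, num_blocks=2, block_bits=4, kept=3):
    """Approximate a*b by composing only the `kept` highest-significance
    block-wise partial products (illustrative, not the exact Ax-BxP scheme)."""
    a_blk = to_blocks(a, num_blocks, block_bits)
    b_blk = to_blocks(b, num_blocks, block_bits)
    # Partial product a_blk[i] * b_blk[j] carries significance 2^((i+j)*block_bits).
    partials = [(i + j, a_blk[i] * b_blk[j])
                for i in range(num_blocks) for j in range(num_blocks)]
    partials.sort(key=lambda p: p[0], reverse=True)  # most significant first
    return sum(p << (w * block_bits) for w, p in partials[:kept])

print(approx_blocked_mul(200, 100, kept=4))  # 20000 (exact: all 4 products kept)
print(approx_blocked_mul(200, 100, kept=3))  # 19968 (approximate: LSB product dropped)
```

Skipping low-significance partial products is where the energy and performance savings come from; the paper's contribution is choosing the configuration (block size and which block products to compute) per network, per layer, or per data-structure.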
Related papers
- Hardware-Software Co-optimised Fast and Accurate Deep Reconfigurable Spiking Inference Accelerator Architecture Design Methodology [2.968768532937366]
Spiking Neural Networks (SNNs) have emerged as a promising approach to improve the energy efficiency of machine learning models.
We develop a hardware-software co-optimisation strategy to port software-trained deep neural networks (DNNs) to reduced-precision spiking models.
arXiv Detail & Related papers (2024-10-07T05:04:13Z)
- SySMOL: Co-designing Algorithms and Hardware for Neural Networks with Heterogeneous Precisions [20.241671088121144]
Recent quantization techniques have enabled heterogeneous precisions at very fine granularity.
These networks require additional hardware to decode the precision settings for individual variables, align the variables, and provide fine-grained mixed-precision compute capabilities.
We present an end-to-end co-design approach to efficiently execute networks with fine-grained heterogeneous precisions.
arXiv Detail & Related papers (2023-11-23T17:20:09Z)
- Binary Neural Networks as a general-propose compute paradigm for on-device computer vision [0.0]
We propose a BNN framework comprising 1) a minimalistic inference scheme for hardware-friendliness, 2) an over-parameterized training scheme for high accuracy, and 3) a simple procedure to adapt to different vision tasks.
The resultant framework overtakes 8-bit quantization in the speed-vs-accuracy tradeoff for classification, detection, segmentation, super-resolution and matching.
Our BNNs promise 2.8-7x fewer execution cycles than 8-bit and 2.1-2.7x fewer cycles than alternative BNN designs.
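The entry does not spell out the minimalistic inference scheme, but BNN kernels of this kind conventionally reduce to the textbook XNOR-popcount dot product over {-1, +1} vectors; a minimal sketch, with the bit-packing convention assumed:

```python
def binary_dot(a_bits, b_bits, n):
    """Dot product of two {-1,+1}^n vectors packed as n-bit ints
    (bit = 1 encodes +1, bit = 0 encodes -1): dot = 2*popcount(xnor) - n."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # 1 wherever the bits agree
    return 2 * bin(xnor).count("1") - n

# [+1, -1, +1, +1] . [+1, +1, -1, +1] = 0   (vectors packed LSB-first)
print(binary_dot(0b1101, 0b1011, 4))  # -> 0
```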
arXiv Detail & Related papers (2022-02-08T08:38:22Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on an FPGA validate the great potential of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
- APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores [19.516279899089735]
We introduce the first Arbitrary Precision Neural Network framework (APNN-TC) to fully exploit quantization benefits on Ampere GPU Tensor Cores.
APNN-TC supports arbitrary short bit-width computation with int1 compute primitives and XOR/AND operations.
It can achieve significant speedups over CUTLASS kernels and on various NN models, such as ResNet and VGG.
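The arithmetic identity behind building arbitrary-precision compute from int1 primitives is bit-serial decomposition: slice each operand into bit-planes, combine planes with AND, and reduce with popcount. A scalar Python sketch of that identity (APNN-TC's actual kernels run on Tensor Cores; all names here are ours):

```python
def bit_planes(vals, bits):
    """Pack the i-th bit of every element into one integer bit-plane."""
    return [sum(((v >> i) & 1) << k for k, v in enumerate(vals))
            for i in range(bits)]

def bitserial_dot(x, w, x_bits, w_bits):
    """Unsigned dot product built from 1-bit AND + popcount primitives:
    dot(x, w) = sum_{i,j} 2^(i+j) * popcount(plane_i(x) & plane_j(w))."""
    xp, wp = bit_planes(x, x_bits), bit_planes(w, w_bits)
    return sum(bin(xi & wj).count("1") << (i + j)
               for i, xi in enumerate(xp) for j, wj in enumerate(wp))

assert bitserial_dot([3, 1, 2], [2, 3, 1], x_bits=2, w_bits=2) == 11  # 3*2 + 1*3 + 2*1
```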
arXiv Detail & Related papers (2021-06-23T05:39:34Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme that uses {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
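The identity such a decomposition can rest on: any odd integer in [-(2^n - 1), 2^n - 1] equals a power-of-two weighted sum of n digits drawn from {-1, +1}, so an n-bit quantized weight splits into n binary branches. A toy sketch of the scalar decomposition (the paper operates on whole weight tensors, not scalars):

```python
def binary_decompose(w, n):
    """Write an odd integer w in [-(2^n - 1), 2^n - 1] as
    w = sum_i 2^i * b_i with every digit b_i in {-1, +1}."""
    digits = []
    for i in reversed(range(n)):
        b = 1 if w > 0 else -1        # remainder stays odd, so never zero
        digits.append((1 << i, b))
        w -= b << i
    assert w == 0, "w must be an odd integer in the representable range"
    return digits

print(binary_decompose(5, 3))  # [(4, 1), (2, 1), (1, -1)]  ->  4 + 2 - 1 = 5
```

Each (scale, digit) pair then becomes one binary branch whose dot products can use fast binary kernels.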
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- HAO: Hardware-aware neural Architecture Optimization for Efficient Inference [25.265181492143107]
We develop an integer programming algorithm to prune the design space of a neural network search algorithm.
Our algorithm achieves 72.5% top-1 accuracy on ImageNet at a framerate of 50, which is 60% faster than MnasNet and 135% faster than FBNet with comparable accuracy.
arXiv Detail & Related papers (2021-04-26T17:59:29Z)
- FATNN: Fast and Accurate Ternary Neural Networks [89.07796377047619]
Ternary Neural Networks (TNNs) have received much attention due to being potentially orders of magnitude faster in inference, as well as more power efficient, than full-precision counterparts.
In this work, we show that, under some mild constraints, the computational complexity of the ternary inner product can be reduced by a factor of 2.
We then design an implementation-dependent ternary quantization algorithm to mitigate the performance gap.
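For context on the factor-of-2 claim, a straightforward baseline stores each ternary vector as two bitmasks and spends four AND + popcount terms per inner product; FATNN's encoding roughly halves this cost. A sketch of the baseline only (not FATNN's optimized scheme):

```python
def ternary_dot(a_pos, a_neg, b_pos, b_neg):
    """Inner product of two {-1, 0, +1} vectors, each encoded as a pair of
    bitmasks marking its +1 and -1 positions: four AND + popcount terms."""
    pc = lambda x: bin(x).count("1")
    return (pc(a_pos & b_pos) + pc(a_neg & b_neg)
            - pc(a_pos & b_neg) - pc(a_neg & b_pos))

# [+1, 0, -1] . [-1, +1, -1] = -1 + 0 + 1 = 0   (masks are LSB-first)
print(ternary_dot(0b001, 0b100, 0b010, 0b101))  # -> 0
```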
arXiv Detail & Related papers (2020-08-12T04:26:18Z)
- Distillation Guided Residual Learning for Binary Convolutional Neural Networks [83.6169936912264]
It is challenging to bridge the performance gap between a Binary CNN (BCNN) and a Floating-point CNN (FCNN).
We observe that this performance gap leads to substantial residuals between the intermediate feature maps of the BCNN and FCNN.
To minimize the performance gap, we enforce the BCNN to produce intermediate feature maps similar to those of the FCNN.
This training strategy, i.e., optimizing each binary convolutional block with a block-wise distillation loss derived from the FCNN, leads to a more effective optimization of the BCNN.
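A minimal PyTorch reading of that block-wise loss, assuming the two networks expose matching lists of intermediate feature maps (function and variable names are ours):

```python
import torch.nn.functional as F

def blockwise_distill_loss(bcnn_feats, fcnn_feats):
    """Sum of per-block MSE residuals between BCNN and FCNN intermediate
    feature maps; the FCNN teacher is frozen via detach()."""
    return sum(F.mse_loss(b, f.detach()) for b, f in zip(bcnn_feats, fcnn_feats))
```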
arXiv Detail & Related papers (2020-07-10T07:55:39Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in the design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to regain and guarantee high hardware efficiency.
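A sketch of the pattern-pruning step this implies: restrict every 3x3 kernel to a small library of fixed sparsity patterns and pick, per kernel, the pattern that preserves the most weight magnitude. The library below is hypothetical:

```python
import numpy as np

# Hypothetical library of 3x3 patterns, each keeping 4 of 9 weights.
PATTERNS = [
    np.array([[0, 1, 0], [1, 1, 1], [0, 0, 0]]),
    np.array([[0, 0, 0], [1, 1, 1], [0, 1, 0]]),
    np.array([[0, 1, 0], [1, 1, 0], [0, 1, 0]]),
    np.array([[0, 1, 0], [0, 1, 1], [0, 1, 0]]),
]

def prune_kernel(kernel):
    """Keep the pattern that retains the most L1 weight magnitude."""
    best = max(PATTERNS, key=lambda p: np.abs(kernel * p).sum())
    return kernel * best
```

Keeping the pattern set small and fixed is what lets the compiler generate dense, regular code paths and recover hardware efficiency.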
arXiv Detail & Related papers (2020-01-01T04:52:07Z)