Subtractor-Based CNN Inference Accelerator
- URL: http://arxiv.org/abs/2310.01022v1
- Date: Mon, 2 Oct 2023 09:15:58 GMT
- Title: Subtractor-Based CNN Inference Accelerator
- Authors: Victor Gao, Issam Hammad, Kamal El-Sankary, and Jason Gu
- Abstract summary: This paper presents a novel method to boost the performance of CNN inference accelerators by utilizing subtractors.
With a rounding size of 0.05, the proposed design can achieve 32.03% power savings and a 24.59% reduction in area at the cost of only 0.1% in terms of accuracy loss.
- Score: 3.663763133721262
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents a novel method to boost the performance of CNN inference
accelerators by utilizing subtractors. The proposed CNN preprocessing
accelerator relies on sorting, grouping, and rounding the weights to create
combinations that allow one multiplication operation and one addition
operation to be replaced by a single subtraction operation when applying
convolution during inference. Given the high cost of multiplication in terms
of power and area, replacing it with subtraction yields a performance boost by
reducing both. The proposed method allows the trade-off between performance
gains and accuracy loss to be controlled by increasing or decreasing the usage
of subtractors. With a rounding size of 0.05, and using LeNet-5 on the MNIST
dataset, the proposed design achieves 32.03% power savings and a 24.59%
reduction in area at the cost of only a 0.1% accuracy loss.
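The abstract gives no pseudocode, so the following is a minimal numpy sketch of one plausible reading of the pairing idea: after rounding weights to a 0.05 grid, weights of equal magnitude and opposite sign within a filter are grouped so that w*x1 + (-w)*x2 collapses to w*(x1 - x2). The greedy pairing and function names are illustrative assumptions, not the paper's exact preprocessing.

```python
import numpy as np

def dot_with_subtractors(x, weights, rounding_size=0.05):
    """Dot product in which rounded weights of equal magnitude and opposite
    sign are paired, so each pair costs one multiply and one subtraction
    instead of two multiplies and one addition (illustrative grouping)."""
    q = np.round(weights / rounding_size).astype(int)  # integer grid index
    pairs, used = [], set()
    for i in range(len(q)):
        if i in used or q[i] == 0:
            continue
        for j in range(i + 1, len(q)):
            if j not in used and q[j] == -q[i]:        # opposite rounded weights
                pairs.append((i, j))
                used.update((i, j))
                break
    acc = 0.0
    for i, j in pairs:
        acc += (q[i] * rounding_size) * (x[i] - x[j])  # one mul, one subtraction
    for i in range(len(q)):
        if i not in used:
            acc += (q[i] * rounding_size) * x[i]       # unpaired: ordinary MAC
    return acc

x = np.array([0.7, -1.2, 0.3, 0.9])
w = np.array([0.26, 0.11, -0.24, 0.52])  # 0.26 and -0.24 both round to +/-0.25
print(dot_with_subtractors(x, w))        # ~0.43, vs exact np.dot(x, w) = 0.446
```

The gap between the paired result and the exact dot product is the rounding-induced accuracy loss the paper trades against power and area.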
Related papers
- ReduceFormer: Attention with Tensor Reduction by Summation [4.985969607297595]
We introduce ReduceFormer, a family of models optimized for efficiency in the spirit of attention.
ReduceFormer leverages only simple operations such as reduction and element-wise multiplication, leading to greatly simplified architecture and improved inference performance.
The proposed model family is suitable for edge devices where compute resource and memory bandwidth are limited, as well as for cloud computing where high throughput is sought after.
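A minimal sketch of a token mixer built only from summation reductions and element-wise multiplication, the operation family the summary names; the projections, shapes, and particular context construction are assumptions, not ReduceFormer's actual block.

```python
import numpy as np

def reduce_mix(x, wq, wk, wv):
    """Toy softmax-free mixer: a global context vector is built by reducing
    k*v over tokens, then every token reads it through an element-wise gate.
    Only sums and element-wise multiplies touch the (n, n) interaction."""
    q = x @ wq                      # the q/k/v projections are ordinary
    k = x @ wk                      # linear layers, kept for realism
    v = x @ wv
    context = (k * v).sum(axis=0)   # reduction over tokens -> (d,)
    return q * context              # element-wise broadcast, no attention map

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))                    # 8 tokens, dim 16
w = [rng.standard_normal((16, 16)) * 0.1 for _ in range(3)]
print(reduce_mix(x, *w).shape)                      # (8, 16)
```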
arXiv Detail & Related papers (2024-06-11T17:28:09Z)
- LATTE: Low-Precision Approximate Attention with Head-wise Trainable Threshold for Efficient Transformer [0.0]
We propose Low-Precision Approximate Attention with Head-wise Trainable Threshold for Efficient Transformer (LATTE).
LATTE employs a head-wise threshold-based filter with a low-precision dot product to reduce the computation of Multi-Head Attention (MHA).
Experimental results indicate LATTE can smoothly adapt to both NLP and CV tasks, offering significant computation savings.
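A sketch of the filtering idea under stated assumptions: scores are first computed with crudely quantized operands, and only positions that clear a head-wise threshold receive the exact dot product. The quantization scheme and threshold handling are illustrative, and numpy still evaluates everything, so the savings only materialize in a kernel or in hardware.

```python
import numpy as np

def latte_style_scores(q, k, thresholds, frac_bits=4):
    """Compute cheap low-precision attention scores, then keep only the
    positions whose cheap score clears the (trainable, per-head) threshold.
    Hypothetical shapes and names; the paper's exact filter may differ."""
    scale = 2.0 ** frac_bits
    q_lp = np.round(q * scale) / scale        # stand-in for low-precision ops
    k_lp = np.round(k * scale) / scale
    cheap = q_lp @ k_lp.T                     # approximate (n, n) scores
    keep = cheap > thresholds                 # head-wise threshold filter
    # numpy computes the full product anyway; on hardware only kept entries
    # would pay for the full-precision dot product.
    return np.where(keep, q @ k.T, -np.inf)   # masked scores before softmax

rng = np.random.default_rng(1)
q, k = rng.standard_normal((6, 8)), rng.standard_normal((6, 8))
print(np.isfinite(latte_style_scores(q, k, thresholds=0.0)).mean())
```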
arXiv Detail & Related papers (2024-04-11T07:23:19Z)
- Point Transformer V3: Simpler, Faster, Stronger [88.80496333515325]
This paper focuses on overcoming the existing trade-offs between accuracy and efficiency within the context of point cloud processing.
We present Point Transformer V3 (PTv3), which prioritizes simplicity and efficiency over the accuracy of certain mechanisms.
PTv3 attains state-of-the-art results on over 20 downstream tasks that span both indoor and outdoor scenarios.
arXiv Detail & Related papers (2023-12-15T18:59:59Z)
- Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH).
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
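One way to read the LSH step, as a hedged sketch: channels whose activations fall on the same side of a set of random hyperplanes share a bucket and are merged, shrinking the tensor a following convolution must process. The hashing granularity and merge rule here are assumptions, not HASTE's exact module.

```python
import numpy as np

def lsh_bucket_channels(fmap, n_hyperplanes=8, seed=0):
    """Bucket channels by their sign pattern under random hyperplanes
    (locality-sensitive hashing), then average each bucket into a single
    representative map. Illustrative only."""
    c, h, w = fmap.shape
    flat = fmap.reshape(c, -1)
    planes = np.random.default_rng(seed).standard_normal((h * w, n_hyperplanes))
    codes = (flat @ planes > 0).astype(int)           # (c, n_hyperplanes) bits
    keys = codes @ (1 << np.arange(n_hyperplanes))    # bit pattern -> bucket id
    reps = [flat[keys == b].mean(axis=0) for b in np.unique(keys)]
    return np.stack(reps).reshape(-1, h, w)

fmap = np.random.default_rng(2).standard_normal((64, 8, 8))
print(lsh_bucket_channels(fmap).shape)  # (<=64, 8, 8): colliding channels merged
```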
arXiv Detail & Related papers (2023-09-29T13:09:40Z)
- Attention-based Feature Compression for CNN Inference Offloading in Edge Computing [93.67044879636093]
This paper studies the computational offloading of CNN inference in device-edge co-inference systems.
We propose a novel autoencoder-based CNN architecture (AECNN) for effective feature extraction at the end device.
Experiments show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss.
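A toy split-computing bottleneck in the same spirit: an encoder on the device shrinks the intermediate tensor before transmission, and a decoder at the edge restores it. The channel counts are invented for illustration; reaching the reported 256x would additionally require the paper's actual architecture plus quantization of the bottleneck.

```python
import torch
import torch.nn as nn

class FeatureCodec(nn.Module):
    """Illustrative device/edge feature codec; the layer shapes and channel
    counts are assumptions, not the paper's AECNN architecture."""
    def __init__(self, channels=256, bottleneck=8):
        super().__init__()
        self.encoder = nn.Conv2d(channels, bottleneck, kernel_size=1)  # on device
        self.decoder = nn.Conv2d(bottleneck, channels, kernel_size=1)  # at edge
    def forward(self, feats):
        z = self.encoder(feats)          # 256 -> 8 channels: 32x fewer values sent
        return self.decoder(z), z        # edge server restores the feature map

codec = FeatureCodec()
feats = torch.randn(1, 256, 14, 14)      # intermediate CNN activation at the split
recon, z = codec(feats)
print(feats.numel() / z.numel())         # compression ratio of this sketch: 32.0
```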
arXiv Detail & Related papers (2022-11-24T18:10:01Z)
- Fast Exploration of the Impact of Precision Reduction on Spiking Neural Networks [63.614519238823206]
Spiking Neural Networks (SNNs) are a practical choice when the target hardware sits at the edge of computing.
We employ an Interval Arithmetic (IA) model to develop an exploration methodology that takes advantage of the capability of such a model to propagate the approximation error.
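A single interval-arithmetic step of the kind such a model would propagate, shown for one affine layer; the eps used as the quantization step is a placeholder, not the paper's methodology.

```python
import numpy as np

def interval_affine(lo, hi, w, b=0.0):
    """Propagate [lo, hi] input intervals through a weighted sum: positive
    weights keep the bound order, negative weights swap it. This is the
    standard IA step used to bound precision-reduction error."""
    w_pos, w_neg = np.maximum(w, 0), np.minimum(w, 0)
    out_lo = w_pos @ lo + w_neg @ hi + b
    out_hi = w_pos @ hi + w_neg @ lo + b
    return out_lo, out_hi

x = np.array([1.0, -0.5, 0.25])
eps = 2.0 ** -4                       # candidate quantization step under test
w = np.array([[0.5, -1.0, 2.0]])
lo, hi = interval_affine(x - eps / 2, x + eps / 2, w)
print(lo, hi)                         # guaranteed bounds on the neuron input
```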
arXiv Detail & Related papers (2022-11-22T15:08:05Z)
- Compression-aware Projection with Greedy Dimension Reduction for Convolutional Neural Network Activations [3.6188659868203388]
We propose a compression-aware projection system to improve the trade-off between classification accuracy and compression ratio.
Our test results show that the proposed methods effectively reduce memory access by 2.91x to 5.97x with a negligible accuracy drop on MobileNetV2/ResNet18/VGG16.
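A rough sketch of the underlying idea: project activations into a low-dimensional basis so only a few coefficients per position reach memory. An SVD basis stands in here for the paper's compression-aware, greedily chosen projection.

```python
import numpy as np

def project_activations(a, k):
    """Keep the k principal directions of an activation matrix so only k
    coefficients per position are written to memory; the basis choice and
    greedy schedule in the paper are more involved than this."""
    _, _, vt = np.linalg.svd(a, full_matrices=False)
    basis = vt[:k]                    # (k, c) projection matrix
    coeffs = a @ basis.T              # store these instead of a
    return coeffs, basis

a = np.random.default_rng(3).standard_normal((196, 64))   # HxW positions x C
coeffs, basis = project_activations(a, k=16)
restored = coeffs @ basis
print(a.size / coeffs.size, np.abs(a - restored).mean())  # 4x fewer values
```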
arXiv Detail & Related papers (2021-10-17T14:02:02Z)
- MicroNet: Improving Image Recognition with Extremely Low FLOPs [82.54764264255505]
We find that two factors, sparse connectivity and dynamic activation functions, are effective in improving accuracy.
We present a new dynamic activation function, named Dynamic Shift Max, to improve the non-linearity.
We arrive at a family of networks, named MicroNet, that achieves significant performance gains over the state of the art in the low FLOP regime.
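A deliberately simplified stand-in for a dynamic max-type activation: each channel competes with a group-shifted copy of itself under input-dependent gates. The real Dynamic Shift Max is more structured, with gates produced by a learned squeeze-excite-style predictor; everything below is illustrative.

```python
import numpy as np

def dynamic_shift_max(x, gates, group=4):
    """Non-linearity from a max over shifted, gated channels: each channel is
    compared against a circularly group-shifted copy of itself, each scaled
    by an input-dependent gate. A simplified take, not the paper's operator."""
    c = x.shape[0]
    shifted = np.roll(x, c // group, axis=0)   # circular shift across groups
    return np.maximum(gates[0] * x, gates[1] * shifted)

x = np.random.default_rng(4).standard_normal((16, 8, 8))
gates = np.array([1.0, 0.5])                   # in MicroNet these would come
print(dynamic_shift_max(x, gates).shape)       # from a learned predictor
```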
arXiv Detail & Related papers (2021-08-12T17:59:41Z)
- Control Variate Approximation for DNN Accelerators [3.1921317895626493]
We introduce a control variate approximation technique for low error approximate Deep Neural Network (DNN) accelerators.
Our approach significantly decreases the error induced by approximate multiplications during inference, without requiring time-consuming retraining.
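A sketch of the control-variate flavour under stated assumptions: exact products on a small random index set serve as the variate, and the observed approximate-versus-exact gap on that set, scaled to the full length, corrects the cheap estimate. The toy quantizing multiplier and the estimator form are illustrative, not the paper's circuit-level technique.

```python
import numpy as np

def approx_mul(a, b, step=1 / 16):
    """Toy approximate multiplier: truncate operands to a coarse grid, a
    stand-in for the hardware-level approximate multipliers the paper targets."""
    q = lambda v: np.floor(v / step) * step
    return q(a) * q(b)

def cv_corrected_dot(w, x, sample=8, seed=0):
    """Correct the approximate dot product using exact multiplies on a small
    random subset as the control variate; the mean gap on the subset, scaled
    to the full length, removes the systematic error without retraining."""
    y_approx = approx_mul(w, x).sum()
    idx = np.random.default_rng(seed).choice(len(w), sample, replace=False)
    gap = (w[idx] * x[idx] - approx_mul(w[idx], x[idx])).mean()
    return y_approx + gap * len(w)

rng = np.random.default_rng(5)
w, x = rng.uniform(0, 1, 256), rng.uniform(0, 1, 256)
print(np.dot(w, x), approx_mul(w, x).sum(), cv_corrected_dot(w, x))
```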
arXiv Detail & Related papers (2021-02-18T22:11:58Z)
- Inception Convolution with Efficient Dilation Search [121.41030859447487]
Dilated convolution is a critical variant of the standard convolutional neural network, used to control effective receptive fields and handle large scale variance of objects.
We propose a new variant of dilated convolution, namely inception (dilated) convolution, in which the convolutions have independent dilations among different axes, channels, and layers.
To fit the complex inception convolution to the data, we develop a simple yet effective dilation search algorithm (EDO) based on statistical optimization.
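A minimal PyTorch sketch of convolutions with independent per-axis dilations across channel groups; the group split and the dilation set are illustrative stand-ins, and the paper's EDO search over these choices is not shown.

```python
import torch
import torch.nn as nn

class AxisWiseDilation(nn.Module):
    """Split input channels into groups, convolve each with its own (dh, dw)
    dilation, and concatenate. With kernel size 3, padding == dilation keeps
    the spatial size. Group split and dilation set are illustrative."""
    def __init__(self, channels=32, dilations=((1, 1), (1, 2), (2, 1), (2, 2))):
        super().__init__()
        g = channels // len(dilations)
        self.convs = nn.ModuleList(
            nn.Conv2d(g, g, 3, padding=d, dilation=d) for d in dilations
        )
    def forward(self, x):
        parts = torch.chunk(x, len(self.convs), dim=1)
        return torch.cat([c(p) for c, p in zip(self.convs, parts)], dim=1)

x = torch.randn(1, 32, 16, 16)
print(AxisWiseDilation()(x).shape)    # torch.Size([1, 32, 16, 16])
```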
arXiv Detail & Related papers (2020-12-25T14:58:35Z)
- Explicitly Trained Spiking Sparsity in Spiking Neural Networks with Backpropagation [7.952659059689134]
Spiking Neural Networks (SNNs) are being explored for their potential energy efficiency resulting from sparse, event-driven computations.
We propose an explicit inclusion of spike counts in the loss function, along with a traditional error loss, to optimize weight parameters for both accuracy and spiking sparsity.
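The composite objective the summary describes reduces to adding a spike-count penalty to the task loss. A hedged sketch follows: the weighting lam, the tensor shapes, and the per-layer count lists are hypothetical, and real SNN training would need surrogate gradients to backpropagate through the spike terms.

```python
import torch

def sparsity_aware_loss(logits, targets, spike_counts, lam=1e-4):
    """Traditional error loss plus a penalty on total spikes, so training
    optimizes weights for both accuracy and spiking sparsity. lam controls
    the accuracy/sparsity trade-off (value here is a placeholder)."""
    task = torch.nn.functional.cross_entropy(logits, targets)
    sparsity = torch.stack([s.sum() for s in spike_counts]).sum()
    return task + lam * sparsity

logits = torch.randn(4, 10, requires_grad=True)
targets = torch.tensor([1, 0, 3, 7])
spikes = [torch.rand(4, 128), torch.rand(4, 64)]   # per-layer spike counts
print(sparsity_aware_loss(logits, targets, spikes))
```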
arXiv Detail & Related papers (2020-03-02T23:39:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.