Control Variate Approximation for DNN Accelerators
- URL: http://arxiv.org/abs/2102.09642v1
- Date: Thu, 18 Feb 2021 22:11:58 GMT
- Title: Control Variate Approximation for DNN Accelerators
- Authors: Georgios Zervakis, Ourania Spantidi, Iraklis Anagnostopoulos, Hussam Amrouch, Jörg Henkel
- Abstract summary: We introduce a control variate approximation technique for low error approximate Deep Neural Network (DNN) accelerators.
Our approach significantly decreases the error induced by approximate multiplications during inference, without requiring time-exhaustive retraining.
- Score: 3.1921317895626493
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we introduce a control variate approximation technique for low
error approximate Deep Neural Network (DNN) accelerators. The control variate
technique is used in Monte Carlo methods to achieve variance reduction. Our
approach significantly decreases the error induced by approximate
multiplications in DNN inference, without the time-exhaustive retraining
required by state-of-the-art approaches. Leveraging our control variate method, we use
highly approximated multipliers to generate power-optimized DNN accelerators.
Our experimental evaluation on six DNNs, for Cifar-10 and Cifar-100 datasets,
demonstrates that, compared to the accurate design, our control variate
approximation achieves the same performance and a 24% power reduction for a mere
0.16% accuracy loss.
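To make the idea concrete, below is a minimal, self-contained numpy sketch of control-variate-style error compensation for an approximate multiplier. It assumes a toy truncation-based multiplier and a correction term equal to the multiplier's known mean error times the (offline-known) sum of weights; the multiplier, the constants, and the exact correction form are illustrative assumptions, not the specific design evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 4                        # LSBs dropped by the toy approximate multiplier (assumption)
E_RES = (2**K - 1) / 2       # expected truncation residual, assuming roughly uniform activations

def approx_mul(a, w):
    """Toy approximate multiplier: truncate the K least-significant bits of the activation."""
    return ((a >> K) << K) * w

def neuron(a, w, mode):
    """Dot product of activations and weights under exact / approximate / corrected arithmetic."""
    a, w = a.astype(np.int64), w.astype(np.int64)
    if mode == "exact":
        return int(a @ w)
    y = int(np.sum(approx_mul(a, w)))
    if mode == "corrected":
        # Control-variate-style compensation: each product's error is w_i * r_i with
        # r_i = a_i mod 2^K.  Replacing the unobserved residuals by their known mean
        # adds E_RES * sum(w) -- one multiply-add, with sum(w) precomputed per filter --
        # so the remaining error is zero-mean instead of biased.
        y += int(round(E_RES * int(np.sum(w))))
    return y

approx_err, corr_err = [], []
for _ in range(200):
    a = rng.integers(0, 256, size=1024)       # 8-bit activations
    w = rng.integers(-128, 128, size=1024)    # signed 8-bit weights
    exact = neuron(a, w, "exact")
    approx_err.append(abs(neuron(a, w, "approx") - exact))
    corr_err.append(abs(neuron(a, w, "corrected") - exact))

print("mean |error|, plain approximate:", np.mean(approx_err))
print("mean |error|, with compensation:", np.mean(corr_err))
```

Because the remaining error is zero-mean, its magnitude is typically much smaller than the biased error of the uncorrected approximate sum, mirroring how a control variate reduces variance in Monte Carlo estimation.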
Related papers
- Cal-DETR: Calibrated Detection Transformer [67.75361289429013]
We propose a mechanism for calibrated detection transformers (Cal-DETR), particularly for Deformable-DETR, UP-DETR and DINO.
We develop an uncertainty-guided logit modulation mechanism that leverages the uncertainty to modulate the class logits.
Results corroborate the effectiveness of Cal-DETR against competing train-time methods in calibrating both in-domain and out-of-domain detections.
arXiv Detail & Related papers (2023-11-06T22:13:10Z)
- Improving Realistic Worst-Case Performance of NVCiM DNN Accelerators through Training with Right-Censored Gaussian Noise [16.470952550714394]
We propose to use the k-th percentile performance (KPP) to capture the realistic worst-case performance of DNN models executing on CiM accelerators.
Our method achieves up to a 26% improvement in KPP compared to state-of-the-art methods for enhancing robustness under device variations (a minimal KPP sketch appears after this list).
arXiv Detail & Related papers (2023-07-29T01:06:37Z)
- Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance [68.8204255655161]
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference.
We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline.
arXiv Detail & Related papers (2023-01-31T02:46:57Z)
- Fast Exploration of the Impact of Precision Reduction on Spiking Neural Networks [63.614519238823206]
Spiking Neural Networks (SNNs) are a practical choice when the target hardware operates at the edge of computing.
We employ an Interval Arithmetic (IA) model to develop an exploration methodology that exploits the model's ability to propagate the approximation error (a generic IA sketch appears after this list).
arXiv Detail & Related papers (2022-11-22T15:08:05Z)
- Towards Lossless ANN-SNN Conversion under Ultra-Low Latency with Dual-Phase Optimization [30.098268054714048]
Spiking neural networks (SNNs) operating with asynchronous discrete events show higher energy efficiency with sparse computation.
A popular approach for implementing deep SNNs is ANN-SNN conversion combining both efficient training of ANNs and efficient inference of SNNs.
In this paper, we first identify that the performance degradation observed at ultra-low latency stems from the misrepresentation of the negative or overflow residual membrane potential in SNNs.
Inspired by this, we decompose the conversion error into three parts: quantization error, clipping error, and residual membrane potential representation error.
arXiv Detail & Related papers (2022-05-16T06:53:14Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- Positive/Negative Approximate Multipliers for DNN Accelerators [3.1921317895626493]
We present a filter-oriented approximation method to map the weights to the appropriate modes of the approximate multiplier.
Our approach achieves 18.33% energy gains on average across 7 NNs on 4 different datasets for a maximum accuracy drop of only 1%.
arXiv Detail & Related papers (2021-07-20T09:36:24Z)
- Random and Adversarial Bit Error Robustness: Energy-Efficient and Secure DNN Accelerators [105.60654479548356]
We show that a combination of robust fixed-point quantization, weight clipping, and random bit error training (RandBET) significantly improves robustness against random or adversarial bit errors in quantized DNN weights.
This leads to high energy savings from low-voltage operation and low-precision quantization, and also improves the security of DNN accelerators.
arXiv Detail & Related papers (2021-04-16T19:11:14Z)
- Ax-BxP: Approximate Blocked Computation for Precision-Reconfigurable Deep Neural Network Acceleration [3.7371886886933487]
Precision scaling has emerged as a popular technique to optimize the compute and storage requirements of Deep Neural Networks (DNNs).
Efforts toward creating ultra-low-precision (sub-8-bit) DNNs suggest that the minimum precision required to achieve a given network-level accuracy varies considerably across networks.
Previous proposals such as bit-serial hardware incur high overheads, significantly diminishing the benefits of lower precision.
arXiv Detail & Related papers (2020-11-25T20:00:38Z)
- Neural Control Variates [71.42768823631918]
We show that a set of neural networks can face the challenge of finding a good approximation of the integrand.
We derive a theoretically optimal, variance-minimizing loss function, and propose an alternative, composite loss for stable online training in practice (a sketch of the classical control-variate estimator appears after this list).
Specifically, we show that the learned light-field approximation is of sufficient quality for high-order bounces, allowing us to omit the error correction and thereby dramatically reduce the noise at the cost of negligible visible bias.
arXiv Detail & Related papers (2020-06-02T11:17:55Z)
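For the "Improving Realistic Worst-Case Performance of NVCiM DNN Accelerators" entry, below is a minimal sketch of the k-th percentile performance (KPP) metric. The i.i.d. Gaussian device-variation model, the toy evaluate function, and all constants are assumptions for illustration; they are not the paper's noise model or training method.

```python
import numpy as np

rng = np.random.default_rng(1)

def kth_percentile_performance(evaluate, weights, noise_sigma, k=1.0, trials=500):
    """Estimate the k-th percentile of model accuracy over random device-variation draws.

    `evaluate(ws)` returns the accuracy of a model whose weight tensors are `ws`;
    device variation is modelled here as i.i.d. Gaussian perturbation of every weight.
    """
    scores = []
    for _ in range(trials):
        noisy = [w + rng.normal(0.0, noise_sigma, size=w.shape) for w in weights]
        scores.append(evaluate(noisy))
    return np.percentile(scores, k)   # a low percentile captures the realistic worst case

# Tiny stand-in "model": accuracy degrades with the distance from the nominal weights.
w0 = rng.normal(size=(8, 8))
evaluate = lambda ws: 1.0 - min(1.0, float(np.linalg.norm(ws[0] - w0)) / 10.0)

print("1st-percentile performance:", kth_percentile_performance(evaluate, [w0], noise_sigma=0.2))
```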
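For the "Fast Exploration of the Impact of Precision Reduction on Spiking Neural Networks" entry, a generic sketch of interval-arithmetic error propagation through one affine layer followed by a ReLU. The layer shapes and the per-element error bound eps are assumed for illustration; this is not the paper's SNN model or exploration methodology.

```python
import numpy as np

def linear_interval(lo, hi, W, b):
    """Propagate an elementwise interval [lo, hi] through y = W @ x + b."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def relu_interval(lo, hi):
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

rng = np.random.default_rng(2)
W, b = rng.normal(size=(4, 8)), rng.normal(size=4)
x = rng.normal(size=8)
eps = 0.01                       # assumed per-element quantization/approximation error bound

lo, hi = relu_interval(*linear_interval(x - eps, x + eps, W, b))
print("output error-bound width per neuron:", hi - lo)
```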
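For the "Neural Control Variates" entry, and as background for the main paper's use of the term, a minimal sketch of the classical control-variate estimator with the variance-minimizing coefficient. The toy integrand and the hand-picked surrogate g (a learned network in the neural setting) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Estimate I = integral of f over [0, 1] by Monte Carlo, using g as a control
# variate whose integral G over [0, 1] is known in closed form.
f = lambda x: np.exp(x)          # toy integrand; true integral is e - 1
g = lambda x: 1.0 + x            # crude surrogate for f (a learned network in the neural setting)
G = 1.5                          # integral of (1 + x) over [0, 1]

x = rng.uniform(size=10_000)
fx, gx = f(x), g(x)

# Variance-minimizing coefficient c* = Cov(f, g) / Var(g).
c = np.cov(fx, gx, ddof=0)[0, 1] / np.var(gx)

plain   = fx.mean()                       # ordinary Monte Carlo estimate
with_cv = (fx - c * gx).mean() + c * G    # same expectation, lower variance

print("plain MC        :", plain)
print("control variate :", with_cv)
print("true value      :", np.e - 1)
```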
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.