Shavette: Low Power Neural Network Acceleration via Algorithm-level Error Detection and Undervolting
- URL: http://arxiv.org/abs/2410.13415v1
- Date: Thu, 17 Oct 2024 10:29:15 GMT
- Authors: Mikael Rinkinen, Lauri Koskinen, Olli Silven, Mehdi Safarpour
- Abstract summary: This brief introduces a simple approach for enabling reduced voltage operation of Deep Neural Network (DNN) accelerators by mere software modifications.
We demonstrate 18% to 25% energy savings with no loss of model accuracy and negligible throughput compromise.
- Abstract: Reduced voltage operation is an effective technique for substantial energy efficiency improvement in digital circuits. This brief introduces a simple approach for enabling reduced voltage operation of Deep Neural Network (DNN) accelerators through software modifications alone. Conventional approaches for enabling reduced voltage operation, e.g., Timing Error Detection (TED) systems, incur significant development costs and overheads, and are not applicable to off-the-shelf components. In contrast, the solution proposed in this paper relies on algorithm-based error detection; hence, it is implemented with low development costs, requires no circuit modifications, and is applicable even to commodity devices. By showcasing the solution through experiments on popular DNNs, i.e., LeNet and VGG16, on a GPU platform, we demonstrate 18% to 25% energy savings with no loss of model accuracy and negligible throughput compromise (< 3.9%), considering the overheads from integrating the error detection schemes into the DNN. Integrating the presented algorithmic solution into a design is simpler than with conventional TED-based techniques, which require extensive circuit-level modifications, cell library characterizations, or special support from the design tools.
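The abstract does not spell out the detection scheme in this summary, but algorithm-based error detection for the dominant DNN kernel, matrix multiplication, is classically an ABFT-style checksum test. A minimal NumPy sketch of that idea, with illustrative names and tolerances rather than the authors' code:

```python
# Sketch of an ABFT-style checksum check for C = A @ B. Under undervolting,
# a timing fault corrupts some entries of C; the column-checksum identity
# e^T (A @ B) == (e^T A) @ B then fails with high probability, flagging the
# result for recomputation at a safe voltage. Names/tolerances are assumptions.
import numpy as np

def checked_matmul(a: np.ndarray, b: np.ndarray, tol: float = 1e-3):
    c = a @ b                              # the protected computation
    direct = c.sum(axis=0)                 # e^T (A @ B): column sums of the result
    expected = a.sum(axis=0) @ b           # (e^T A) @ B: checksum row times B
    ok = np.allclose(direct, expected, rtol=tol, atol=tol)
    return c, ok

a = np.random.randn(256, 128).astype(np.float32)
b = np.random.randn(128, 64).astype(np.float32)
c, ok = checked_matmul(a, b)
if not ok:
    c, ok = checked_matmul(a, b)           # e.g., retry after raising the voltage
```

The check costs one extra matrix-vector product per multiply, which is consistent with the low throughput overhead the paper reports.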
Related papers
- Towards Resource-Efficient Federated Learning in Industrial IoT for Multivariate Time Series Analysis [50.18156030818883]
Anomalies and missing data constitute a thorny problem in industrial applications.
Deep-learning-enabled anomaly detection has emerged as a critical direction.
The data collected on edge devices contain privacy-sensitive user information.
arXiv Detail & Related papers (2024-11-06T15:38:31Z)
- Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by at least 224 times on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z)
- NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes [52.51014498593644]
Deep neural networks (DNNs) have become ubiquitous in machine learning, but their energy consumption remains a notable issue.
We introduce NeuralFuse, a novel add-on module that addresses the accuracy-energy tradeoff in low-voltage regimes.
At a 1% bit error rate, NeuralFuse can reduce memory access energy by up to 24% while recovering accuracy by up to 57%.
arXiv Detail & Related papers (2023-06-29T11:38:22Z)
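The low-voltage fault model behind the NeuralFuse entry above can be made concrete with a short sketch: random bit flips injected into stored int8 weights at a given bit error rate (BER). The helper below is an illustrative assumption, not the paper's code; NeuralFuse itself is a small trained module prepended to the fixed network so that inference on such corrupted weights stays accurate.

```python
# Illustrative simulation of low-voltage memory faults: flip random bits of
# int8 weights at a given bit error rate (BER). NeuralFuse prepends a small
# trained input-transformation module to the fixed network so that inference
# stays accurate on such corrupted weights. Helper names are assumptions.
import numpy as np

def flip_random_bits(weights_int8: np.ndarray, ber: float) -> np.ndarray:
    w = weights_int8.copy()
    flat = w.view(np.uint8).reshape(-1)         # reinterpret bytes in place
    n_bits = flat.size * 8
    flips = np.random.choice(n_bits, size=int(n_bits * ber), replace=False)
    masks = np.uint8(1) << (flips % 8).astype(np.uint8)
    np.bitwise_xor.at(flat, flips // 8, masks)  # handles two flips in one byte
    return w

w = np.random.randint(-128, 128, size=(64, 64), dtype=np.int8)
w_faulty = flip_random_bits(w, ber=0.01)        # the paper's 1% BER setting
print(f"{np.count_nonzero(w != w_faulty)} of {w.size} weights corrupted")
```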
- Leveraging Residue Number System for Designing High-Precision Analog Deep Neural Network Accelerators [3.4218508703868595]
We use the residue number system (RNS) to compose high-precision operations from multiple low-precision operations.
RNS can achieve 99% FP32 accuracy for state-of-the-art DNN inference using data converters with only 6-bit precision.
arXiv Detail & Related papers (2023-06-15T20:24:18Z)
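As a worked illustration of the RNS entry above, the sketch below computes an exact integer dot product limb-wise in 6-7 bit residues and reconstructs the wide result with the Chinese Remainder Theorem. The base set (63, 64, 65) is an assumption chosen for illustration, not the paper's configuration:

```python
# Residue number system sketch: MACs are performed limb-wise modulo small,
# pairwise-coprime bases, then the wide result is recovered via the Chinese
# Remainder Theorem (CRT). Bases (63, 64, 65) are an illustrative assumption.
from math import prod

BASES = (63, 64, 65)                 # pairwise coprime; residues fit in 6-7 bits
M = prod(BASES)                      # dynamic range of the representation: 262080

def to_rns(x: int) -> tuple:
    return tuple(x % m for m in BASES)

def rns_mac(acc: tuple, a: tuple, b: tuple) -> tuple:
    """One multiply-accumulate, done independently in each low-precision limb."""
    return tuple((r + x * y) % m for r, x, y, m in zip(acc, a, b, BASES))

def from_rns(r: tuple) -> int:
    """CRT reconstruction of the wide integer from its residues."""
    x = 0
    for ri, mi in zip(r, BASES):
        Mi = M // mi
        x += ri * Mi * pow(Mi, -1, mi)   # pow(Mi, -1, mi): modular inverse
    return x % M

# Exact dot product of small integers, computed entirely in the residue domain:
a, b = [3, 1, 7], [10, 5, 2]
acc = to_rns(0)
for x, y in zip(a, b):
    acc = rns_mac(acc, to_rns(x), to_rns(y))
assert from_rns(acc) == 30 + 5 + 14   # == 49
```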
- A new transformation for embedded convolutional neural network approach toward real-time servo motor overload fault-detection [0.0]
Overloading in DC servo motors is a major concern in industries, as many companies face the problem of finding expert operators.
This paper proposes an embedded artificial-intelligence approach that applies a Convolutional Neural Network (CNN) with a new transformation to extract faults from real-time input signals without human intervention.
arXiv Detail & Related papers (2023-04-08T13:36:33Z)
- Fast Exploration of the Impact of Precision Reduction on Spiking Neural Networks [63.614519238823206]
Spiking Neural Networks (SNNs) are a practical choice when the target hardware sits at the computing edge.
We employ an Interval Arithmetic (IA) model to develop an exploration methodology that takes advantage of the capability of such a model to propagate the approximation error.
arXiv Detail & Related papers (2022-11-22T15:08:05Z)
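The exploration methodology of the entry above rests on propagating error bounds analytically. A minimal interval-arithmetic sketch, assuming a dense layer, an illustrative quantization step, and hypothetical names:

```python
# Interval-arithmetic sketch: push [lo, hi] bounds through a dense layer
# y = W x + b, widening by half a quantization step at the output to model
# rounding. This bounds the approximation error without running the network
# on data. The layer, step size, and names are illustrative assumptions.
import numpy as np

def interval_dense(lo, hi, w, b, quant_step=2.0 ** -4):
    w_pos, w_neg = np.maximum(w, 0.0), np.minimum(w, 0.0)
    y_lo = w_pos @ lo + w_neg @ hi + b - quant_step / 2
    y_hi = w_pos @ hi + w_neg @ lo + b + quant_step / 2
    return y_lo, y_hi

w, b = np.random.randn(4, 8), np.zeros(4)
x = np.random.randn(8)
eps = 2.0 ** -4 / 2                      # uncertainty from quantizing the input
lo, hi = interval_dense(x - eps, x + eps, w, b)
exact = w @ x + b                        # the full-precision reference
assert np.all(lo <= exact) and np.all(exact <= hi)
```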
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- Positive/Negative Approximate Multipliers for DNN Accelerators [3.1921317895626493]
We present a filter-oriented approximation method to map the weights to the appropriate modes of the approximate multiplier.
Our approach achieves 18.33% energy gains on average across 7 NNs on 4 different datasets for a maximum accuracy drop of only 1%.
arXiv Detail & Related papers (2021-07-20T09:36:24Z)
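The positive/negative multiplier entry above can be sketched with a toy model, assuming one truncation-based mode that errs low and one compensated mode that errs high; each filter is then mapped to the mode with the smaller accumulated dot-product error. The error model and the selection rule here are illustrative assumptions, not the paper's circuits.

```python
# Toy model of a positive/negative approximate multiplier: truncating low
# partial-product bits biases results low ("negative" mode); adding a fixed
# compensation constant biases them high ("positive" mode). Each filter is
# mapped to the mode that minimizes its accumulated dot-product error.
import numpy as np

def approx_mul(a: int, b: int, mode: str, drop: int = 4) -> int:
    p = (a * b >> drop) << drop          # drop the low `drop` bits: errs low
    return p + (1 << (drop - 1) if mode == "positive" else 0)

def pick_mode(weights, activations, drop: int = 4) -> str:
    exact = int(np.dot(weights, activations))
    err = {m: abs(sum(approx_mul(int(w), int(x), m, drop)
                      for w, x in zip(weights, activations)) - exact)
           for m in ("negative", "positive")}
    return min(err, key=err.get)         # per-filter mode assignment

rng = np.random.default_rng(0)
w = rng.integers(1, 128, size=16)        # one filter's weights (unsigned toy case)
x = rng.integers(1, 128, size=16)        # sample activations
print(pick_mode(w, x))
```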
- From DNNs to GANs: Review of efficient hardware architectures for deep learning [0.0]
Neural networks and deep learning have begun to impact the present research paradigm.
Conventional DSP processors are incapable of performing neural network, activation function, convolutional neural network, and generative adversarial network operations.
Different algorithms have been adapted to design a DSP processor capable of fast performance on neural network, activation function, convolutional neural network, and generative adversarial network workloads.
arXiv Detail & Related papers (2021-06-06T13:23:06Z)
- Bit Error Robustness for Energy-Efficient DNN Accelerators [93.58572811484022]
We show that a combination of robust fixed-point quantization, weight clipping, and random bit error training (RandBET) improves robustness against random bit errors.
This leads to high energy savings from both low-voltage operation and low-precision quantization.
arXiv Detail & Related papers (2020-06-24T18:23:10Z)
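To illustrate the role of weight clipping in the entry above: with symmetric fixed-point quantization, tightening the clipping range shrinks the quantization step, so a single flipped stored bit perturbs the weight by a proportionally smaller amount. A minimal sketch, assuming an 8-bit format and illustrative names:

```python
# Weight clipping and bit-error robustness: the absolute damage of one flipped
# bit scales with the quantization step, which shrinks as the clipping range
# [-w_max, w_max] tightens. RandBET would additionally flip random bits of q
# during training. Names and the 8-bit format are illustrative assumptions.
import numpy as np

def quantize_clipped(w: np.ndarray, w_max: float, bits: int = 8):
    """Symmetric fixed-point quantization of w into [-w_max, w_max]."""
    qmax = 2 ** (bits - 1) - 1
    scale = w_max / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

w = np.random.randn(1024).astype(np.float32) * 0.2
for w_max in (1.0, 0.25):
    q, scale = quantize_clipped(w, w_max)
    # Worst case for a single flip of the top magnitude bit (value 64):
    print(f"w_max={w_max}: one bit flip shifts a weight by up to {64 * scale:.4f}")
```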