LightNorm: Area and Energy-Efficient Batch Normalization Hardware for
On-Device DNN Training
- URL: http://arxiv.org/abs/2211.02686v1
- Date: Fri, 4 Nov 2022 18:08:57 GMT
- Title: LightNorm: Area and Energy-Efficient Batch Normalization Hardware for
On-Device DNN Training
- Authors: Seock-Hwan Noh, Junsang Park, Dahoon Park, Jahyun Koo, Jeik Choi,
Jaeha Kung
- Abstract summary: We present an extremely efficient batch normalization, named LightNorm, and its associated hardware module.
In more detail, we fuse three approximation techniques: i) low bit-precision, ii) range batch normalization, and iii) block floating point.
- Score: 0.31806743741013654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When training early-stage deep neural networks (DNNs), generating
intermediate features via convolution or linear layers occupies most of the
execution time. Accordingly, extensive research has been done to reduce the
computational burden of the convolution or linear layers. In recent
mobile-friendly DNNs, however, the relative number of operations involved in
processing these layers has been significantly reduced. As a result, the
proportion of execution time spent in other layers, such as batch
normalization layers, has increased. Thus, in this work, we conduct a detailed
analysis of the batch normalization layer to efficiently reduce the runtime
overhead of the batch normalization process. Backed by this thorough analysis,
we present an extremely efficient batch normalization, named LightNorm, and
its associated hardware module. In more detail, we fuse three approximation
techniques: i) low bit-precision, ii) range batch normalization, and iii)
block floating point. All of these approximation techniques are carefully
utilized not only to maintain the statistics of the intermediate feature maps,
but also to minimize off-chip memory accesses. By using the proposed LightNorm
hardware, we can achieve significant area and energy savings during DNN
training without hurting the training accuracy. This makes the proposed
hardware a great candidate for on-device training.
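To make the three ingredients concrete, below is a minimal, hypothetical PyTorch-style sketch of range batch normalization and block floating point quantization. The function names, the 1/sqrt(2*ln(n)) range scaling, and the block/mantissa parameters are illustrative assumptions, not the LightNorm hardware algorithm itself.

```python
import torch
import torch.nn.functional as F

def range_batch_norm(x, eps=1e-5):
    # Hypothetical sketch: normalize NCHW features per channel, but replace the
    # variance with the (max - min) range scaled by 1/sqrt(2*ln(n)), a cheap
    # proxy for the standard deviation of roughly Gaussian activations.
    n = x.numel() // x.shape[1]                       # elements per channel
    mu = x.mean(dim=(0, 2, 3), keepdim=True)
    centered = x - mu
    rng = (centered.amax(dim=(0, 2, 3), keepdim=True)
           - centered.amin(dim=(0, 2, 3), keepdim=True))
    scale = 1.0 / torch.sqrt(2.0 * torch.log(torch.tensor(float(n))))
    return centered / (scale * rng + eps)

def to_block_floating_point(x, mantissa_bits=8, block_size=16):
    # Hypothetical sketch of block floating point: values inside each block
    # share one exponent (taken from the block's largest magnitude) and keep
    # only a low-bit mantissa, shrinking the feature maps written off-chip.
    flat = x.flatten()
    pad = (-flat.numel()) % block_size
    blocks = F.pad(flat, (0, pad)).view(-1, block_size)
    shared_exp = blocks.abs().amax(dim=1, keepdim=True).clamp_min(1e-30).log2().ceil()
    step = 2.0 ** (shared_exp - (mantissa_bits - 1))  # quantization step per block
    quantized = (blocks / step).round() * step
    return quantized.view(-1)[: x.numel()].view_as(x)
```

In a training loop, these two steps would be applied to the intermediate feature maps before they are stored, which is where the off-chip memory savings described in the abstract would come from; the actual fused hardware datapath is described in the paper.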
Related papers
- Fast Graph Sharpness-Aware Minimization for Enhancing and Accelerating Few-Shot Node Classification [53.727688136434345]
Graph Neural Networks (GNNs) have shown superior performance in node classification.
We present Fast Graph Sharpness-Aware Minimization (FGSAM) that integrates the rapid training of Multi-Layer Perceptrons with the superior performance of GNNs.
Our proposed algorithm outperforms the standard SAM with lower computational cost on few-shot node classification (FSNC) tasks.
arXiv Detail & Related papers (2024-10-22T09:33:29Z)
- Towards Cheaper Inference in Deep Networks with Lower Bit-Width
Accumulators [25.100092698906437]
Current hardware still relies on high-accuracy core operations.
This is because, so far, the use of low-precision accumulators has led to a significant degradation in performance.
We present a simple method to train and fine-tune high-end DNNs that, for the first time, allows the use of cheaper $12$-bit accumulators.
arXiv Detail & Related papers (2024-01-25T11:46:01Z)
- Training Integer-Only Deep Recurrent Neural Networks [3.1829446824051195]
We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN).
Our approach supports layer normalization, attention, and an adaptive piecewise linear (PWL) approximation of activation functions.
The proposed method enables RNN-based language models to run on edge devices with a $2\times$ improvement in runtime.
arXiv Detail & Related papers (2022-12-22T15:22:36Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- AEGNN: Asynchronous Event-based Graph Neural Networks [54.528926463775946]
Event-based Graph Neural Networks generalize standard GNNs to process events as "evolving" spatio-temporal graphs.
AEGNNs are easily trained on synchronous inputs and can be converted to efficient, "asynchronous" networks at test time.
arXiv Detail & Related papers (2022-03-31T16:21:12Z)
- DNN Training Acceleration via Exploring GPGPU Friendly Sparsity [16.406482603838157]
We propose Approximate Random Dropout, which replaces the conventional random dropout of neurons and synapses with regular, online-generated row-based or tile-based dropout patterns.
We then develop an SGD-based search algorithm that produces the distribution of row-based or tile-based dropout patterns to compensate for the potential accuracy loss.
We also propose a sensitivity-aware dropout method that dynamically drops input feature maps based on their sensitivity, so as to achieve greater forward and backward training acceleration.
arXiv Detail & Related papers (2022-03-11T01:32:03Z)
- Navigating Local Minima in Quantized Spiking Neural Networks [3.1351527202068445]
Spiking and Quantized Neural Networks (NNs) are becoming exceedingly important for hyper-efficient implementations of Deep Learning (DL) algorithms.
These networks face challenges when trained using error backpropagation, due to the absence of gradient signals when applying hard thresholds.
This paper presents a systematic evaluation of a cosine-annealed LR schedule coupled with weight-independent adaptive moment estimation.
arXiv Detail & Related papers (2022-02-15T06:42:25Z)
- Low-Precision Training in Logarithmic Number System using Multiplicative
Weight Update [49.948082497688404]
Training large-scale deep neural networks (DNNs) currently requires a significant amount of energy, leading to serious environmental impacts.
One promising approach to reduce the energy costs is representing DNNs with low-precision numbers.
We jointly design a low-precision training framework involving a logarithmic number system (LNS) and a multiplicative weight update training method, termed LNS-Madam.
arXiv Detail & Related papers (2021-06-26T00:32:17Z)
- AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
- Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality
Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD at every step (see the sketch after this entry).
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a greater reduction in computation load at the same accuracy.
arXiv Detail & Related papers (2020-04-20T02:40:43Z)
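As a companion to the SVD-training entry above, here is a minimal, hypothetical sketch of training a linear layer in factored low-rank form with an orthogonality penalty on the singular vectors and an L1 penalty on the singular values. The class name, rank, and penalty weights are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class SVDLinear(nn.Module):
    """Sketch of a linear layer kept as U @ diag(s) @ V^T during training.

    The factors are trained directly, so no SVD is needed at any step.
    A penalty encourages U and V to stay orthonormal, and an L1 penalty on s
    pushes small singular values toward zero so they can be pruned later.
    """
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, rank) / rank ** 0.5)
        self.s = nn.Parameter(torch.ones(rank))
        self.V = nn.Parameter(torch.randn(in_features, rank) / rank ** 0.5)

    def forward(self, x):
        # Project to the rank-dimensional space, scale by singular values,
        # then map to the output space.
        return ((x @ self.V) * self.s) @ self.U.t()

    def regularizer(self, ortho_weight=1e-2, sparse_weight=1e-4):
        rank = self.s.numel()
        eye = torch.eye(rank, device=self.s.device)
        ortho = ((self.U.t() @ self.U - eye) ** 2).sum() + \
                ((self.V.t() @ self.V - eye) ** 2).sum()
        return ortho_weight * ortho + sparse_weight * self.s.abs().sum()
```

In use, the regularizer would be added to the task loss during training, and singular values driven near zero by the L1 term could be pruned afterwards to obtain the final low-rank layer.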
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.