Examining the Role and Limits of Batchnorm Optimization to Mitigate
Diverse Hardware-noise in In-memory Computing
- URL: http://arxiv.org/abs/2305.18416v1
- Date: Sun, 28 May 2023 19:07:25 GMT
- Title: Examining the Role and Limits of Batchnorm Optimization to Mitigate
Diverse Hardware-noise in In-memory Computing
- Authors: Abhiroop Bhattacharjee, Abhishek Moitra, Youngeun Kim, Yeshwanth
Venkatesha, and Priyadarshini Panda
- Abstract summary: In-Memory Computing (IMC) platforms such as analog crossbars are gaining focus as they facilitate the acceleration of low-precision Deep Neural Networks (DNNs) with high area- & compute-efficiencies.
The intrinsic non-idealities in crossbars, which are often non-deterministic and non-linear, degrade the performance of the deployed DNNs.
This work aims to examine the distortions caused by these non-idealities on the dot-product operations in analog crossbars.
- Score: 3.9488615467284225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In-Memory Computing (IMC) platforms such as analog crossbars are gaining
focus as they facilitate the acceleration of low-precision Deep Neural Networks
(DNNs) with high area- & compute-efficiencies. However, the intrinsic
non-idealities in crossbars, which are often non-deterministic and non-linear,
degrade the performance of the deployed DNNs. In addition to quantization
errors, most frequently encountered non-idealities during inference include
crossbar circuit-level parasitic resistances and device-level non-idealities
such as stochastic read noise and temporal drift. In this work, our goal is to
closely examine the distortions caused by these non-idealities on the
dot-product operations in analog crossbars and explore the feasibility of a
nearly training-less solution via crossbar-aware fine-tuning of batchnorm
parameters in real-time to mitigate the impact of the non-idealities. This
enables reduction in hardware costs in terms of memory and training energy for
IMC noise-aware retraining of the DNN weights on crossbars.
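The abstract describes the approach only at a high level; below is a minimal PyTorch sketch (not the authors' released code) of the general idea: weights are perturbed by an assumed multiplicative Gaussian read-noise model standing in for crossbar non-idealities, and only the batchnorm scale/shift parameters (plus running statistics) are then fine-tuned while all other weights stay frozen. The noise model, hyperparameters, and helper names are illustrative assumptions.

```python
# Sketch of crossbar-noise-aware, batchnorm-only fine-tuning.
# Assumptions (not from the paper): multiplicative Gaussian read noise on
# conv/linear weights as a stand-in for crossbar non-idealities, and
# illustrative optimizer/learning-rate choices.
import copy
import torch
import torch.nn as nn

def inject_read_noise(model: nn.Module, sigma: float = 0.05) -> nn.Module:
    """Return a copy of `model` whose conv/linear weights carry a
    multiplicative Gaussian perturbation (a crude read-noise proxy)."""
    noisy = copy.deepcopy(model)
    with torch.no_grad():
        for m in noisy.modules():
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                m.weight.mul_(1.0 + sigma * torch.randn_like(m.weight))
    return noisy

def finetune_batchnorm_only(model: nn.Module, loader, epochs: int = 1,
                            lr: float = 1e-3, device: str = "cpu") -> nn.Module:
    """Fine-tune only the batchnorm affine parameters (and let the running
    statistics re-estimate themselves); all other weights stay frozen."""
    model.to(device).train()
    for p in model.parameters():                 # freeze everything ...
        p.requires_grad_(False)
    bn_params = []
    for m in model.modules():                    # ... except BN scale/shift
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)) and m.affine:
            m.weight.requires_grad_(True)
            m.bias.requires_grad_(True)
            bn_params += [m.weight, m.bias]
    opt = torch.optim.SGD(bn_params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

# Hypothetical usage: perturb a pretrained net, then recalibrate its batchnorm.
# noisy = inject_read_noise(pretrained_net, sigma=0.05)
# noisy = finetune_batchnorm_only(noisy, calibration_loader, epochs=1)
```

Because only the batchnorm scale/shift vectors are updated, the trained parameter count is a small fraction of the full network, which is what makes the "nearly training-less" framing plausible in terms of memory and training energy compared with full noise-aware retraining of the weights.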
Related papers
- TSB: Tiny Shared Block for Efficient DNN Deployment on NVCIM Accelerators [11.496631244103773]
"Tiny Shared Block (TSB)" integrates a small shared 1x1 convolution block into the Deep Neural Network architecture.
TSB achieves an over-20x improvement in the inference accuracy gap, an over-5x training speedup, and a reduction in weights-to-device mapping cost.
arXiv Detail & Related papers (2024-05-08T20:53:38Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- SAMSON: Sharpness-Aware Minimization Scaled by Outlier Normalization for Improving DNN Generalization and Robustness [11.249410336982258]
Energy-efficient deep neural network (DNN) accelerators are prone to non-idealities that degrade performance at inference time.
Existing methods typically add perturbations to the DNN weights during training to simulate inference on noisy hardware.
We show that applying sharpness-aware training, by optimizing for both the loss value and loss sharpness, significantly improves robustness to noisy hardware at inference time.
arXiv Detail & Related papers (2022-11-18T16:58:23Z)
- Examining and Mitigating the Impact of Crossbar Non-idealities for Accurate Implementation of Sparse Deep Neural Networks [2.4283778735260686]
We show how sparse Deep Neural Networks (DNNs) can lead to severe accuracy losses compared to unpruned DNNs mapped onto non-ideal crossbars.
We propose two mitigation approaches: crossbar column rearrangement and Weight-Constrained-Training (WCT).
These help in mitigating non-idealities by increasing the proportion of low conductance synapses on crossbars, thereby improving their computational accuracies.
arXiv Detail & Related papers (2022-01-13T21:56:48Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update [49.948082497688404]
Training large-scale deep neural networks (DNNs) currently requires a significant amount of energy, leading to serious environmental impacts.
One promising approach to reduce the energy costs is representing DNNs with low-precision numbers.
We jointly design a low-precision training framework involving a logarithmic number system (LNS) and a multiplicative weight update training method, termed LNS-Madam.
arXiv Detail & Related papers (2021-06-26T00:32:17Z)
- AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks [78.62086125399831]
We present a general approach called Alternating Compressed/DeCompressed (AC/DC) training of deep neural networks (DNNs).
AC/DC outperforms existing sparse training methods in accuracy at similar computational budgets.
An important property of AC/DC is that it allows co-training of dense and sparse models, yielding accurate sparse-dense model pairs at the end of the training process.
arXiv Detail & Related papers (2021-06-23T13:23:00Z)
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain, which integrates progressive fractional quantization that gradually increases the precision of activations, weights, and gradients.
FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better accuracy (-0.12% to +1.87%).
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
- Dynamic Hard Pruning of Neural Networks at the Edge of the Internet [11.605253906375424]
The Dynamic Hard Pruning (DynHP) technique incrementally prunes the network during training.
DynHP enables a tunable size reduction of the final neural network and reduces the NN memory occupancy during training.
Freed memory is reused by a dynamic batch sizing approach to counterbalance the accuracy degradation caused by the hard pruning strategy.
arXiv Detail & Related papers (2020-11-17T10:23:28Z)
- A Fully Tensorized Recurrent Neural Network [48.50376453324581]
We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell.
This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
arXiv Detail & Related papers (2020-10-08T18:24:12Z)
- TxSim: Modeling Training of Deep Neural Networks on Resistive Crossbar Systems [3.1887081453726136]
Crossbar-based computations face a major challenge due to a variety of device- and circuit-level non-idealities.
We propose TxSim, a fast and customizable modeling framework to functionally evaluate DNN training on crossbar-based hardware.
arXiv Detail & Related papers (2020-02-25T19:29:43Z)