Software-Level Accuracy Using Stochastic Computing With
Charge-Trap-Flash Based Weight Matrix
- URL: http://arxiv.org/abs/2004.11120v1
- Date: Mon, 9 Mar 2020 02:45:58 GMT
- Title: Software-Level Accuracy Using Stochastic Computing With
Charge-Trap-Flash Based Weight Matrix
- Authors: Varun Bhatt, Shalini Shrivastava, Tanmay Chavan, Udayan Ganguly
- Abstract summary: Charge Trap Flash (CTF) memory was shown to have a large number of levels before saturation, but variable non-linearity.
We show, through simulations, that at an optimum choice of the range, our system performs nearly as well as the models trained using exact floating point operations.
We also show its use in reinforcement learning, where it is used for value function approximation in Q-Learning, and learns to complete an episode of the mountain car control problem in around 146 steps.
- Score: 2.580765958706854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The in-memory computing paradigm with emerging memory devices has been
recently shown to be a promising way to accelerate deep learning. The resistive
processing unit (RPU) has been proposed to enable the vector-vector outer
product in a crossbar array using a stochastic train of identical pulses for
one-shot weight updates, promising a significant speed-up in matrix
multiplication operations, which form the bulk of training neural networks.
However, the performance of the system suffers if the device does not satisfy
the condition of linear conductance change over around 1,000 conductance
levels. This is a challenge for nanoscale memories. Recently, Charge Trap Flash
(CTF) memory was shown to have a large number of levels before saturation, but
variable non-linearity. In this paper, we explore the trade-off between the
range of conductance change and linearity. We show, through simulations, that
at an optimum choice of the range, our system performs nearly as well as the
models trained using exact floating point operations, with less than 1%
reduction in performance. Our system reaches an accuracy of 97.9% on the MNIST
dataset, and 89.1% and 70.5% accuracy on the CIFAR-10 and CIFAR-100 datasets (using
pre-extracted features). We also show its use in reinforcement learning, where
it is used for value function approximation in Q-Learning, and learns to
complete an episode of the mountain car control problem in around 146 steps.
Benchmarked against the state of the art, the CTF-based RPU shows best-in-class
performance, enabling software-equivalent accuracy.
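To make the stochastic-pulse weight update described above concrete, the following is a minimal NumPy sketch of a crossbar outer-product update with a saturating, CTF-like conductance response. It illustrates the general RPU scheme only, not the authors' simulator; the pulse count, learning rate, conductance window, and the saturating step model (including the `frac_of_range` knob that stands in for the range-vs-linearity trade-off) are assumptions made for this example.

```python
"""Sketch: stochastic-computing outer-product update on a saturating crossbar."""
import numpy as np

rng = np.random.default_rng(0)

BL = 32                       # pulses per update (assumed)
G_MIN, G_MAX = 0.0, 1.0       # normalised conductance window (assumed)

def pulse_response(G, frac_of_range=0.3):
    """Conductance step for one coincident pulse.

    A CTF-like device saturates, so the step shrinks as G approaches the top of
    the window actually used.  `frac_of_range` restricts updates to a fraction
    of the full window: smaller fractions are more linear but leave fewer
    usable levels -- a stand-in for the range-vs-linearity trade-off.
    """
    G_top = G_MIN + frac_of_range * (G_MAX - G_MIN)
    return np.maximum((G_top - G) / BL, 0.0)

def stochastic_outer_update(G, x, d, lr=0.1):
    """Approximate G += lr * outer(x, d) with coincident stochastic pulses."""
    px = np.clip(np.abs(x) * lr, 0.0, 1.0)   # row pulse probabilities
    pd = np.clip(np.abs(d), 0.0, 1.0)        # column pulse probabilities
    sign = np.outer(np.sign(x), np.sign(d))  # hardware would use device pairs
    for _ in range(BL):
        row_pulse = rng.random(x.shape) < px
        col_pulse = rng.random(d.shape) < pd
        coincide = np.outer(row_pulse, col_pulse)   # AND at each crosspoint
        G = G + sign * coincide * pulse_response(G)
    return G

# Tiny demo: averaged over trials, the stochastic update has the same sign
# pattern as the ideal outer product; the device response rescales magnitudes.
x, d = rng.standard_normal(4), rng.standard_normal(3)
G0 = np.full((4, 3), 0.05)
dG = np.mean([stochastic_outer_update(G0, x, d) - G0 for _ in range(200)], axis=0)
print(np.round(dG, 4))
print(np.round(0.1 * np.outer(x, d), 4))
```

On average, the coincidence count over the pulse train is proportional to x_i * d_j, which is why the expected conductance change tracks the ideal outer product up to the device's state-dependent step size.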
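The abstract also mentions Q-learning with a CTF-based value-function approximator on the mountain car task. Below is a generic, software-only sketch of that setting: linear Q-learning with radial-basis features over standard mountain-car dynamics. The features, hyperparameters, and episode budget are assumptions, and the weight matrix here is an ordinary floating-point array rather than a CTF crossbar; the sketch only shows where the outer-product TD update, which the crossbar would implement, enters the algorithm.

```python
"""Sketch: linear Q-learning with RBF features on classic mountain car."""
import numpy as np

rng = np.random.default_rng(0)

# Standard mountain-car dynamics (action in {0, 1, 2}, reward -1 per step).
def step(pos, vel, action):
    vel = np.clip(vel + 0.001 * (action - 1) - 0.0025 * np.cos(3 * pos),
                  -0.07, 0.07)
    pos = np.clip(pos + vel, -1.2, 0.6)
    if pos == -1.2:
        vel = 0.0
    return pos, vel, -1.0, pos >= 0.5

# Simple radial-basis features over (position, velocity) -- assumed, not the paper's.
centers = np.array([(p, v) for p in np.linspace(-1.2, 0.6, 8)
                            for v in np.linspace(-0.07, 0.07, 8)])
scale = np.array([1.8 / 8, 0.14 / 8])

def features(pos, vel):
    d = (np.array([pos, vel]) - centers) / scale
    return np.exp(-0.5 * np.sum(d * d, axis=1))

W = np.zeros((3, len(centers)))          # one weight row per action
alpha, gamma, eps = 0.05, 0.99, 0.1      # assumed hyperparameters

for episode in range(300):
    pos, vel = rng.uniform(-0.6, -0.4), 0.0
    phi = features(pos, vel)
    for t in range(5000):
        a = rng.integers(3) if rng.random() < eps else int(np.argmax(W @ phi))
        pos, vel, r, done = step(pos, vel, a)
        phi_next = features(pos, vel)
        target = r if done else r + gamma * np.max(W @ phi_next)
        # TD update: this outer-product-style step is what a stochastic-pulse
        # crossbar update would realise on the weight matrix.
        W[a] += alpha * (target - W[a] @ phi) * phi
        phi = phi_next
        if done:
            break

print("steps in final episode:", t + 1)
```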
Related papers
- Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
arXiv Detail & Related papers (2024-02-05T10:55:47Z)
- Hadamard Domain Training with Integers for Class Incremental Quantized Learning [1.4416751609100908]
Continual learning can be cost-prohibitive for resource-constrained edge platforms.
We propose a technique that uses Hadamard-domain transforms to enable low-precision training with only integer matrix multiplications.
We achieve less than 0.5% and 3% accuracy degradation while quantizing all matrix multiplication inputs down to 4 bits with 8-bit accumulators (a hedged sketch of the Hadamard-domain idea follows the related-papers list below).
arXiv Detail & Related papers (2023-10-05T16:52:59Z)
- A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale [5.206015354543744]
Shampoo is an online and stochastic optimization algorithm belonging to the AdaGrad family of methods for training neural networks.
We provide a complete description of the algorithm as well as the performance optimizations that our implementation leverages to train deep networks at-scale in PyTorch.
arXiv Detail & Related papers (2023-09-12T18:11:10Z)
- Hardware-aware Training Techniques for Improving Robustness of Ex-Situ Neural Network Transfer onto Passive TiO2 ReRAM Crossbars [0.8553766625486795]
Training approaches that adapt techniques such as dropout, the reparametrization trick and regularization to TiO2 crossbar variabilities are proposed.
For the neural network trained using the proposed hardware-aware method, 79.5% of the test set's data points can be classified with an accuracy of 95% or higher.
arXiv Detail & Related papers (2023-05-29T13:55:02Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for matrix multiplication with reduced variance (a sketch of the baseline column-row sampling estimator follows the related-papers list below).
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- Convolutional Neural Networks for the classification of glitches in gravitational-wave data streams [52.77024349608834]
We classify transient noise signals (i.e., glitches) and gravitational waves in data from the Advanced LIGO detectors.
We use models with a supervised learning approach, trained from scratch using the Gravity Spy dataset.
We also explore a self-supervised approach, pre-training models with automatically generated pseudo-labels.
arXiv Detail & Related papers (2023-03-24T11:12:37Z)
- Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z)
- Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance [68.8204255655161]
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference.
We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline.
arXiv Detail & Related papers (2023-01-31T02:46:57Z)
- Efficient On-device Training via Gradient Filtering [14.484604762427717]
We propose a new gradient filtering approach which enables on-device CNN model training.
Our approach creates a special structure with fewer unique elements in the gradient map.
Our approach opens up a new direction of research with a huge potential for on-device training.
arXiv Detail & Related papers (2023-01-01T02:33:03Z)
- Improving compute efficacy frontiers with SliceOut [31.864949424541344]
We introduce SliceOut -- a dropout-inspired scheme to train deep learning models faster without impacting final test accuracy.
At test time, turning off SliceOut performs an implicit ensembling across a linear number of architectures that preserves test accuracy.
This leads to faster processing of large computational workloads overall, and significantly reduces the resulting energy consumption and CO2 emissions.
arXiv Detail & Related papers (2020-07-21T15:59:09Z)
- One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is supported by simulations of the prediction of the cost of a house in Boston and the training of a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
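As noted in the Hadamard Domain Training entry above, here is a hedged, generic sketch of the Hadamard-domain quantization idea: because a Walsh-Hadamard transform is orthogonal up to scaling, a matrix product can be computed from Hadamard-domain quantities, and quantizing there tends to spread outliers and lower the error of 4-bit integer matrix multiplication. The paper's training procedure and 8-bit accumulator handling are not reproduced; the sizes, the injected outlier, and the per-tensor quantizer are assumptions for illustration.

```python
"""Sketch: 4-bit integer matmul with and without a Hadamard-domain transform."""
import numpy as np

def wht(x):
    """Fast Walsh-Hadamard transform along the last axis (length must be 2^k)."""
    x = x.copy()
    n = x.shape[-1]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[..., i:i + h].copy()
            b = x[..., i + h:i + 2 * h].copy()
            x[..., i:i + h] = a + b
            x[..., i + h:i + 2 * h] = a - b
        h *= 2
    return x

def quant4(x):
    """Symmetric per-tensor 4-bit quantization: integer values plus a scale."""
    s = np.max(np.abs(x)) / 7.0 + 1e-12
    return np.clip(np.round(x / s), -8, 7).astype(np.int32), s

rng = np.random.default_rng(0)
n = 64
W = rng.standard_normal((32, n))
x = rng.standard_normal(n)
x[3] = 25.0                       # an outlier that hurts plain 4-bit quantization

ref = W @ x

# Plain 4-bit quantization of W and x.
Wq, sw = quant4(W); xq, sx = quant4(x)
plain = (Wq @ xq) * sw * sx

# Quantize in the Hadamard domain: W @ x == (wht(W) / n) @ wht(x).
Whq, swh = quant4(wht(W) / n); xhq, sxh = quant4(wht(x))
hadamard = (Whq @ xhq) * swh * sxh

print("plain int4 error   :", np.linalg.norm(plain - ref))
print("Hadamard int4 error:", np.linalg.norm(hadamard - ref))
```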
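As noted in the WTA-CRS entry above, the following sketch shows the classic column-row sampling (CRS) estimator that the WTA-CRS family builds on: A @ B is a sum of column-row outer products, and sampling a few of them with probabilities proportional to their norms, then reweighting by 1/(c * p_k), yields an unbiased estimate with reduced variance. The winner-take-all modification itself is not implemented here, and the matrix sizes and sample counts are arbitrary.

```python
"""Sketch: baseline column-row sampling (CRS) estimator for a matrix product."""
import numpy as np

rng = np.random.default_rng(0)

def crs_matmul(A, B, c, rng):
    """Unbiased estimate of A @ B from c sampled column-row outer products."""
    # Importance-sampling distribution over the shared (inner) dimension.
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    est = np.zeros((A.shape[0], B.shape[1]))
    for k in idx:
        est += np.outer(A[:, k], B[k, :]) / (c * p[k])
    return est

A = rng.standard_normal((64, 512))
B = rng.standard_normal((512, 32))
exact = A @ B

approx = crs_matmul(A, B, c=128, rng=rng)
print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))

# Unbiasedness check: the mean over many independent estimates approaches A @ B.
mean_est = np.mean([crs_matmul(A, B, 64, rng) for _ in range(200)], axis=0)
print("bias (shrinks with more repeats):",
      np.linalg.norm(mean_est - exact) / np.linalg.norm(exact))
```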