Software-Level Accuracy Using Stochastic Computing With
Charge-Trap-Flash Based Weight Matrix
- URL: http://arxiv.org/abs/2004.11120v1
- Date: Mon, 9 Mar 2020 02:45:58 GMT
- Title: Software-Level Accuracy Using Stochastic Computing With
Charge-Trap-Flash Based Weight Matrix
- Authors: Varun Bhatt, Shalini Shrivastava, Tanmay Chavan, Udayan Ganguly
- Abstract summary: Charge Trap Flash (CTF) memory was shown to have a large number of levels before saturation, but variable non-linearity.
We show, through simulations, that at an optimum choice of the range, our system performs nearly as well as the models trained using exact floating point operations.
We also show its use in reinforcement learning, where it is used for value function approximation in Q-Learning, and learns to complete an episode of the mountain car control problem in around 146 steps.
- Score: 2.580765958706854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The in-memory computing paradigm with emerging memory devices has been
recently shown to be a promising way to accelerate deep learning. The resistive
processing unit (RPU) has been proposed to enable the vector-vector outer
product in a crossbar array using a stochastic train of identical pulses for
one-shot weight updates, promising a significant speed-up in matrix
multiplication operations, which form the bulk of training neural networks.
However, the performance of the system suffers if the device does not satisfy
the condition of linear conductance change over around 1,000 conductance
levels. This is a challenge for nanoscale memories. Recently, Charge Trap Flash
(CTF) memory was shown to have a large number of levels before saturation, but
variable non-linearity. In this paper, we explore the trade-off between the
range of conductance change and linearity. We show, through simulations, that
at an optimum choice of the range, our system performs nearly as well as the
models trained using exact floating point operations, with less than 1%
reduction in performance. Our system reaches an accuracy of 97.9% on the MNIST
dataset, and 89.1% and 70.5% accuracy on the CIFAR-10 and CIFAR-100 datasets (using
pre-extracted features). We also show its use in reinforcement learning, where
it is used for value function approximation in Q-Learning, and learns to
complete an episode of the mountain car control problem in around 146 steps.
Benchmarked against the state of the art, the CTF-based RPU shows best-in-class
performance, enabling software-equivalent accuracy.
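To make the stochastic-pulse weight update described above concrete, the following is a minimal NumPy sketch of a crossbar outer-product update with a saturating, CTF-like conductance response. It illustrates the general RPU scheme only, not the authors' simulator; the pulse count, learning rate, conductance window, and the saturating step model (including the `frac_of_range` knob that stands in for the range-vs-linearity trade-off) are assumptions made for this example.

```python
"""Sketch: stochastic-computing outer-product update on a saturating crossbar."""
import numpy as np

rng = np.random.default_rng(0)

BL = 32                       # pulses per update (assumed)
G_MIN, G_MAX = 0.0, 1.0       # normalised conductance window (assumed)

def pulse_response(G, frac_of_range=0.3):
    """Conductance step for one coincident pulse.

    A CTF-like device saturates, so the step shrinks as G approaches the top of
    the window actually used.  `frac_of_range` restricts updates to a fraction
    of the full window: smaller fractions are more linear but leave fewer
    usable levels -- a stand-in for the range-vs-linearity trade-off.
    """
    G_top = G_MIN + frac_of_range * (G_MAX - G_MIN)
    return np.maximum((G_top - G) / BL, 0.0)

def stochastic_outer_update(G, x, d, lr=0.1):
    """Approximate G += lr * outer(x, d) with coincident stochastic pulses."""
    px = np.clip(np.abs(x) * lr, 0.0, 1.0)   # row pulse probabilities
    pd = np.clip(np.abs(d), 0.0, 1.0)        # column pulse probabilities
    sign = np.outer(np.sign(x), np.sign(d))  # hardware would use device pairs
    for _ in range(BL):
        row_pulse = rng.random(x.shape) < px
        col_pulse = rng.random(d.shape) < pd
        coincide = np.outer(row_pulse, col_pulse)   # AND at each crosspoint
        G = G + sign * coincide * pulse_response(G)
    return G

# Tiny demo: averaged over trials, the stochastic update has the same sign
# pattern as the ideal outer product; the device response rescales magnitudes.
x, d = rng.standard_normal(4), rng.standard_normal(3)
G0 = np.full((4, 3), 0.05)
dG = np.mean([stochastic_outer_update(G0, x, d) - G0 for _ in range(200)], axis=0)
print(np.round(dG, 4))
print(np.round(0.1 * np.outer(x, d), 4))
```

On average, the coincidence count over the pulse train is proportional to x_i * d_j, which is why the expected conductance change tracks the ideal outer product up to the device's state-dependent step size.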
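The abstract also mentions Q-learning with a CTF-based value-function approximator on the mountain car task. Below is a generic, software-only sketch of that setting: linear Q-learning with radial-basis features over standard mountain-car dynamics. The features, hyperparameters, and episode budget are assumptions, and the weight matrix here is an ordinary floating-point array rather than a CTF crossbar; the sketch only shows where the outer-product TD update, which the crossbar would implement, enters the algorithm.

```python
"""Sketch: linear Q-learning with RBF features on classic mountain car."""
import numpy as np

rng = np.random.default_rng(0)

# Standard mountain-car dynamics (action in {0, 1, 2}, reward -1 per step).
def step(pos, vel, action):
    vel = np.clip(vel + 0.001 * (action - 1) - 0.0025 * np.cos(3 * pos),
                  -0.07, 0.07)
    pos = np.clip(pos + vel, -1.2, 0.6)
    if pos == -1.2:
        vel = 0.0
    return pos, vel, -1.0, pos >= 0.5

# Simple radial-basis features over (position, velocity) -- assumed, not the paper's.
centers = np.array([(p, v) for p in np.linspace(-1.2, 0.6, 8)
                            for v in np.linspace(-0.07, 0.07, 8)])
scale = np.array([1.8 / 8, 0.14 / 8])

def features(pos, vel):
    d = (np.array([pos, vel]) - centers) / scale
    return np.exp(-0.5 * np.sum(d * d, axis=1))

W = np.zeros((3, len(centers)))          # one weight row per action
alpha, gamma, eps = 0.05, 0.99, 0.1      # assumed hyperparameters

for episode in range(300):
    pos, vel = rng.uniform(-0.6, -0.4), 0.0
    phi = features(pos, vel)
    for t in range(5000):
        a = rng.integers(3) if rng.random() < eps else int(np.argmax(W @ phi))
        pos, vel, r, done = step(pos, vel, a)
        phi_next = features(pos, vel)
        target = r if done else r + gamma * np.max(W @ phi_next)
        # TD update: this outer-product-style step is what a stochastic-pulse
        # crossbar update would realise on the weight matrix.
        W[a] += alpha * (target - W[a] @ phi) * phi
        phi = phi_next
        if done:
            break

print("steps in final episode:", t + 1)
```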
Related papers
- Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
arXiv Detail & Related papers (2024-02-05T10:55:47Z)
- Hadamard Domain Training with Integers for Class Incremental Quantized Learning [1.4416751609100908]
Continual learning can be cost-prohibitive for resource-constrained edge platforms.
We propose a technique that uses Hadamard-domain transforms to enable low-precision training with only integer matrix multiplications.
We achieve less than 0.5% and 3% accuracy degradation while quantizing all matrix multiplication inputs down to 4 bits with 8-bit accumulators (a hedged sketch of the Hadamard-domain idea follows the related-papers list below).
arXiv Detail & Related papers (2023-10-05T16:52:59Z)
- A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale [5.206015354543744]
Shampoo is an online and stochastic optimization algorithm belonging to the AdaGrad family of methods for training neural networks.
We provide a complete description of the algorithm as well as the performance optimizations that our implementation leverages to train deep networks at-scale in PyTorch.
arXiv Detail & Related papers (2023-09-12T18:11:10Z)
- Hardware-aware Training Techniques for Improving Robustness of Ex-Situ Neural Network Transfer onto Passive TiO2 ReRAM Crossbars [0.8553766625486795]
Training approaches that adapt techniques such as dropout, the reparametrization trick and regularization to TiO2 crossbar variabilities are proposed.
For the neural network trained using the proposed hardware-aware method, 79.5% of the test set's data points can be classified with an accuracy of 95% or higher.
arXiv Detail & Related papers (2023-05-29T13:55:02Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for matrix multiplication with reduced variance (a sketch of the baseline column-row sampling estimator follows the related-papers list below).
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- Convolutional Neural Networks for the classification of glitches in gravitational-wave data streams [52.77024349608834]
We classify transient noise signals (i.e., glitches) and gravitational waves in data from the Advanced LIGO detectors.
We use models with a supervised learning approach, trained from scratch using the Gravity Spy dataset.
We also explore a self-supervised approach, pre-training models with automatically generated pseudo-labels.
arXiv Detail & Related papers (2023-03-24T11:12:37Z)
- Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z)
- Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance [68.8204255655161]
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference.
We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline.
arXiv Detail & Related papers (2023-01-31T02:46:57Z)
- Efficient On-device Training via Gradient Filtering [14.484604762427717]
We propose a new gradient filtering approach which enables on-device CNN model training.
Our approach creates a special structure with fewer unique elements in the gradient map.
Our approach opens up a new direction of research with a huge potential for on-device training.
arXiv Detail & Related papers (2023-01-01T02:33:03Z)
- Improving compute efficacy frontiers with SliceOut [31.864949424541344]
We introduce SliceOut -- a dropout-inspired scheme to train deep learning models faster without impacting final test accuracy.
At test time, turning off SliceOut performs an implicit ensembling across a linear number of architectures that preserves test accuracy.
This leads to faster processing of large computational workloads overall, and significantly reduces the resulting energy consumption and CO2 emissions.
arXiv Detail & Related papers (2020-07-21T15:59:09Z)
- One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is supported by simulations of the prediction of the cost of a house in Boston and the training of a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
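As noted in the Hadamard Domain Training entry above, here is a hedged, generic sketch of the Hadamard-domain quantization idea: because a Walsh-Hadamard transform is orthogonal up to scaling, a matrix product can be computed from Hadamard-domain quantities, and quantizing there tends to spread outliers and lower the error of 4-bit integer matrix multiplication. The paper's training procedure and 8-bit accumulator handling are not reproduced; the sizes, the injected outlier, and the per-tensor quantizer are assumptions for illustration.

```python
"""Sketch: 4-bit integer matmul with and without a Hadamard-domain transform."""
import numpy as np

def wht(x):
    """Fast Walsh-Hadamard transform along the last axis (length must be 2^k)."""
    x = x.copy()
    n = x.shape[-1]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[..., i:i + h].copy()
            b = x[..., i + h:i + 2 * h].copy()
            x[..., i:i + h] = a + b
            x[..., i + h:i + 2 * h] = a - b
        h *= 2
    return x

def quant4(x):
    """Symmetric per-tensor 4-bit quantization: integer values plus a scale."""
    s = np.max(np.abs(x)) / 7.0 + 1e-12
    return np.clip(np.round(x / s), -8, 7).astype(np.int32), s

rng = np.random.default_rng(0)
n = 64
W = rng.standard_normal((32, n))
x = rng.standard_normal(n)
x[3] = 25.0                       # an outlier that hurts plain 4-bit quantization

ref = W @ x

# Plain 4-bit quantization of W and x.
Wq, sw = quant4(W); xq, sx = quant4(x)
plain = (Wq @ xq) * sw * sx

# Quantize in the Hadamard domain: W @ x == (wht(W) / n) @ wht(x).
Whq, swh = quant4(wht(W) / n); xhq, sxh = quant4(wht(x))
hadamard = (Whq @ xhq) * swh * sxh

print("plain int4 error   :", np.linalg.norm(plain - ref))
print("Hadamard int4 error:", np.linalg.norm(hadamard - ref))
```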
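As noted in the WTA-CRS entry above, the following sketch shows the classic column-row sampling (CRS) estimator that the WTA-CRS family builds on: A @ B is a sum of column-row outer products, and sampling a few of them with probabilities proportional to their norms, then reweighting by 1/(c * p_k), yields an unbiased estimate with reduced variance. The winner-take-all modification itself is not implemented here, and the matrix sizes and sample counts are arbitrary.

```python
"""Sketch: baseline column-row sampling (CRS) estimator for a matrix product."""
import numpy as np

rng = np.random.default_rng(0)

def crs_matmul(A, B, c, rng):
    """Unbiased estimate of A @ B from c sampled column-row outer products."""
    # Importance-sampling distribution over the shared (inner) dimension.
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    est = np.zeros((A.shape[0], B.shape[1]))
    for k in idx:
        est += np.outer(A[:, k], B[k, :]) / (c * p[k])
    return est

A = rng.standard_normal((64, 512))
B = rng.standard_normal((512, 32))
exact = A @ B

approx = crs_matmul(A, B, c=128, rng=rng)
print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))

# Unbiasedness check: the mean over many independent estimates approaches A @ B.
mean_est = np.mean([crs_matmul(A, B, 64, rng) for _ in range(200)], axis=0)
print("bias (shrinks with more repeats):",
      np.linalg.norm(mean_est - exact) / np.linalg.norm(exact))
```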