Bulk-Switching Memristor-based Compute-In-Memory Module for Deep Neural
Network Training
- URL: http://arxiv.org/abs/2305.14547v1
- Date: Tue, 23 May 2023 22:03:08 GMT
- Title: Bulk-Switching Memristor-based Compute-In-Memory Module for Deep Neural
Network Training
- Authors: Yuting Wu, Qiwen Wang, Ziyu Wang, Xinxin Wang, Buvna Ayyagari,
Siddarth Krishnan, Michael Chudzik and Wei D. Lu
- Abstract summary: We propose a mixed-precision training scheme for memristor-based compute-in-memory (CIM) modules.
The proposed scheme is implemented with a system-on-chip (SoC) of fully integrated analog CIM modules and digital sub-systems.
The efficacy of training larger models is evaluated using realistic hardware parameters and shows that analog CIM modules can enable efficient mixed-precision training with accuracy comparable to full-precision software-trained models.
- Score: 15.660697326769686
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The need for deep neural network (DNN) models with higher performance and
better functionality leads to the proliferation of very large models. Model
training, however, requires intensive computation time and energy.
Memristor-based compute-in-memory (CIM) modules can perform vector-matrix
multiplication (VMM) in situ and in parallel, and have shown great promise in
DNN inference applications. However, CIM-based model training faces challenges
due to non-linear weight updates, device variations, and low precision in
analog computing circuits. In this work, we experimentally implement a
mixed-precision training scheme to mitigate these effects using a
bulk-switching memristor CIM module. Low-precision CIM modules are used to
accelerate the expensive VMM operations, with high precision weight updates
accumulated in digital units. Memristor devices are only changed when the
accumulated weight update value exceeds a pre-defined threshold. The proposed
scheme is implemented with a system-on-chip (SoC) of fully integrated analog
CIM modules and digital sub-systems, showing fast convergence of LeNet training
to 97.73% accuracy. The efficacy of training larger models is evaluated using realistic
hardware parameters and shows that analog CIM modules can enable efficient
mixed-precision DNN training with accuracy comparable to full-precision
software-trained models. Additionally, models trained on chip are inherently robust to
hardware variations, allowing direct mapping to CIM inference chips without
additional re-training.
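The update rule described in the abstract can be summarized in a short NumPy sketch: the analog array performs the low-precision VMMs, weight updates accumulate in a high-precision digital buffer, and a device is reprogrammed only once its accumulated update crosses a threshold. The level count, weight range, read-noise model, threshold, and accumulator-reset policy below are illustrative assumptions, not the paper's hardware parameters or exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

N_LEVELS = 64                          # assumed number of programmable conductance levels
W_MAX = 1.0                            # assumed weight range mapped onto the array
DELTA_W = 2 * W_MAX / (N_LEVELS - 1)   # weight change corresponding to one device level
THRESHOLD = DELTA_W                    # reprogram a device once the accumulated
                                       # update exceeds one level (assumed policy)

def quantize_to_device(w):
    """Snap high-precision weights onto the discrete device conductance levels."""
    return np.clip(np.round(w / DELTA_W) * DELTA_W, -W_MAX, W_MAX)

def analog_vmm(x, w_device, noise_std=0.01):
    """VMM as the CIM array would perform it: low-precision weights plus read noise."""
    return x @ w_device + noise_std * rng.standard_normal(w_device.shape[1])

def mixed_precision_step(w_device, acc, grad, lr=0.1):
    """Accumulate updates digitally; reprogram only the devices past the threshold."""
    acc = acc - lr * grad                    # high-precision digital accumulation
    mask = np.abs(acc) >= THRESHOLD          # devices whose pending update is large enough
    w_device = np.where(mask, quantize_to_device(w_device + acc), w_device)
    acc = np.where(mask, 0.0, acc)           # clear the accumulator for reprogrammed devices
    return w_device, acc
```

In a full training loop, gradients computed from the analog VMM outputs would be fed to mixed_precision_step each iteration, so the memristor devices see far fewer write operations than a scheme that reprograms them on every step.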
Related papers
- Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models [31.960749305728488]
We introduce a novel concept dubbed modular neural tangent kernel (mNTK)
We show that the quality of a module's learning is tightly associated with its mNTK's principal eigenvalue $\lambda_{\max}$.
We propose a novel training strategy termed Modular Adaptive Training (MAT) that updates only those modules whose $\lambda_{\max}$ exceeds a dynamic threshold.
arXiv Detail & Related papers (2024-05-13T07:46:48Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - End-to-End Reinforcement Learning of Koopman Models for Economic Nonlinear Model Predictive Control [45.84205238554709]
We present a method for reinforcement learning of Koopman surrogate models for optimal performance as part of (e)NMPC.
We show that the end-to-end trained models outperform those trained using system identification in (e)NMPC.
arXiv Detail & Related papers (2023-08-03T10:21:53Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Neural Attentive Circuits [93.95502541529115]
We introduce a general-purpose yet modular neural architecture called Neural Attentive Circuits (NACs).
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
arXiv Detail & Related papers (2022-10-14T18:00:07Z) - PIM-QAT: Neural Network Quantization for Processing-In-Memory (PIM)
Systems [36.35995812401125]
We propose a PIM quantization-aware training (PIM-QAT) algorithm and introduce rescaling techniques to facilitate training convergence.
We also propose two techniques, namely batch normalization (BN) calibration and adjusted precision training, to suppress the adverse effects of non-ideal linearity and thermal noise involved in real PIM chips.
arXiv Detail & Related papers (2022-09-18T17:51:55Z) - Real-time Neural-MPC: Deep Learning Model Predictive Control for
Quadrotors and Agile Robotic Platforms [59.03426963238452]
We present Real-time Neural MPC, a framework to efficiently integrate large, complex neural network architectures as dynamics models within a model-predictive control pipeline.
We show the feasibility of our framework on real-world problems by reducing the positional tracking error by up to 82% when compared to state-of-the-art MPC approaches without neural network dynamics.
arXiv Detail & Related papers (2022-03-15T09:38:15Z) - Towards Efficient Post-training Quantization of Pre-trained Language
Models [85.68317334241287]
We study post-training quantization (PTQ) of PLMs, and propose module-wise reconstruction error minimization (MREM), an efficient solution to mitigate these issues.
Experiments on GLUE and SQuAD benchmarks show that our proposed PTQ solution not only performs close to QAT, but also enjoys significant reductions in training time, memory overhead, and data consumption.
arXiv Detail & Related papers (2021-09-30T12:50:06Z) - Online Training of Spiking Recurrent Neural Networks with Phase-Change
Memory Synapses [1.9809266426888898]
Training spiking recurrent neural networks (RNNs) on dedicated neuromorphic hardware is still an open challenge.
We present a simulation framework of differential-architecture arrays based on an accurate and comprehensive Phase-Change Memory (PCM) device model.
We train a spiking RNN whose weights are emulated in the presented simulation framework, using a recently proposed e-prop learning rule.
arXiv Detail & Related papers (2021-08-04T01:24:17Z) - Low-Precision Training in Logarithmic Number System using Multiplicative
Weight Update [49.948082497688404]
Training large-scale deep neural networks (DNNs) currently requires a significant amount of energy, leading to serious environmental impacts.
One promising approach to reduce the energy costs is representing DNNs with low-precision numbers.
We jointly design a low-precision training framework involving a logarithmic number system (LNS) and a multiplicative weight update training method, termed LNS-Madam (see the sketch after this list).
arXiv Detail & Related papers (2021-06-26T00:32:17Z) - Hybrid In-memory Computing Architecture for the Training of Deep Neural
Networks [5.050213408539571]
We propose a hybrid in-memory computing architecture for the training of deep neural networks (DNNs) on hardware accelerators.
We show that HIC-based training yields an inference model about 50% smaller while achieving accuracy comparable to the baseline.
Our simulations indicate HIC-based training naturally ensures that the number of write-erase cycles seen by the devices is a small fraction of the endurance limit of PCM.
arXiv Detail & Related papers (2021-02-10T05:26:27Z)
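The LNS-Madam entry above pairs a logarithmic number system with multiplicative weight updates; the appeal is that a multiplicative update to a weight reduces to an additive step on its stored log-magnitude. The sketch below illustrates only that idea; the sign-based step, learning rate, and log-domain quantization grid are simplifying assumptions, not the paper's exact update rule.

```python
import numpy as np

LOG_LSB = 2 ** -6   # assumed log-domain quantization step (6 fractional bits)

def quantize_log(x):
    """Round a base-2 log value onto the LNS grid."""
    return np.round(x / LOG_LSB) * LOG_LSB

def lns_encode(w):
    """Store a weight as (sign, quantized log2 magnitude)."""
    sign = np.sign(w)
    logmag = quantize_log(np.log2(np.abs(w) + 1e-12))
    return sign, logmag

def lns_decode(sign, logmag):
    return sign * (2.0 ** logmag)

def multiplicative_step(sign, logmag, grad, lr=0.1):
    """w <- w * 2^(-lr * sign(w * g)): an additive step on the log component."""
    w = lns_decode(sign, logmag)
    step = -lr * np.sign(w * grad)
    return sign, quantize_log(logmag + step)

# Toy usage: one update on a random weight vector.
rng = np.random.default_rng(0)
w = rng.normal(size=4)
g = rng.normal(size=4)
s, lm = lns_encode(w)
s, lm = multiplicative_step(s, lm, g)
print(lns_decode(s, lm))
```

Because the step acts directly on the quantized log component, the weight update needs no multiplier, which is the usual motivation for an LNS representation.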
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.