Hybrid In-memory Computing Architecture for the Training of Deep Neural
Networks
- URL: http://arxiv.org/abs/2102.05271v1
- Date: Wed, 10 Feb 2021 05:26:27 GMT
- Title: Hybrid In-memory Computing Architecture for the Training of Deep Neural
Networks
- Authors: Vinay Joshi, Wangxin He, Jae-sun Seo and Bipin Rajendran
- Abstract summary: We propose a hybrid in-memory computing architecture for the training of deep neural networks (DNNs) on hardware accelerators.
We show that HIC-based training needs an inference model about 50% smaller to reach accuracy comparable to the baseline.
Our simulations indicate that HIC-based training naturally keeps the number of write-erase cycles seen by the devices to a small fraction of the endurance limit of PCM.
- Score: 5.050213408539571
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The cost involved in training deep neural networks (DNNs) on von-Neumann
architectures has motivated the development of novel solutions for efficient
DNN training accelerators. We propose a hybrid in-memory computing (HIC)
architecture for the training of DNNs on hardware accelerators that results in
memory-efficient inference and outperforms baseline software accuracy in
benchmark tasks. We introduce a weight representation technique that exploits
both binary and multi-level phase-change memory (PCM) devices, and this leads
to a memory-efficient inference accelerator. Unlike previous in-memory
computing-based implementations, we use a low precision weight update
accumulator that results in more memory savings. We trained the ResNet-32
network to classify CIFAR-10 images using HIC. For a comparable model size,
HIC-based training outperforms the baseline network, trained in 32-bit
floating-point (FP32) precision, by leveraging an appropriate network width
multiplier. Furthermore, we observe that HIC-based training needs an inference
model about 50% smaller to reach accuracy comparable to the baseline. We also
show that the temporal drift in PCM devices has a negligible effect on
post-training inference accuracy for extended periods (on the order of a year).
Finally, our simulations indicate that HIC-based training naturally ensures that the number of write-erase
cycles seen by the devices is a small fraction of the endurance limit of PCM,
demonstrating the feasibility of this architecture for achieving hardware
platforms that can learn in the field.
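The abstract does not spell out how the low precision weight update accumulator works, so the following is only a minimal sketch of one generic accumulate-and-transfer scheme, not the authors' implementation. It assumes that small updates are collected in a signed integer accumulator and that the memory device is programmed only when the accumulated amount reaches one device step. The class name `AccumulatedWeight` and the parameters `device_step` and `acc_bits` are hypothetical.

```python
class AccumulatedWeight:
    """Hypothetical weight: a coarse memory device plus a low-precision accumulator."""

    def __init__(self, device_step=0.1, acc_bits=4):
        # Smallest weight change one programming pulse can make (assumed value).
        self.device_step = device_step
        # A signed accumulator with acc_bits bits holds magnitudes below this count.
        self.units_per_pulse = 2 ** (acc_bits - 1)
        # Weight change represented by one accumulator unit.
        self.acc_step = device_step / self.units_per_pulse
        self.acc = 0            # accumulated update, in accumulator units
        self.device_level = 0   # net number of pulses applied to the device
        self.write_count = 0    # total programming pulses (a proxy for device wear)

    def update(self, delta_w):
        """Accumulate a small update; program the device only on overflow."""
        self.acc += round(delta_w / self.acc_step)
        # Whole device steps contained in the accumulator (truncates toward zero).
        pulses = int(self.acc / self.units_per_pulse)
        if pulses != 0:
            self.device_level += pulses
            self.acc -= pulses * self.units_per_pulse
            self.write_count += abs(pulses)

    @property
    def value(self):
        """Weight as stored on the device, ignoring the pending accumulator."""
        return self.device_level * self.device_step


w = AccumulatedWeight()
for _ in range(10):
    w.update(0.03)
print(w.device_level, w.acc, w.write_count)
```

In this sketch the device is written far less often than the weight is updated, which is the kind of behaviour that would keep write-erase cycles well below an endurance limit. The actual thresholds and bit widths used in the paper may differ.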
Related papers
- Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
- Harnessing Manycore Processors with Distributed Memory for Accelerated Training of Sparse and Recurrent Models [43.1773057439246]
Current AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures.
We explore sparse and recurrent model training on a massively parallel multiple instruction multiple data architecture with distributed local memory.
arXiv Detail & Related papers (2023-11-07T23:18:35Z)
- YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs [3.1445034800095413]
We address the challenges associated with deploying neural networks on CPUs.
Our novel approach is to use the dataflow of a neural network to explore data reuse opportunities.
Our results show that the dataflow that keeps outputs in SIMD registers consistently yields the best performance.
arXiv Detail & Related papers (2023-10-01T05:11:54Z)
- Synaptic metaplasticity with multi-level memristive devices [1.5598974049838272]
We propose a memristor-based hardware solution for implementing metaplasticity during both inference and training.
We show that a two-layer perceptron achieves 97% and 86% accuracy on consecutive training of MNIST and Fashion-MNIST.
Our architecture is compatible with the limited endurance of memristors and achieves a 15x reduction in memory.
arXiv Detail & Related papers (2023-06-21T09:40:25Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Neural Architecture Search for Improving Latency-Accuracy Trade-off in Split Computing [5.516431145236317]
Split computing is an emerging machine-learning inference technique that addresses the privacy and latency challenges of deploying deep learning in IoT systems.
In split computing, neural network models are separated and cooperatively processed using edge servers and IoT devices via networks.
This paper proposes a neural architecture search (NAS) method for split computing.
arXiv Detail & Related papers (2022-08-30T03:15:43Z)
- GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction [50.248694764703714]
Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction.
These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization.
We propose Greedy LEarning for Accelerated MRI reconstruction, an efficient training strategy for high-dimensional imaging settings.
arXiv Detail & Related papers (2022-07-18T06:01:29Z)
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
- FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining [65.39532971991778]
We present an accuracy predictor that scores architecture and training recipes jointly, guiding both sample selection and ranking.
We run fast evolutionary searches in just CPU minutes to generate architecture-recipe pairs for a variety of resource constraints.
FBNetV3 makes up a family of state-of-the-art compact neural networks that outperform both automatically and manually-designed competitors.
arXiv Detail & Related papers (2020-06-03T05:20:21Z)
- One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is supported by simulations of the prediction of the cost of a house in Boston and the training of a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
- ESSOP: Efficient and Scalable Stochastic Outer Product Architecture for Deep Learning [1.2019888796331233]
Matrix-vector multiplications (MVM) and vector-vector outer product (VVOP) are the two most expensive operations associated with the training of deep neural networks (DNNs).
We introduce efficient stochastic computing (SC) techniques for weight updates in DNNs with the activation functions required by many state-of-the-art networks.
Our architecture reduces the computational cost by re-using random numbers and replacing certain FP multiplication operations by bit shift scaling.
Hardware design of ESSOP at 14nm technology node shows that, compared to a highly pipelined FP16 multiplier, ESSOP is 82.2% and 93.7% better in energy
arXiv Detail & Related papers (2020-03-25T07:54:42Z)