Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of
Peripherals
- URL: http://arxiv.org/abs/2201.12861v1
- Date: Sun, 30 Jan 2022 16:14:49 GMT
- Authors: Weidong Cao, Yilong Zhao, Adith Boloor, Yinhe Han, Xuan Zhang, Li
Jiang
- Abstract summary: This paper presents a new PIM architecture to efficiently accelerate deep learning tasks.
The architecture minimizes the required A/D conversions through analog accumulation and neural-approximated peripheral circuits.
Evaluations on different benchmarks demonstrate that Neural-PIM improves energy efficiency by 5.36x (1.73x) and throughput by 3.43x (1.59x) over ISAAC (CASCADE) without losing accuracy.
- Score: 11.31429464715989
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Processing-in-memory (PIM) architectures have demonstrated great potential in
accelerating numerous deep learning tasks. Particularly, resistive
random-access memory (RRAM) devices provide a promising hardware substrate to
build PIM accelerators due to their abilities to realize efficient in-situ
vector-matrix multiplications (VMMs). However, existing PIM accelerators suffer
from frequent and energy-intensive analog-to-digital (A/D) conversions,
severely limiting their performance. This paper presents a new PIM architecture
to efficiently accelerate deep learning tasks by minimizing the required A/D
conversions with analog accumulation and neural approximated peripheral
circuits. We first characterize the different dataflows employed by existing
PIM accelerators, based on which a new dataflow is proposed to remarkably
reduce the required A/D conversions for VMMs by extending shift and add (S+A)
operations into the analog domain before the final quantizations. We then
leverage a neural approximation method to design both analog accumulation
circuits (S+A) and quantization circuits (ADCs) with RRAM crossbar arrays in a
highly-efficient manner. Finally, we apply them to build an RRAM-based PIM
accelerator (i.e., Neural-PIM) upon the proposed analog dataflow and
evaluate its system-level performance. Evaluations on different benchmarks
demonstrate that Neural-PIM can improve energy efficiency by 5.36x (1.73x) and
speed up throughput by 3.43x (1.59x) without losing accuracy, compared to the
state-of-the-art RRAM-based PIM accelerators, i.e., ISAAC (CASCADE).
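The core dataflow idea, extending shift-and-add (S+A) into the analog domain so that each column is quantized once instead of once per input bit, can be illustrated with a small behavioral sketch. This is a digital model of the bit-serial VMM arithmetic with an assumed conversion-counting scheme for illustration, not the paper's circuit:

```python
import numpy as np

def bitserial_vmm(x, W, n_bits=8):
    """Bit-serial VMM on an RRAM crossbar: inputs are streamed one bit at a
    time, and the bit-planes' column outputs are combined by shift-and-add
    (S+A). Conversion counts are illustrative, not measured values."""
    x_int = x.astype(np.int64)              # unsigned n_bits integer inputs
    acc = np.zeros(W.shape[1], dtype=np.int64)
    conversions_digital = 0                 # ADC uses if S+A is done digitally
    for b in range(n_bits):
        bit_plane = (x_int >> b) & 1        # one input bit per crossbar row
        partial = bit_plane @ W             # analog column currents (modeled digitally)
        conversions_digital += W.shape[1]   # conventional: quantize every bit-plane
        acc += partial << b                 # S+A; Neural-PIM keeps this step analog
    conversions_analog = W.shape[1]         # analog S+A: one final quantization per column
    return acc, conversions_digital, conversions_analog

x = np.array([3, 5, 7], dtype=np.int64)
W = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.int64)
y, c_dig, c_ana = bitserial_vmm(x, W)
assert np.array_equal(y, x @ W)             # S+A recovers the exact VMM result
# for 8-bit inputs and 2 columns: 16 conversions digitally vs 2 with analog S+A
```

The sketch shows why the dataflow matters: the S+A loop is exact integer arithmetic, so deferring quantization to after accumulation changes the number of A/D conversions but not the computed product.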
Related papers
- SpiDR: A Reconfigurable Digital Compute-in-Memory Spiking Neural Network Accelerator for Event-based Perception [8.968583287058959]
Spiking Neural Networks (SNNs) offer an efficient method for processing the asynchronous temporal data generated by Dynamic Vision Sensors (DVS).
Existing SNN accelerators suffer from limitations in adaptability to diverse neuron models, bit precisions and network sizes.
We propose a scalable and reconfigurable digital compute-in-memory (CIM) SNN accelerator, SpiDR, with a set of key features.
arXiv Detail & Related papers (2024-11-05T06:59:02Z)
- EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE.
Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z)
- Resistive Memory-based Neural Differential Equation Solver for Score-based Diffusion Model [55.116403765330084]
Current AIGC methods, such as score-based diffusion, still fall short in speed and efficiency.
We propose a time-continuous and analog in-memory neural differential equation solver for score-based diffusion.
We experimentally validate our solution with 180 nm resistive memory in-memory computing macros.
arXiv Detail & Related papers (2024-04-08T16:34:35Z)
- EPIM: Efficient Processing-In-Memory Accelerators based on Epitome [78.79382890789607]
We introduce the Epitome, a lightweight neural operator offering convolution-like functionality.
On the software side, we evaluate epitomes' latency and energy on PIM accelerators.
We introduce a PIM-aware layer-wise design method to enhance their hardware efficiency.
arXiv Detail & Related papers (2023-11-12T17:56:39Z)
- Containing Analog Data Deluge at Edge through Frequency-Domain Compression in Collaborative Compute-in-Memory Networks [0.0]
This paper proposes a novel solution to improve area efficiency in deep learning inference tasks.
By processing analog data more efficiently, it is possible to selectively retain valuable data from sensors and alleviate the challenges posed by the analog data deluge.
arXiv Detail & Related papers (2023-09-20T03:52:04Z)
- ADC/DAC-Free Analog Acceleration of Deep Neural Networks with Frequency Transformation [2.7488316163114823]
This paper proposes a novel approach to an energy-efficient acceleration of frequency-domain neural networks by utilizing analog-domain frequency-based tensor transformations.
Our approach achieves more compact cells by eliminating the need for trainable parameters in the transformation matrix.
On a 16×16 crossbar, for 8-bit input processing, the proposed approach achieves an energy efficiency of 1602 tera-operations per second per watt (TOPS/W).
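The frequency-domain idea behind that entry rests on the convolution theorem: after a Fourier transform, convolution becomes an elementwise product, which is what lets an analog frequency-based transformation replace MAC-heavy convolution. A minimal NumPy sketch with assumed toy signal sizes (not the paper's transformation or circuit):

```python
import numpy as np

# Convolution theorem: circular convolution in the signal domain equals an
# elementwise (pointwise) product in the frequency domain.
x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([0.5, -1.0, 0.0, 0.25])

# Frequency-domain path: transform, multiply pointwise, transform back.
y_freq = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))

# Direct circular convolution for comparison.
y_direct = np.array([sum(x[(n - k) % 4] * h[k] for k in range(4))
                     for n in range(4)])
assert np.allclose(y_freq, y_direct)
```

The pointwise product is where the MAC savings come from: each frequency bin needs one multiply rather than a full dot product.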
arXiv Detail & Related papers (2023-09-04T19:19:39Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Collaborative Intelligent Reflecting Surface Networks with Multi-Agent Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks.
In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z)
- One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is demonstrated in simulations of Boston housing-price prediction and the training of a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
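Conceptually, the crosspoint array solves the least-squares problem in a single physical step rather than by iterative training. A digital analogue of that one-step solve is the closed-form least-squares solution, sketched here with assumed synthetic data (not the paper's hardware or dataset):

```python
import numpy as np

# Digital analogue of one-step regression on a crosspoint array: the array
# physically realizes the least-squares solve in a single step; here we
# compute the same closed-form solution w = argmin ||X w - y||^2 directly.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))          # synthetic design matrix
true_w = np.array([2.0, -1.0, 0.5])        # assumed ground-truth weights
y = X @ true_w                             # noise-free targets for clarity

w = np.linalg.lstsq(X, y, rcond=None)[0]   # one "step": no iterative updates
assert np.allclose(w, true_w)              # exact recovery in the noise-free case
```

The contrast with gradient-based training is the point: there is no loop over epochs, only a single solve, which the analog array performs in its physics.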
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
- A New MRAM-based Process In-Memory Accelerator for Efficient Neural Network Training with Floating Point Precision [28.458719513745812]
We propose a spin orbit torque magnetic random access memory (SOT-MRAM) based digital PIM accelerator that supports floating point precision.
Experimental results show that the proposed SOT-MRAM PIM-based DNN training accelerator achieves 3.3×, 1.8×, and 2.5× improvements in energy, latency, and area, respectively.
arXiv Detail & Related papers (2020-03-02T04:58:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.