Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of
Peripherals
- URL: http://arxiv.org/abs/2201.12861v1
- Date: Sun, 30 Jan 2022 16:14:49 GMT
- Title: Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of
Peripherals
- Authors: Weidong Cao, Yilong Zhao, Adith Boloor, Yinhe Han, Xuan Zhang, Li
Jiang
- Abstract summary: This paper presents a new PIM architecture to efficiently accelerate deep learning tasks.
The architecture minimizes the required A/D conversions through analog accumulation and neural-approximated peripheral circuits.
Evaluations on different benchmarks demonstrate that Neural-PIM can improve energy efficiency by 5.36x (1.73x) and speed up throughput by 3.43x (1.59x) without losing accuracy.
- Score: 11.31429464715989
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Processing-in-memory (PIM) architectures have demonstrated great potential in
accelerating numerous deep learning tasks. Particularly, resistive
random-access memory (RRAM) devices provide a promising hardware substrate to
build PIM accelerators due to their abilities to realize efficient in-situ
vector-matrix multiplications (VMMs). However, existing PIM accelerators suffer
from frequent and energy-intensive analog-to-digital (A/D) conversions,
severely limiting their performance. This paper presents a new PIM architecture
to efficiently accelerate deep learning tasks by minimizing the required A/D
conversions with analog accumulation and neural approximated peripheral
circuits. We first characterize the different dataflows employed by existing
PIM accelerators, based on which a new dataflow is proposed to remarkably
reduce the required A/D conversions for VMMs by extending shift and add (S+A)
operations into the analog domain before the final quantizations. We then
leverage a neural approximation method to design both analog accumulation
circuits (S+A) and quantization circuits (ADCs) with RRAM crossbar arrays in a
highly-efficient manner. Finally, we apply them to build an RRAM-based PIM
accelerator (i.e., \textbf{Neural-PIM}) upon the proposed analog dataflow and
evaluate its system-level performance. Evaluations on different benchmarks
demonstrate that Neural-PIM can improve energy efficiency by 5.36x (1.73x) and
speed up throughput by 3.43x (1.59x) without losing accuracy, compared to the
state-of-the-art RRAM-based PIM accelerators, i.e., ISAAC (CASCADE).
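The A/D-conversion savings described in the abstract can be illustrated with a toy bit-serial VMM model. This is a sketch under simplifying assumptions (the function name, crossbar size, and 8-bit input slicing are illustrative, not the paper's exact configuration): a digital-S+A dataflow quantizes the analog column current every input-bit cycle, while an analog-S+A dataflow accumulates the shifted partial sums in the analog domain and quantizes only once.

```python
import numpy as np

def bit_serial_vmm(W, x, in_bits=8):
    """Toy bit-serial VMM on a crossbar: inputs are streamed one bit per cycle.

    Returns the exact result plus the A/D conversion counts for two dataflows:
      - digital S+A: every cycle's analog column output is quantized by an ADC,
        then shifted and added digitally (ISAAC-style),
      - analog S+A: shift-and-add is performed in the analog domain and only
        the final accumulated value is quantized (Neural-PIM-style).
    """
    n_out = W.shape[0]                   # one ADC sample per output per quantization
    x = x.astype(np.int64)
    acc = np.zeros(n_out, dtype=np.int64)
    digital_adc_ops = 0
    for b in range(in_bits):             # stream input bit-plane b
        x_bit = (x >> b) & 1             # one bit of each input element
        partial = W @ x_bit              # analog column outputs for this cycle
        acc += partial << b              # shift-and-add (S+A)
        digital_adc_ops += n_out         # digital S+A quantizes every cycle
    analog_adc_ops = n_out               # analog S+A quantizes once at the end
    return acc, digital_adc_ops, analog_adc_ops

rng = np.random.default_rng(0)
W = rng.integers(0, 4, size=(4, 16))     # 2-bit weights on a 4x16 crossbar (illustrative)
x = rng.integers(0, 256, size=16)        # 8-bit inputs
y, dig, ana = bit_serial_vmm(W, x)
assert np.array_equal(y, W.astype(np.int64) @ x)  # S+A reproduces the exact VMM
print(f"digital-S+A ADC conversions: {dig}, analog-S+A: {ana}")
```

With 8-bit inputs and 4 outputs, the digital dataflow performs 32 conversions per VMM versus 4 for the analog dataflow, which is the source of the energy and throughput gains the abstract reports.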
Related papers
- Joint Transmit and Pinching Beamforming for PASS: Optimization-Based or Learning-Based? [89.05848771674773]
A novel pinching-antenna system (PASS)-enabled downlink multi-user multiple-input single-output (MISO) framework is proposed.
It consists of multiple waveguides equipped with numerous low-cost pinching antennas (PAs).
The positions of the PAs can be reconfigured to exploit both large-scale path gains and spatial degrees of freedom.
arXiv Detail & Related papers (2025-02-12T18:54:10Z) - A Fully Hardware Implemented Accelerator Design in ReRAM Analog Computing without ADCs [5.6496088684920345]
ReRAM-based accelerators process neural networks via analog Computing-in-Memory (CiM) for ultra-high energy efficiency.
This work explores the hardware implementation of the Sigmoid and SoftMax activation functions of neural networks with binarized neurons.
We propose a complete ReRAM-based Analog Computing Accelerator (RACA) that accelerates neural network computation by leveraging binarized neurons.
arXiv Detail & Related papers (2024-12-27T09:38:19Z) - EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE that surpasses the existing parallelism schemes.
Our results demonstrate at most 52.4% improvement in prefill throughput compared to existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z) - EPIM: Efficient Processing-In-Memory Accelerators based on Epitome [78.79382890789607]
We introduce the Epitome, a lightweight neural operator offering convolution-like functionality.
On the software side, we evaluate epitomes' latency and energy on PIM accelerators.
We introduce a PIM-aware layer-wise design method to enhance their hardware efficiency.
arXiv Detail & Related papers (2023-11-12T17:56:39Z) - Containing Analog Data Deluge at Edge through Frequency-Domain
Compression in Collaborative Compute-in-Memory Networks [0.0]
This paper proposes a novel solution to improve area efficiency in deep learning inference tasks.
By processing analog data more efficiently, it is possible to selectively retain valuable data from sensors and alleviate the challenges posed by the analog data deluge.
arXiv Detail & Related papers (2023-09-20T03:52:04Z) - ADC/DAC-Free Analog Acceleration of Deep Neural Networks with Frequency
Transformation [2.7488316163114823]
This paper proposes a novel approach to an energy-efficient acceleration of frequency-domain neural networks by utilizing analog-domain frequency-based tensor transformations.
Our approach achieves more compact cells by eliminating the need for trainable parameters in the transformation matrix.
On a 16$\times$16 crossbar, for 8-bit input processing, the proposed approach achieves an energy efficiency of 1602 tera-operations per second per watt.
arXiv Detail & Related papers (2023-09-04T19:19:39Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Collaborative Intelligent Reflecting Surface Networks with Multi-Agent
Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks.
In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z) - One-step regression and classification with crosspoint resistive memory
arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is demonstrated by simulating Boston housing-price prediction and the training of a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z) - A New MRAM-based Process In-Memory Accelerator for Efficient Neural
Network Training with Floating Point Precision [28.458719513745812]
We propose a spin orbit torque magnetic random access memory (SOT-MRAM) based digital PIM accelerator that supports floating point precision.
Experiment results show that the proposed SOT-MRAM PIM based DNN training accelerator can achieve 3.3$\times$, 1.8$\times$, and 2.5$\times$ improvement in terms of energy, latency, and area.
arXiv Detail & Related papers (2020-03-02T04:58:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site makes no guarantees about the quality of the information presented and is not responsible for any consequences arising from its use.