NicePIM: Design Space Exploration for Processing-In-Memory DNN
Accelerators with 3D-Stacked-DRAM
- URL: http://arxiv.org/abs/2305.19041v1
- Date: Tue, 30 May 2023 13:58:13 GMT
- Title: NicePIM: Design Space Exploration for Processing-In-Memory DNN
Accelerators with 3D-Stacked-DRAM
- Authors: Junpeng Wang, Mengke Ge, Bo Ding, Qi Xu, Song Chen, Yi Kang
- Abstract summary: NicePIM can optimize hardware configurations for DRAM-PIM systems effectively.
It can generate high-quality DNN mapping schemes with latency and energy cost reduced by 37% and 28% on average.
- Score: 10.802292525404994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the widespread use of deep neural networks (DNNs) in intelligent systems,
DNN accelerators with high performance and energy efficiency are in great demand.
As one of the feasible processing-in-memory (PIM) architectures, the
3D-stacked-DRAM-based PIM (DRAM-PIM) architecture enables large-capacity memory
and low-cost memory access, making it a promising solution for DNN accelerators
with better performance and energy efficiency. However, the low-cost
characteristics of stacked DRAM and the distributed manner of memory access and
data storage require rebalancing the hardware design and DNN mapping. In
this paper, we propose NicePIM to efficiently explore the design space of
hardware architecture and DNN mapping of DRAM-PIM accelerators; it consists
of three key components: PIM-Tuner, PIM-Mapper and Data-Scheduler. PIM-Tuner
optimizes the hardware configurations, leveraging a DNN model for classifying
area-compliant architectures and a deep kernel learning model for identifying
better hardware parameters. PIM-Mapper explores a variety of DNN mapping
configurations, including parallelism between DNN branches, DNN layer
partitioning, DRAM capacity allocation and data layout patterns in DRAM, to
generate high-hardware-utilization DNN mapping schemes for various hardware
configurations. The Data-Scheduler employs an integer-linear-programming (ILP)
based data scheduling algorithm to alleviate the inter-PIM-node communication
overhead of data sharing introduced by DNN layer partitioning. Experimental
results demonstrate that NicePIM can optimize hardware configurations for
DRAM-PIM systems effectively and can generate high-quality DNN mapping schemes,
reducing latency and energy cost by 37% and 28% on average, respectively,
compared to the baseline method.
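To make the PIM-Tuner stage concrete, below is a minimal sketch of an exploration loop of the kind the abstract describes: sample hardware configurations, prune the ones a classifier deems over the area budget, rank the rest with a cheap surrogate, and spend expensive mapping/simulation only on a shortlist. Every knob name, cost formula, and the evaluate_mapping stand-in is an illustrative assumption; the paper's tuner uses a trained DNN classifier and a deep kernel learning model, not these hand-written heuristics.

```python
import random

def sample_config():
    # Hypothetical hardware knobs for one PIM node (not the paper's parameter set).
    return {
        "pes": random.choice([16, 32, 64]),        # processing elements per node
        "sram_kb": random.choice([64, 128, 256]),  # on-chip buffer size
        "dram_banks": random.choice([4, 8, 16]),   # stacked-DRAM banks per node
    }

def is_area_compliant(cfg):
    # Stand-in for the DNN classifier of area-compliant architectures.
    area = 0.02 * cfg["pes"] + 0.005 * cfg["sram_kb"] + 0.1 * cfg["dram_banks"]
    return area <= 3.0  # arbitrary mm^2 budget

def surrogate_cost(cfg):
    # Stand-in for the deep-kernel-learning surrogate; lower is better.
    return 1.0 / cfg["pes"] + 1.0 / cfg["sram_kb"] + 0.01 * cfg["dram_banks"]

def evaluate_mapping(cfg):
    # Placeholder for the expensive step: run PIM-Mapper plus simulation on cfg.
    return surrogate_cost(cfg) * random.uniform(0.9, 1.1)

candidates = [c for c in (sample_config() for _ in range(200)) if is_area_compliant(c)]
shortlist = sorted(candidates, key=surrogate_cost)[:5]  # cheap model prunes the space
best = min(shortlist, key=evaluate_mapping)             # only the shortlist is simulated
print("best config:", best)
```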
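PIM-Mapper's layer-partitioning decision can likewise be illustrated with a toy example: split one convolution layer's output channels evenly across PIM nodes and estimate each node's DRAM footprint. The shapes and the even-split policy below are assumptions for illustration only, not the paper's mapping algorithm; note how the input feature map is replicated on every node, which is exactly the kind of data sharing the Data-Scheduler must then manage.

```python
def partition_conv(out_ch, in_ch, k, fmap_hw, n_nodes, bytes_per_elem=1):
    """Evenly split a conv layer's output channels across PIM nodes and
    return each node's estimated DRAM footprint in bytes (toy model)."""
    per_node = -(-out_ch // n_nodes)  # ceiling division
    plans = []
    for n in range(n_nodes):
        oc = min(per_node, out_ch - n * per_node)
        if oc <= 0:
            break
        weights = oc * in_ch * k * k * bytes_per_elem
        ofmap = oc * fmap_hw * fmap_hw * bytes_per_elem
        ifmap = in_ch * fmap_hw * fmap_hw * bytes_per_elem  # replicated input -> data sharing
        plans.append({"node": n, "out_ch": oc, "dram_bytes": weights + ofmap + ifmap})
    return plans

# Example: a 256-out / 128-in 3x3 conv on 56x56 feature maps, over 4 PIM nodes.
for plan in partition_conv(256, 128, 3, 56, 4):
    print(plan)
```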
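Finally, the Data-Scheduler's ILP can be sketched as a placement problem: choose a home PIM node for each shared data tile so that total tile-size-weighted hop distance to the tile's consumers is minimized, subject to per-node DRAM capacity. The instance below is made up, the linear hop model is a toy, and PuLP is used only as a convenient solver; the paper's actual formulation and cost model may differ.

```python
import pulp

# Hypothetical instance: real inputs would come from PIM-Mapper's partitioning.
nodes = list(range(4))                                # PIM nodes
tiles = list(range(6))                                # shared data tiles
size = {t: 2 + t % 3 for t in tiles}                  # tile sizes (MB), made up
capacity = {n: 8 for n in nodes}                      # per-node DRAM budget (MB), made up
consumers = {t: [t % 4, (t + 1) % 4] for t in tiles}  # nodes that read tile t

def hops(a, b):
    return abs(a - b)  # toy linear distance; a real NoC/TSV model would differ

prob = pulp.LpProblem("data_scheduling", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (tiles, nodes), cat="Binary")  # x[t][n]=1 -> tile t homed on node n

# Objective: total inter-node traffic = tile size * hop distance to each consumer.
prob += pulp.lpSum(size[t] * hops(n, c) * x[t][n]
                   for t in tiles for n in nodes for c in consumers[t])

# Each tile is stored on exactly one node.
for t in tiles:
    prob += pulp.lpSum(x[t][n] for n in nodes) == 1

# Per-node DRAM capacity.
for n in nodes:
    prob += pulp.lpSum(size[t] * x[t][n] for t in tiles) <= capacity[n]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for t in tiles:
    home = next(n for n in nodes if pulp.value(x[t][n]) > 0.5)
    print(f"tile {t} (size {size[t]} MB) -> node {home}")
```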
Related papers
- DCP: Learning Accelerator Dataflow for Neural Network via Propagation [52.06154296196845]
This work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort.
DCP learns a neural predictor to efficiently update the dataflow codes towards the desired gradient directions to minimize various optimization objectives.
For example, without using additional training data, DCP surpasses the GAMMA method that performs a full search using thousands of samples.
arXiv Detail & Related papers (2024-10-09T05:16:44Z)
- Spiker+: a framework for the generation of efficient Spiking Neural Networks FPGA accelerators for inference at the edge [49.42371633618761]
Spiker+ is a framework for generating efficient, low-power, and low-area customized Spiking Neural Network (SNN) accelerators on FPGA for inference at the edge.
Spiker+ is tested on two benchmark datasets: MNIST and the Spiking Heidelberg Digits (SHD).
arXiv Detail & Related papers (2024-01-02T10:42:42Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
- Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud [9.927754948343326]
A neural network's performance (and energy efficiency) can be bound either by computation or memory resources.
The processing-in-memory (PIM) paradigm is a viable solution to accelerate memory-bound NNs.
We analyze three state-of-the-art PIM architectures for NN performance and energy efficiency.
arXiv Detail & Related papers (2022-09-19T11:46:05Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on FPGA validate the great potential of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
- Impact of On-Chip Interconnect on In-Memory Acceleration of Deep Neural Networks [11.246977770747526]
Increasing connection density increases on-chip data movement.
We show that the point-to-point (P2P)-based interconnect is incapable of handling a high volume of on-chip data movement.
We propose a technique to determine the optimal choice of interconnect for any given DNN.
arXiv Detail & Related papers (2021-07-06T02:44:00Z)
- PIM-DRAM: Accelerating Machine Learning Workloads using Processing in Memory based on DRAM Technology [2.6168147530506958]
We propose a processing-in-memory (PIM) multiplication primitive to accelerate matrix-vector operations in ML workloads.
We show that the proposed architecture, mapping, and data flow can provide up to 23x and 6.5x benefits over a GPU.
arXiv Detail & Related papers (2021-05-08T16:39:24Z)
- A New MRAM-based Process In-Memory Accelerator for Efficient Neural Network Training with Floating Point Precision [28.458719513745812]
We propose a spin-orbit torque magnetic random access memory (SOT-MRAM) based digital PIM accelerator that supports floating-point precision.
Experiment results show that the proposed SOT-MRAM PIM based DNN training accelerator can achieve 3.3x, 1.8x, and 2.5x improvement in terms of energy, latency, and area.
arXiv Detail & Related papers (2020-03-02T04:58:54Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)