DDC-PIM: Efficient Algorithm/Architecture Co-design for Doubling Data
Capacity of SRAM-based Processing-In-Memory
- URL: http://arxiv.org/abs/2310.20424v1
- Date: Tue, 31 Oct 2023 12:49:54 GMT
- Title: DDC-PIM: Efficient Algorithm/Architecture Co-design for Doubling Data
Capacity of SRAM-based Processing-In-Memory
- Authors: Cenlin Duan, Jianlei Yang, Xiaolin He, Yingjie Qi, Yikun Wang, Yiou
Wang, Ziyan He, Bonan Yan, Xueyan Wang, Xiaotao Jia, Weitao Pan, Weisheng
Zhao
- Abstract summary: We propose DDC-PIM, an efficient algorithm/architecture co-design methodology that effectively doubles the equivalent data capacity.
DDC-PIM yields about $2.84\times$ speedup on MobileNetV2 and $2.69\times$ on EfficientNet-B0 with negligible accuracy loss.
Compared with state-of-the-art macros, DDC-PIM achieves up to $8.41\times$ and $2.75\times$ improvement in weight density and area efficiency, respectively.
- Score: 6.367916611208411
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Processing-in-memory (PIM), as a novel computing paradigm, provides
significant performance benefits from the aspect of effective data movement
reduction. SRAM-based PIM has been demonstrated as one of the most promising
candidates due to its endurance and compatibility. However, the integration
density of SRAM-based PIM is much lower than that of non-volatile memory-based
counterparts, because its inherent 6T cell stores only a single bit. Within
comparable area constraints, SRAM-based PIM exhibits notably lower capacity.
Thus, aiming to unleash its capacity potential, we propose DDC-PIM, an
efficient algorithm/architecture co-design methodology that effectively doubles
the equivalent data capacity. At the algorithmic level, we propose a
filter-wise complementary correlation (FCC) algorithm to obtain a bitwise
complementary pair. At the architecture level, we exploit the intrinsic
cross-coupled structure of 6T SRAM to store the bitwise complementary pair in
their complementary states ($Q/\overline{Q}$), thereby maximizing the data
capacity of each SRAM cell. The dual-broadcast input structure and
reconfigurable unit support both depthwise and pointwise convolution, adhering
to the requirements of various neural networks. Evaluation results show that
DDC-PIM yields about $2.84\times$ speedup on MobileNetV2 and $2.69\times$ on
EfficientNet-B0 with negligible accuracy loss compared with PIM baseline
implementation. Compared with state-of-the-art SRAM-based PIM macros, DDC-PIM
achieves up to $8.41\times$ and $2.75\times$ improvement in weight density and
area efficiency, respectively.
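As a concrete illustration of the pairing step described in the abstract, the following is a minimal Python sketch (not the authors' code): it scores how bitwise-complementary two quantized filters are and greedily pairs filters so that each pair could share one 6T SRAM array through the complementary $Q/\overline{Q}$ states. The uint8 quantization, the scoring function, and the greedy pairing strategy are assumptions made for illustration; the paper's FCC algorithm and its accuracy-preserving training flow are not reproduced here.

```python
# Hypothetical sketch of filter-wise complementary pairing, assuming uint8 weights.
import numpy as np

def complementary_correlation(a: np.ndarray, b: np.ndarray, bits: int = 8) -> float:
    """Fraction of bits in filter `a` that are the complement of the
    corresponding bits in filter `b` (1.0 means perfectly complementary)."""
    mask = (1 << bits) - 1
    # XOR with the bitwise complement of b: set bits mark non-complementary positions.
    diff = np.bitwise_xor(a, np.bitwise_and(np.bitwise_not(b), mask))
    mismatched = np.unpackbits(diff.astype(np.uint8)).sum()
    return 1.0 - mismatched / (a.size * bits)

def pair_filters(filters: np.ndarray) -> list:
    """Greedily pair filters (shape [num_filters, k*k*c], uint8) so that each
    pair is as close to bitwise complementary as possible."""
    remaining = list(range(len(filters)))
    pairs = []
    while len(remaining) > 1:
        i = remaining.pop(0)
        j = max(remaining, key=lambda r: complementary_correlation(filters[i], filters[r]))
        remaining.remove(j)
        pairs.append((i, j))
    return pairs

# Toy example: 8 random 3x3x16 filters quantized to uint8.
rng = np.random.default_rng(0)
filters = rng.integers(0, 256, size=(8, 3 * 3 * 16), dtype=np.uint8)
print(pair_filters(filters))
```

In the architecture described above, the two filters of each pair would then be written into the same cells as Q and $\overline{Q}$, which is what doubles the equivalent weight capacity.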
Related papers
- Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss [59.835032408496545]
We propose a tile-based strategy that partitions the contrastive loss calculation into arbitrarily small blocks.
We also introduce a multi-level tiling strategy to leverage the hierarchical structure of distributed systems.
Compared to SOTA memory-efficient solutions, it achieves a two-order-of-magnitude reduction in memory while maintaining comparable speed.
arXiv Detail & Related papers (2024-10-22T17:59:30Z)
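For the entry above, a hedged sketch of the general tiling idea: the contrastive (InfoNCE) loss is accumulated over column tiles of the similarity matrix with a streaming log-sum-exp, so the full N x N matrix is never materialized. The function name, tile size, and NumPy-only forward pass are illustrative assumptions; the paper additionally applies multi-level tiling across distributed devices, which is not shown here.

```python
# Assumed sketch of a tile-based InfoNCE loss; forward pass only.
import numpy as np

def tiled_infonce_loss(q: np.ndarray, k: np.ndarray, tau: float = 0.07,
                       tile: int = 1024) -> float:
    """q, k: L2-normalized embeddings of shape [N, D]; positives are aligned rows."""
    n = q.shape[0]
    running_max = np.full(n, -np.inf)            # per-row max, for numerical stability
    running_sum = np.zeros(n)                    # per-row sum of exp(sim - max)
    pos_logits = np.sum(q * k, axis=1) / tau     # diagonal (positive-pair) logits
    for start in range(0, n, tile):
        block = q @ k[start:start + tile].T / tau    # one [N, tile] similarity tile
        block_max = block.max(axis=1)
        new_max = np.maximum(running_max, block_max)
        # Rescale the old sum to the new max before adding this tile's contribution.
        running_sum = (running_sum * np.exp(running_max - new_max)
                       + np.exp(block - new_max[:, None]).sum(axis=1))
        running_max = new_max
    log_denominator = running_max + np.log(running_sum)   # log-sum-exp over all logits
    return float(np.mean(log_denominator - pos_logits))

rng = np.random.default_rng(0)
q = rng.normal(size=(4096, 128)); q /= np.linalg.norm(q, axis=1, keepdims=True)
k = rng.normal(size=(4096, 128)); k /= np.linalg.norm(k, axis=1, keepdims=True)
print(tiled_infonce_loss(q, k))
```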
- EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE.
Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z)
- Containing Analog Data Deluge at Edge through Frequency-Domain Compression in Collaborative Compute-in-Memory Networks [0.0]
This paper proposes a novel solution to improve area efficiency in deep learning inference tasks.
By processing analog data more efficiently, it is possible to selectively retain valuable data from sensors and alleviate the challenges posed by the analog data deluge.
arXiv Detail & Related papers (2023-09-20T03:52:04Z)
- Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud [9.927754948343326]
A neural network's performance (and energy efficiency) can be bound either by computation or memory resources.
The processing-in-memory (PIM) paradigm is a viable solution to accelerate memory-bound NNs.
We analyze three state-of-the-art PIM architectures for NN performance and energy efficiency.
arXiv Detail & Related papers (2022-09-19T11:46:05Z)
- Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals [11.31429464715989]
This paper presents a new PIM architecture to efficiently accelerate deep learning tasks.
The design minimizes the required A/D conversions through analog accumulation and neural-approximated peripheral circuits.
Evaluations on different benchmarks demonstrate that Neural-PIM can improve energy efficiency by 5.36x (1.73x) and speed up throughput by 3.43x (1.59x) without losing accuracy.
arXiv Detail & Related papers (2022-01-30T16:14:49Z)
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs with diverse difficulty levels.
We present dynamic slimmable network (DS-Net) and dynamic slice-able network (DS-Net++) by input-dependently adjusting filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
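To illustrate the dynamic weight slicing idea summarized in the entry above, here is a small, assumed Python sketch: a cheap gate picks a width ratio per input, and the layer evaluates only the first k output channels of its weight. The gate, the candidate ratios, and the linear layer are placeholders, not the DS-Net/DS-Net++ implementation, which learns the gating and applies it to filter numbers in CNNs and to multiple dimensions in transformers.

```python
# Hypothetical sketch of input-dependent weight slicing on a dense layer.
import numpy as np

RATIOS = (0.25, 0.5, 0.75, 1.0)    # assumed candidate slice ratios

def gate(x: np.ndarray, gate_w: np.ndarray) -> float:
    """Pick a slice ratio from a cheap linear gate over the input features."""
    scores = x.mean(axis=0) @ gate_w                 # one logit per candidate ratio
    return RATIOS[int(np.argmax(scores))]

def sliced_linear(x: np.ndarray, w: np.ndarray, ratio: float) -> np.ndarray:
    """Use only the first k output channels of w (shape [out, in])."""
    k = max(1, int(round(w.shape[0] * ratio)))
    return x @ w[:k].T                               # smaller matmul for easier inputs

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 64))                         # one input batch
w = rng.normal(size=(128, 64))                       # full layer weight
gate_w = rng.normal(size=(64, len(RATIOS)))
r = gate(x, gate_w)
print(r, sliced_linear(x, w, r).shape)
```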
- Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially on resource-limited devices.
Previous unstructured or structured weight pruning methods rarely deliver real inference acceleration.
We propose a generalized weight unification framework at a hardware compatible micro-structured level to achieve high amount of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
- SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training [82.35376405568975]
Deep neural networks (DNNs) come with heavy parameterization, requiring external dynamic random-access memory (DRAM) for storage.
We present SmartDeal (SD), an algorithm framework to trade higher-cost memory storage/access for lower-cost computation.
We show that SD leads to 10.56x and 4.48x reduction in the storage and training energy, with negligible accuracy loss compared to state-of-the-art training baselines.
arXiv Detail & Related papers (2021-01-04T18:54:07Z)
- MARS: Multi-macro Architecture SRAM CIM-Based Accelerator with Co-designed Compressed Neural Networks [0.6817102408452476]
Convolutional neural networks (CNNs) play a key role in deep learning applications.
The CIM architecture has demonstrated great potential for efficiently computing large-scale matrix-vector multiplications.
To reduce computation costs, network pruning and quantization are two widely studied compression methods to shrink the model size.
arXiv Detail & Related papers (2020-10-24T10:31:49Z)
- Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration [14.958793135751149]
Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM).
Exploiting data sparsity is a common approach to further accelerate GEMM for CNN inference, and in particular, structural sparsity has the advantages of predictable load balancing and very low index overhead.
We address a key architectural challenge with structural sparsity: how to provide support for a range of sparsity levels while maintaining high utilization of the hardware.
arXiv Detail & Related papers (2020-09-04T20:17:42Z)
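To make the structural-sparsity point in the entry above concrete, a minimal, assumed sketch of block-bounded sparsity: every block of BLOCK weights keeps at most NNZ nonzeros, stored as values plus tiny per-block indices, which keeps the load balance predictable and the index overhead low. The block size, the nonzero bound, and the helper names are illustrative and do not reproduce the paper's systolic tensor array design.

```python
# Hypothetical sketch of block-bounded structured sparsity and a compressed dot product.
import numpy as np

BLOCK, NNZ = 4, 2    # assumed block size and per-block nonzero bound

def compress(w: np.ndarray):
    """Prune each block of BLOCK weights to its NNZ largest-magnitude entries."""
    w = w.reshape(-1, BLOCK)
    idx = np.argsort(-np.abs(w), axis=1)[:, :NNZ]       # kept positions per block
    vals = np.take_along_axis(w, idx, axis=1)           # kept values per block
    return vals, idx                                     # hardware would pack idx into 2-bit fields

def sparse_dot(vals: np.ndarray, idx: np.ndarray, x: np.ndarray) -> float:
    """Dot product against a dense activation vector using the compressed weights."""
    x = x.reshape(-1, BLOCK)
    return float(np.sum(vals * np.take_along_axis(x, idx, axis=1)))

rng = np.random.default_rng(0)
w = rng.normal(size=64)
x = rng.normal(size=64)
vals, idx = compress(w)
print(sparse_dot(vals, idx, x))
```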
- Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models.
This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z)