PSCNN: A 885.86 TOPS/W Programmable SRAM-based Computing-In-Memory
Processor for Keyword Spotting
- URL: http://arxiv.org/abs/2205.01569v1
- Date: Mon, 2 May 2022 09:58:18 GMT
- Title: PSCNN: A 885.86 TOPS/W Programmable SRAM-based Computing-In-Memory
Processor for Keyword Spotting
- Authors: Shu-Hung Kuo and Tian-Sheuan Chang
- Abstract summary: This paper proposes a programmable CIM processor with a single large-sized CIM macro instead of multiple smaller ones for power-efficient computation.
The proposed architecture adopts a pooling write-back method to support fused or independent convolution/pooling operations, reducing latency by 35.9%.
The design fabricated in TSMC 28nm technology achieves 150.8 GOPS throughput and 885.86 TOPS/W power efficiency at 10 MHz when executing our binary keyword spotting model.
- Score: 0.10547353841674209
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Computing-in-memory (CIM) has attracted significant attention in recent
years due to its massive parallelism and low power consumption. However,
current CIM designs suffer from the large area overhead of small CIM macros and
poor programmability for model execution. This paper proposes a programmable CIM
processor with a single large-sized CIM macro instead of multiple smaller ones
for power-efficient computation, and a flexible instruction set that easily
supports various binary 1-D convolutional neural network (CNN) models.
Furthermore, the proposed architecture adopts a pooling write-back method to
support fused or independent convolution/pooling operations, reducing latency by
35.9%, and a flexible ping-pong feature SRAM to fit different feature map
sizes during layer-by-layer execution. The design, fabricated in TSMC 28nm
technology, achieves 150.8 GOPS throughput and 885.86 TOPS/W power efficiency at
10 MHz when executing our binary keyword spotting model, offering higher power
efficiency and flexibility than previous designs.
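For intuition, below is a minimal NumPy sketch (illustrative only, not the paper's hardware or code) of what a fused binary 1-D convolution + max-pooling layer with pooled write-back computes: raw convolution outputs stay inside the loop and only the pooled, re-binarized activations are written back to the feature buffer, which is the dataflow idea behind the pooling write-back method. The function name, the {-1, +1} binarization convention, and the pooling parameters are assumptions made for this example.

```python
import numpy as np

def binary_conv1d_fused_pool(x, w, pool=2, threshold=0):
    """Sketch of a fused binary 1-D conv + max-pool with pooled write-back.

    x : (C_in, L)        input feature map, values in {-1, +1}
    w : (C_out, C_in, K) binary weights, values in {-1, +1}
    Only the pooled, re-binarized activations are written to `out`
    (one write per pooling window); the raw conv outputs never
    round-trip through the feature buffer.
    """
    c_out, c_in, k = w.shape
    l_out = x.shape[1] - k + 1           # valid convolution length
    out = np.empty((c_out, l_out // pool), dtype=np.int8)
    for oc in range(c_out):
        for p in range(l_out // pool):
            acc_max = -np.inf
            for t in range(p * pool, (p + 1) * pool):
                # binary multiply-accumulate over all input channels
                acc_max = max(acc_max, np.sum(w[oc] * x[:, t:t + k]))
            # re-binarize before the single write-back per pooling window
            out[oc, p] = 1 if acc_max >= threshold else -1
    return out

# toy usage: 8-channel input of length 32, 16 binary filters of width 3
rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=(8, 32)).astype(np.int8)
w = rng.choice([-1, 1], size=(16, 8, 3)).astype(np.int8)
y = binary_conv1d_fused_pool(x, w, pool=2)
print(y.shape)   # (16, 15): 30 conv outputs pooled by 2
```

As a rough sanity check on the headline numbers, 150.8 GOPS at 10 MHz corresponds to about 15,080 operations per cycle, and dividing throughput by the 885.86 TOPS/W efficiency implies on the order of 0.17 mW of power for this workload.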
Related papers
- An Event-Based Digital Compute-In-Memory Accelerator with Flexible Operand Resolution and Layer-Wise Weight/Output Stationarity [0.11522790873450185]
CIM accelerators for spiking neural networks (SNNs) are promising solutions to enable µs-level inference latency and ultra-low energy in edge vision applications.
We propose a novel digital CIM macro that supports arbitrary operand resolution and shape, with a unified CIM storage for weights and membrane potentials.
Our approach can save up to 90% energy in large-scale systems, while reaching a state-of-the-art classification accuracy of 95.8% on the IBM DVS gesture dataset.
arXiv Detail & Related papers (2024-10-30T14:55:13Z) - EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE.
Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z) - EPIM: Efficient Processing-In-Memory Accelerators based on Epitome [78.79382890789607]
We introduce the Epitome, a lightweight neural operator offering convolution-like functionality.
On the software side, we evaluate epitomes' latency and energy on PIM accelerators.
We introduce a PIM-aware layer-wise design method to enhance their hardware efficiency.
arXiv Detail & Related papers (2023-11-12T17:56:39Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - RAMAN: A Re-configurable and Sparse tinyML Accelerator for Inference on
Edge [1.8293684411977293]
Deep Neural Network (DNN)-based inference at the edge is challenging as these compute- and data-intensive algorithms need to be implemented at low cost and low power.
We present RAMAN, a Re-configurable and spArse tinyML Accelerator for infereNce on edge, architected to exploit sparsity to reduce area (storage), power, and latency.
arXiv Detail & Related papers (2023-06-10T17:25:58Z) - Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and
Algorithm Co-design [66.39546326221176]
Attention-based neural networks have become pervasive in many AI tasks.
The use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources.
This paper proposes a hardware-friendly variant that adopts a unified butterfly sparsity pattern to approximate both the attention mechanism and the FFNs.
arXiv Detail & Related papers (2022-09-20T09:28:26Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) Soft Actor-Critic for discrete (SAC-d) method, which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - MicroNet: Towards Image Recognition with Extremely Low FLOPs [117.96848315180407]
MicroNet is an efficient convolutional neural network with extremely low computational cost.
A family of MicroNets achieve a significant performance gain over the state-of-the-art in the low FLOP regime.
For instance, MicroNet-M1 achieves 61.1% top-1 accuracy on ImageNet classification with 12 MFLOPs, outperforming MobileNetV3 by 11.3%.
arXiv Detail & Related papers (2020-11-24T18:59:39Z) - MARS: Multi-macro Architecture SRAM CIM-Based Accelerator with
Co-designed Compressed Neural Networks [0.6817102408452476]
Convolutional neural networks (CNNs) play a key role in deep learning applications.
CIM architecture has demonstrated great potential to effectively compute large-scale matrix-vector multiplication.
To reduce computation costs, network pruning and quantization are two widely studied compression methods to shrink the model size.
arXiv Detail & Related papers (2020-10-24T10:31:49Z) - DORY: Automatic End-to-End Deployment of Real-World DNNs on Low-Cost IoT
MCUs [6.403349961091506]
Low-cost MCU-based end-nodes have limited on-chip memory and often replace caches with scratchpads.
DORY is an automatic tool to deploy DNNs on low-cost MCUs with typically less than 1 MB of on-chip memory.
arXiv Detail & Related papers (2020-08-17T07:30:54Z) - Q-EEGNet: an Energy-Efficient 8-bit Quantized Parallel EEGNet
Implementation for Edge Motor-Imagery Brain-Machine Interfaces [16.381467082472515]
Motor-Imagery Brain-Machine Interfaces (MI-BMIs) promise direct and accessible communication between human brains and machines.
Deep learning models have emerged for classifying EEG signals.
These models often exceed the limitations of edge devices due to their memory and computational requirements.
arXiv Detail & Related papers (2020-04-24T12:29:03Z)