PSCNN: A 885.86 TOPS/W Programmable SRAM-based Computing-In-Memory
Processor for Keyword Spotting
- URL: http://arxiv.org/abs/2205.01569v1
- Date: Mon, 2 May 2022 09:58:18 GMT
- Title: PSCNN: A 885.86 TOPS/W Programmable SRAM-based Computing-In-Memory
Processor for Keyword Spotting
- Authors: Shu-Hung Kuo and Tian-Sheuan Chang
- Abstract summary: This paper proposes a programmable CIM processor with a single large CIM macro instead of multiple smaller ones for power-efficient computation.
The proposed architecture adopts the pooling write-back method to support fused or independent convolution/pooling operations, reducing latency by 35.9%.
The design, fabricated in TSMC 28nm technology, achieves 150.8 GOPS throughput and 885.86 TOPS/W power efficiency at 10 MHz when executing our binary keyword spotting model.
- Score: 0.10547353841674209
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Computing-in-memory (CIM) has attracted significant attention in recent
years due to its massive parallelism and low power consumption. However,
current CIM designs suffer from the large area overhead of small CIM macros and poor
programmability for model execution. This paper proposes a programmable CIM
processor with a single large CIM macro instead of multiple smaller ones
for power-efficient computation and a flexible instruction set that supports
various binary 1-D convolutional neural network (CNN) models.
Furthermore, the proposed architecture adopts the pooling write-back method to
support fused or independent convolution/pooling operations, reducing latency by
35.9%, and a flexible ping-pong feature SRAM to fit different feature map
sizes during layer-by-layer execution. The design, fabricated in TSMC 28nm
technology, achieves 150.8 GOPS throughput and 885.86 TOPS/W power efficiency at
10 MHz when executing our binary keyword spotting model, offering higher power
efficiency and flexibility than previous designs.
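The headline figures in the abstract can be related by simple arithmetic. The sketch below is not from the paper: it assumes the 150.8 GOPS throughput and the 885.86 TOPS/W efficiency were measured at the same 10 MHz operating point, and the derived values (implied power, operations per clock cycle, energy per operation) are illustrative back-of-the-envelope numbers, not reported results. All variable names are hypothetical.

```python
# Figures quoted in the abstract.
THROUGHPUT_OPS_PER_S = 150.8e9    # 150.8 GOPS
EFFICIENCY_OPS_PER_J = 885.86e12  # 885.86 TOPS/W, i.e. operations per joule
CLOCK_HZ = 10e6                   # 10 MHz

# Assumption: throughput and efficiency refer to the same operating point,
# so power = throughput / efficiency.
implied_power_w = THROUGHPUT_OPS_PER_S / EFFICIENCY_OPS_PER_J

# Operations completed per clock cycle at the quoted frequency.
ops_per_cycle = THROUGHPUT_OPS_PER_S / CLOCK_HZ

# Energy spent per operation is the reciprocal of the efficiency.
energy_per_op_j = 1.0 / EFFICIENCY_OPS_PER_J

print(f"implied power    : {implied_power_w * 1e6:.1f} uW")
print(f"ops per cycle    : {ops_per_cycle:.0f}")
print(f"energy per op    : {energy_per_op_j * 1e15:.3f} fJ")
```

Under that assumption the quoted figures imply a power draw on the order of 170 µW and roughly 15,000 binary operations per cycle, which is consistent with a single large macro operating at a low clock frequency.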
Related papers
- MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling [80.48332380100915]
MiniCPM-SALA is a hybrid model that integrates the high-fidelity long-context modeling of sparse attention with the global efficiency of linear attention.
On a single NVIDIA A6000D GPU, the model achieves up to 3.5x the inference speed of the full-attention model at the sequence length of 256K tokens.
arXiv Detail & Related papers (2026-02-12T09:37:05Z)
- Implementation of high-efficiency, lightweight residual spiking neural network processor based on field-programmable gate arrays [0.49806798459446283]
This work presents an efficient residual SNN accelerator that combines algorithm and hardware co-design to optimize inference energy efficiency.
The proposed processor achieves a classification accuracy of 87.11% on the CIFAR-10 dataset, with an inference time of 3.98 ms per image and an energy efficiency of 183.5 FPS/W.
arXiv Detail & Related papers (2025-12-09T02:08:46Z)
- RISC-V Based TinyML Accelerator for Depthwise Separable Convolutions in Edge AI [1.1816942730023885]
This paper introduces a novel hardware accelerator architecture that utilizes a fused pixel-wise dataflow.
It computes a single output pixel to completion across all stages (expansion, depthwise convolution, and projection) by streaming data.
It achieves a speedup of up to 59.3x over the baseline software execution on the RISC-V core.
arXiv Detail & Related papers (2025-11-26T10:01:31Z)
- Efficient Deployment of CNN Models on Multiple In-Memory Computing Units [0.0]
In-Memory Computing (IMC) represents a paradigm shift in deep learning acceleration.
We introduce the Load-Balance-Longest-Path (LBLP) algorithm, which maximizes the processing rate and minimizes latency through efficient resource utilization.
arXiv Detail & Related papers (2025-10-09T14:03:32Z)
- Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation [129.45368843861917]
We introduce the Gated Memory Unit (GMU), a simple yet effective mechanism for efficient memory sharing across layers.
We apply it to create SambaY, a decoder-hybrid-decoder architecture that incorporates GMUs to share memory readout states from a Samba-based self-decoder.
arXiv Detail & Related papers (2025-07-09T07:27:00Z)
- CIM-NET: A Video Denoising Deep Neural Network Model Optimized for Computing-in-Memory Architectures [4.1888033476195226]
CIM chips offer a promising solution by integrating computation within memory cells.
Existing DNN models are often designed without considering CIM architectural constraints.
We propose a hardware-algorithm co-design framework incorporating two innovations.
arXiv Detail & Related papers (2025-05-23T02:26:56Z)
- An Event-Based Digital Compute-In-Memory Accelerator with Flexible Operand Resolution and Layer-Wise Weight/Output Stationarity [0.11522790873450185]
CIM accelerators for spiking neural networks (SNNs) are promising solutions to enable $\mu$s-level inference latency and ultra-low energy in edge vision applications.
We propose a novel digital CIM macro that supports arbitrary operand resolution and shape, with a unified CIM storage for weights and membrane potentials.
Our approach can save up to 90% energy in large-scale systems, while reaching a state-of-the-art classification accuracy of 95.8% on the IBM DVS gesture dataset.
arXiv Detail & Related papers (2024-10-30T14:55:13Z)
- EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE.
Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z)
- EPIM: Efficient Processing-In-Memory Accelerators based on Epitome [78.79382890789607]
We introduce the Epitome, a lightweight neural operator offering convolution-like functionality.
On the software side, we evaluate epitomes' latency and energy on PIM accelerators.
We introduce a PIM-aware layer-wise design method to enhance their hardware efficiency.
arXiv Detail & Related papers (2023-11-12T17:56:39Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- RAMAN: A Re-configurable and Sparse tinyML Accelerator for Inference on Edge [1.8293684411977293]
Deep Neural Network (DNN) based inference at the edge is challenging as these compute and data-intensive algorithms need to be implemented at low cost and low power.
We present RAMAN, a Re-configurable and spArse tinyML Accelerator for infereNce on edge, architected to exploit the sparsity to reduce area (storage), power as well as latency.
arXiv Detail & Related papers (2023-06-10T17:25:58Z)
- Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design [66.39546326221176]
Attention-based neural networks have become pervasive in many AI tasks.
The use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources.
This paper proposes a hardware-friendly variant that adopts a unified butterfly sparsity pattern to approximate both the attention mechanism and the FFNs.
arXiv Detail & Related papers (2022-09-20T09:28:26Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point, exit point, and compressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- MicroNet: Towards Image Recognition with Extremely Low FLOPs [117.96848315180407]
MicroNet is an efficient convolutional neural network using extremely low computational cost.
A family of MicroNets achieve a significant performance gain over the state-of-the-art in the low FLOP regime.
For instance, MicroNet-M1 achieves 61.1% top-1 accuracy on ImageNet classification with 12 MFLOPs, outperforming MobileNetV3 by 11.3%.
arXiv Detail & Related papers (2020-11-24T18:59:39Z)
- MARS: Multi-macro Architecture SRAM CIM-Based Accelerator with Co-designed Compressed Neural Networks [0.6817102408452476]
Convolutional neural networks (CNNs) play a key role in deep learning applications.
CIM architecture has demonstrated great potential to effectively compute large-scale matrix-vector multiplication.
To reduce computation costs, network pruning and quantization are two widely studied compression methods to shrink the model size.
arXiv Detail & Related papers (2020-10-24T10:31:49Z)
- DORY: Automatic End-to-End Deployment of Real-World DNNs on Low-Cost IoT MCUs [6.403349961091506]
Low-Cost MCU-based end-nodes have limited on-chip memory and often replace caches with scratchpads.
DORY is an automatic tool to deploy DNNs on low-cost MCUs with typically less than 1MB of on-chip memory.
arXiv Detail & Related papers (2020-08-17T07:30:54Z)
- Q-EEGNet: an Energy-Efficient 8-bit Quantized Parallel EEGNet Implementation for Edge Motor-Imagery Brain-Machine Interfaces [16.381467082472515]
Motor-Imagery Brain-Machine Interfaces (MI-BMIs) promise direct and accessible communication between human brains and machines.
Deep learning models have emerged for classifying EEG signals.
These models often exceed the limitations of edge devices due to their memory and computational requirements.
arXiv Detail & Related papers (2020-04-24T12:29:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.