A 71.2-$μ$W Speech Recognition Accelerator with Recurrent Spiking Neural Network
- URL: http://arxiv.org/abs/2503.21337v1
- Date: Thu, 27 Mar 2025 10:14:00 GMT
- Title: A 71.2-$μ$W Speech Recognition Accelerator with Recurrent Spiking Neural Network
- Authors: Chih-Chyau Yang, Tian-Sheuan Chang,
- Abstract summary: We propose a compact recurrent spiking neural network with two recurrent layers, one fully connected layer, and a low time step.<n>The 2.79-MB model undergoes pruning and 4-bit fixed-point quantization, shrinking it by 96.42% to 0.1 MB.<n>The design operates in real time at 100 kHz, consuming 71.2 $mu$W, surpassing state-of-the-art designs.
- Score: 0.0502254944841629
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper introduces a 71.2-$\mu$W speech recognition accelerator designed for edge devices' real-time applications, emphasizing an ultra low power design. Achieved through algorithm and hardware co-optimizations, we propose a compact recurrent spiking neural network with two recurrent layers, one fully connected layer, and a low time step (1 or 2). The 2.79-MB model undergoes pruning and 4-bit fixed-point quantization, shrinking it by 96.42\% to 0.1 MB. On the hardware front, we take advantage of \textit{mixed-level pruning}, \textit{zero-skipping} and \textit{merged spike} techniques, reducing complexity by 90.49\% to 13.86 MMAC/S. The \textit{parallel time-step execution} addresses inter-time-step data dependencies and enables weight buffer power savings through weight sharing. Capitalizing on the sparse spike activity, an input broadcasting scheme eliminates zero computations, further saving power. Implemented on the TSMC 28-nm process, the design operates in real time at 100 kHz, consuming 71.2 $\mu$W, surpassing state-of-the-art designs. At 500 MHz, it has 28.41 TOPS/W and 1903.11 GOPS/mm$^2$ in energy and area efficiency, respectively.
Related papers
- TeLLMe: An Energy-Efficient Ternary LLM Accelerator for Prefilling and Decoding on Edge FPGAs [5.889337608109388]
TeLLMe is the first ternary LLM accelerator for low-power FPGAs.
It supports both prefill and autoregressive decoding using 1.58-bit weights and 8-bit activations.
Under a 7W power budget, TeLLMe delivers up to 9 tokens/s throughput over 1,024-token contexts.
arXiv Detail & Related papers (2025-04-22T21:00:58Z) - ESSR: An 8K@30FPS Super-Resolution Accelerator With Edge Selective Network [0.0502254944841629]
This paper introduces an 8K@30FPS accelerator with edge-selective dynamic processing.<n>The implementation, using the TSMC 28nm process, can achieve 8K@30FPS at 800MHz with a gate count of 2749K, 0.2075W power consumption, and 4797Mpixels/J energy efficiency.
arXiv Detail & Related papers (2025-03-26T05:27:23Z) - DeltaKWS: A 65nm 36nJ/Decision Bio-inspired Temporal-Sparsity-Aware Digital Keyword Spotting IC with 0.6V Near-Threshold SRAM [16.1102923955667]
This paper introduces the first $Delta$RNN-enabled fine-grained temporal sparsity-aware KWS IC for voice-controlled devices.
At 87% temporal sparsity, computing latency and energy/ferencein are reduced by 2.4X/3.4X, respectively.
arXiv Detail & Related papers (2024-05-06T23:41:02Z) - Quantized Neural Networks for Low-Precision Accumulation with Guaranteed
Overflow Avoidance [68.8204255655161]
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference.
We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline.
arXiv Detail & Related papers (2023-01-31T02:46:57Z) - RAMP: A Flat Nanosecond Optical Network and MPI Operations for
Distributed Deep Learning Systems [68.8204255655161]
We introduce a near-exascale, full-bisection bandwidth, all-to-all, single-hop, all-optical network architecture with nanosecond reconfiguration called RAMP.
RAMP supports large-scale distributed and parallel computing systems (12.8Tbps per node for up to 65,536 nodes.
arXiv Detail & Related papers (2022-11-28T11:24:51Z) - Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token
Migration [138.24994198567794]
iTPN is born with two elaborated designs: 1) The first pre-trained feature pyramid upon vision transformer (ViT)
Fast-iTPN can accelerate the inference procedure by up to 70%, with negligible performance loss.
arXiv Detail & Related papers (2022-11-23T06:56:12Z) - Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and
Algorithm Co-design [66.39546326221176]
Attention-based neural networks have become pervasive in many AI tasks.
The use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources.
This paper proposes a hardware-friendly variant that adopts a unified butterfly sparsity pattern to approximate both the attention mechanism and the FFNs.
arXiv Detail & Related papers (2022-09-20T09:28:26Z) - Federated Learning for Energy-limited Wireless Networks: A Partial Model
Aggregation Approach [79.59560136273917]
limited communication resources, bandwidth and energy, and data heterogeneity across devices are main bottlenecks for federated learning (FL)
We first devise a novel FL framework with partial model aggregation (PMA)
The proposed PMA-FL improves 2.72% and 11.6% accuracy on two typical heterogeneous datasets.
arXiv Detail & Related papers (2022-04-20T19:09:52Z) - FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often occupy large number of parameters and require heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z) - Sound Event Detection with Binary Neural Networks on Tightly
Power-Constrained IoT Devices [20.349809458335532]
Sound event detection (SED) is a hot topic in consumer and smart city applications.
Existing approaches based on Deep Neural Networks are very effective, but highly demanding in terms of memory, power, and throughput.
In this paper, we explore the combination of extreme quantization to a small-print binary neural network (BNN) with the highly energy-efficient, RISC-V-based (8+1)-core GAP8 microcontroller.
arXiv Detail & Related papers (2021-01-12T12:38:23Z) - TinyRadarNN: Combining Spatial and Temporal Convolutional Neural
Networks for Embedded Gesture Recognition with Short Range Radars [13.266626571886354]
This work proposes a low-power high-accuracy embedded hand-gesture recognition algorithm targeting battery-operated wearable devices.
A 2D Convolutional Neural Network (CNN) using range frequency Doppler features is combined with a Temporal Convolutional Neural Network (TCN) for time sequence prediction.
arXiv Detail & Related papers (2020-06-25T15:23:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.