IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing
- URL: http://arxiv.org/abs/2507.07396v1
- Date: Thu, 10 Jul 2025 03:26:24 GMT
- Title: IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing
- Authors: Zeyang Song, Shimin Zhang, Yuhong Chou, Jibin Wu, Haizhou Li,
- Abstract summary: Spiking Neural Networks (SNNs) offer energy-efficient alternatives to traditional Artificial Neural Networks (ANNs). IML-Spikeformer is a spiking Transformer architecture specifically designed for large-scale speech processing. IML-Spikeformer achieves word error rates of 6.0% on AiShell-1 and 3.4% on Librispeech-960, comparable to conventional ANN transformers.
- Score: 37.95536541492917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spiking Neural Networks (SNNs), inspired by biological neural mechanisms, represent a promising neuromorphic computing paradigm that offers energy-efficient alternatives to traditional Artificial Neural Networks (ANNs). Despite their proven effectiveness, SNN architectures have struggled to achieve competitive performance on large-scale speech processing tasks. Two key challenges hinder progress: (1) the high computational overhead during training caused by multi-timestep spike firing, and (2) the absence of large-scale SNN architectures tailored to speech processing tasks. To overcome these issues, we introduce the Input-aware Multi-Level Spikeformer (IML-Spikeformer), a spiking Transformer architecture specifically designed for large-scale speech processing. Central to our design is the Input-aware Multi-Level Spike (IMLS) mechanism, which simulates multi-timestep spike firing within a single timestep using an adaptive, input-aware thresholding scheme. IML-Spikeformer further integrates a Reparameterized Spiking Self-Attention (RepSSA) module with a Hierarchical Decay Mask (HDM), forming the HD-RepSSA module. This module enhances the precision of attention maps and enables modeling of multi-scale temporal dependencies in speech signals. Experiments demonstrate that IML-Spikeformer achieves word error rates of 6.0% on AiShell-1 and 3.4% on Librispeech-960, comparable to conventional ANN transformers, while reducing theoretical inference energy consumption by 4.64× and 4.32×, respectively. IML-Spikeformer marks an advance in scalable SNN architectures for large-scale speech processing, in terms of both task performance and energy efficiency.
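The abstract does not include code, but the core idea behind the IMLS mechanism can be sketched: rather than accumulating binary spikes over several timesteps, the neuron emits a graded spike level in a single step, with the firing threshold adapted to the input. The function name, the specific thresholding rule, and the level cap below are illustrative assumptions, not the paper's implementation.
```python
# Hypothetical sketch of the Input-aware Multi-Level Spike (IMLS) idea:
# emit a multi-level spike in one timestep instead of binary spikes over
# several timesteps. Names and the thresholding rule are assumptions.
import torch

def imls_fire(membrane: torch.Tensor, max_level: int = 4) -> torch.Tensor:
    """Single-step multi-level firing with an input-dependent threshold."""
    # Placeholder input-aware threshold: scale with per-sample statistics.
    theta = membrane.abs().mean(dim=-1, keepdim=True).clamp(min=1e-6)
    # Integer spike level in {0, 1, ..., max_level}; comparable in effect to
    # firing up to max_level binary spikes over max_level timesteps.
    return torch.clamp(torch.floor(membrane / theta), 0, max_level)

x = torch.randn(2, 8)      # toy "membrane potentials"
spikes = imls_fire(x)      # graded spikes produced in a single timestep
print(spikes)
```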
Related papers
- Neuromorphic Wireless Split Computing with Multi-Level Spikes [69.73249913506042]
Neuromorphic computing uses spiking neural networks (SNNs) to perform inference tasks.
Embedding a small payload within each spike exchanged between spiking neurons can enhance inference accuracy without increasing energy consumption.
Split computing, where an SNN is partitioned across two devices, is a promising solution.
This paper presents the first comprehensive study of a neuromorphic wireless split computing architecture that employs multi-level SNNs.
arXiv Detail & Related papers (2024-11-07T14:08:35Z)
- Energy-Aware FPGA Implementation of Spiking Neural Network with LIF Neurons [0.5243460995467893]
Spiking Neural Networks (SNNs) stand out as a cutting-edge solution for TinyML.
This paper presents a novel SNN architecture based on the 1st Order Leaky Integrate-and-Fire (LIF) neuron model.
A hardware-friendly LIF design is also proposed, and implemented on a Xilinx Artix-7 FPGA.
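For reference, the first-order LIF neuron named above follows the standard discrete-time leak-integrate-fire-reset update; a minimal sketch, with the decay factor and threshold chosen arbitrarily for illustration, is:
```python
# Minimal discrete-time first-order Leaky Integrate-and-Fire (LIF) neuron.
# Parameter values (beta, v_th) are arbitrary illustration choices.
import numpy as np

def lif_step(v, x, beta=0.9, v_th=1.0):
    """One LIF update: leak, integrate the input, fire and reset on threshold."""
    v = beta * v + x                    # leaky integration of the input current
    spike = (v >= v_th).astype(float)   # binary spike when the threshold is crossed
    v = v - spike * v_th                # soft reset after firing
    return v, spike

v = np.zeros(4)
for x in np.random.rand(10, 4):         # toy input current sequence
    v, s = lif_step(v, x)
```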
arXiv Detail & Related papers (2024-11-03T16:42:10Z)
- TMN: A Lightweight Neuron Model for Efficient Nonlinear Spike Representation [7.524721345903027]
Spike trains serve as the primary medium for information transmission in Spiking Neural Networks.
Existing encoding schemes based on spike counts or timing often face severe limitations under low-timestep constraints.
We propose the Ternary Momentum Neuron (TMN), a novel neuron model featuring two key innovations.
arXiv Detail & Related papers (2024-08-30T12:39:25Z)
- SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking [43.275370104552344]
The human brain is far more energy-efficient than large language models with a comparable number of parameters.
We propose the first spiking large language model, SpikeLLM.
SpikeLLM reduces WikiText2 perplexity by 11.01% and improves common scene reasoning accuracy by 2.55%.
arXiv Detail & Related papers (2024-07-05T08:37:17Z)
- Language Modeling on a SpiNNaker 2 Neuromorphic Chip [2.760675104404914]
Event-based networks on neuromorphic devices offer a potential way to reduce energy consumption for inference significantly.
We demonstrate the first-ever implementation of a language model on a neuromorphic device.
arXiv Detail & Related papers (2023-12-14T16:16:35Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- TMS: A Temporal Multi-scale Backbone Design for Speaker Embedding [60.292702363839716]
Current SOTA backbone networks for speaker embedding are designed to aggregate multi-scale features from an utterance with multi-branch network architectures for speaker representation.
We propose an effective temporal multi-scale (TMS) model where multi-scale branches could be efficiently designed in a speaker embedding network almost without increasing computational costs.
arXiv Detail & Related papers (2022-03-17T05:49:35Z)
- Mixed Precision Low-bit Quantization of Neural Network Language Models for Speech Recognition [67.95996816744251]
State-of-the-art language models (LMs) represented by long-short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming increasingly complex and expensive for practical applications.
Current quantization methods are based on uniform precision and fail to account for the varying sensitivity of different parts of LMs to quantization errors.
Novel mixed precision neural network LM quantization methods are proposed in this paper.
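As background for the uniform- versus mixed-precision distinction drawn above: uniform quantization applies a single bit-width everywhere, while mixed-precision methods assign different bit-widths to different parts of the model. The snippet below is a generic illustration of uniform symmetric quantization, not the paper's method.
```python
# Illustrative uniform symmetric quantization to `bits` bits; mixed-precision
# schemes instead pick a different bit-width for different layers/components.
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int) -> np.ndarray:
    """Simulate uniform symmetric quantization of a weight tensor."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
    scale = max(np.abs(w).max(), 1e-12) / qmax      # step of the uniform grid
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                                # de-quantized weights

w = np.random.randn(4, 4)
w8 = quantize_uniform(w, 8)    # fine grid, small quantization error
w4 = quantize_uniform(w, 4)    # coarse grid, larger quantization error
```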
arXiv Detail & Related papers (2021-11-29T12:24:02Z)
- Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by Spiking Neural Network [68.43026108936029]
We propose a pure spiking neural network (SNN) based computational model for precise sound localization in noisy real-world environments.
We implement this algorithm in a real-time robotic system with a microphone array.
The experimental results show a mean azimuth error of 13 degrees, surpassing the accuracy of other biologically plausible neuromorphic approaches to sound source localization.
arXiv Detail & Related papers (2020-07-07T08:22:56Z)
- A Spike in Performance: Training Hybrid-Spiking Neural Networks with Quantized Activation Functions [6.574517227976925]
A Spiking Neural Network (SNN) is a promising approach to energy-efficient computing.
We show how to maintain state-of-the-art accuracy when converting a non-spiking network into an SNN.
arXiv Detail & Related papers (2020-02-10T05:24:27Z)