Spiking-LEAF: A Learnable Auditory front-end for Spiking Neural Networks
- URL: http://arxiv.org/abs/2309.09469v2
- Date: Sat, 23 Mar 2024 04:41:23 GMT
- Title: Spiking-LEAF: A Learnable Auditory front-end for Spiking Neural Networks
- Authors: Zeyang Song, Jibin Wu, Malu Zhang, Mike Zheng Shou, Haizhou Li,
- Abstract summary: Spiking-LEAF is a learnable auditory front-end meticulously designed for SNN-based speech processing.
On keyword spotting and speaker identification tasks, the proposed Spiking-LEAF outperforms both SOTA spiking auditory front-ends.
- Score: 53.31894108974566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Brain-inspired spiking neural networks (SNNs) have demonstrated great potential for temporal signal processing. However, their performance in speech processing remains limited due to the lack of an effective auditory front-end. To address this limitation, we introduce Spiking-LEAF, a learnable auditory front-end meticulously designed for SNN-based speech processing. Spiking-LEAF combines a learnable filter bank with a novel two-compartment spiking neuron model called IHC-LIF. The IHC-LIF neurons draw inspiration from the structure of inner hair cells (IHC) and they leverage segregated dendritic and somatic compartments to effectively capture multi-scale temporal dynamics of speech signals. Additionally, the IHC-LIF neurons incorporate the lateral feedback mechanism along with spike regularization loss to enhance spike encoding efficiency. On keyword spotting and speaker identification tasks, the proposed Spiking-LEAF outperforms both SOTA spiking auditory front-ends and conventional real-valued acoustic features in terms of classification accuracy, noise robustness, and encoding efficiency.
Related papers
- DPSNN: Spiking Neural Network for Low-Latency Streaming Speech Enhancement [3.409728296852651]
Speech enhancement improves communication in noisy environments, affecting areas such as automatic speech recognition, hearing aids, and telecommunications.
Neuromorphic algorithms in the form of spiking neural networks (SNNs) have great potential.
We develop a two-phase time-domain streaming SNN framework -- the Dual-Path Spiking Neural Network (DPSNN)
arXiv Detail & Related papers (2024-08-14T09:08:43Z) - Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks [59.38765771221084]
We present a physiologically inspired speech recognition architecture compatible and scalable with deep learning frameworks.
We show end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network.
Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
arXiv Detail & Related papers (2024-04-22T09:40:07Z) - sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection
with Spiking Neural Networks [51.516451451719654]
Spiking Neural Networks (SNNs) are known to be biologically plausible and power-efficient.
This paper introduces a novel SNN-based Voice Activity Detection model, referred to as sVAD.
It provides effective auditory feature representation through SincNet and 1D convolution, and improves noise robustness with attention mechanisms.
arXiv Detail & Related papers (2024-03-09T02:55:44Z) - Inherent Redundancy in Spiking Neural Networks [24.114844269113746]
Spiking Networks (SNNs) are a promising energy-efficient alternative to conventional artificial neural networks.
In this work, we focus on three key questions regarding inherent redundancy in SNNs.
We propose an Advance Attention (ASA) module to harness SNNs' redundancy.
arXiv Detail & Related papers (2023-08-16T08:58:25Z) - Corticomorphic Hybrid CNN-SNN Architecture for EEG-based Low-footprint
Low-latency Auditory Attention Detection [8.549433398954738]
In a multi-speaker "cocktail party" scenario, a listener can selectively attend to a speaker of interest.
Current trends in EEG-based auditory attention detection using artificial neural networks (ANN) are not practical for edge-computing platforms.
We propose a hybrid convolutional neural network-spiking neural network (CNN-SNN) architecture, inspired by the auditory cortex.
arXiv Detail & Related papers (2023-07-13T20:33:39Z) - Exploiting High Performance Spiking Neural Networks with Efficient
Spiking Patterns [4.8416725611508244]
Spiking Neural Networks (SNNs) use discrete spike sequences to transmit information, which significantly mimics the information transmission of the brain.
This paper introduces the dynamic Burst pattern and designs the Leaky Integrate and Fire or Burst (LIFB) neuron that can make a trade-off between short-time performance and dynamic temporal performance.
arXiv Detail & Related papers (2023-01-29T04:22:07Z) - Surrogate Gradient Spiking Neural Networks as Encoders for Large
Vocabulary Continuous Speech Recognition [91.39701446828144]
We show that spiking neural networks can be trained like standard recurrent neural networks using the surrogate gradient method.
They have shown promising results on speech command recognition tasks.
In contrast to their recurrent non-spiking counterparts, they show robustness to exploding gradient problems without the need to use gates.
arXiv Detail & Related papers (2022-12-01T12:36:26Z) - MFA: TDNN with Multi-scale Frequency-channel Attention for
Text-independent Speaker Verification with Short Utterances [94.70787497137854]
We propose a multi-scale frequency-channel attention (MFA) to characterize speakers at different scales through a novel dual-path design which consists of a convolutional neural network and TDNN.
We evaluate the proposed MFA on the VoxCeleb database and observe that the proposed framework with MFA can achieve state-of-the-art performance while reducing parameters and complexity.
arXiv Detail & Related papers (2022-02-03T14:57:05Z) - HASA-net: A non-intrusive hearing-aid speech assessment network [52.83357278948373]
We propose a DNN-based hearing aid speech assessment network (HASA-Net) to predict speech quality and intelligibility scores simultaneously.
To the best of our knowledge, HASA-Net is the first work to incorporate quality and intelligibility assessments utilizing a unified DNN-based non-intrusive model for hearing aids.
Experimental results show that the predicted speech quality and intelligibility scores of HASA-Net are highly correlated to two well-known intrusive hearing-aid evaluation metrics.
arXiv Detail & Related papers (2021-11-10T14:10:13Z) - BackEISNN: A Deep Spiking Neural Network with Adaptive Self-Feedback and
Balanced Excitatory-Inhibitory Neurons [8.956708722109415]
Spiking neural networks (SNNs) transmit information through discrete spikes, which performs well in processing spatial-temporal information.
We propose a deep spiking neural network with adaptive self-feedback and balanced excitatory and inhibitory neurons (BackEISNN)
For the MNIST, FashionMNIST, and N-MNIST datasets, our model has achieved state-of-the-art performance.
arXiv Detail & Related papers (2021-05-27T08:38:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.