sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection
with Spiking Neural Networks
- URL: http://arxiv.org/abs/2403.05772v1
- Date: Sat, 9 Mar 2024 02:55:44 GMT
- Authors: Qu Yang, Qianhui Liu, Nan Li, Meng Ge, Zeyang Song, Haizhou Li
- Abstract summary: Spiking Neural Networks (SNNs) are known to be biologically plausible and power-efficient.
This paper introduces a novel SNN-based Voice Activity Detection model, referred to as sVAD.
It provides effective auditory feature representation through SincNet and 1D convolution, and improves noise robustness with attention mechanisms.
- Score: 51.516451451719654
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Speech applications are expected to be low-power and robust under noisy
conditions. An effective Voice Activity Detection (VAD) front-end lowers the
computational need. Spiking Neural Networks (SNNs) are known to be biologically
plausible and power-efficient. However, SNN-based VADs have yet to achieve
noise robustness and often require large models for high performance. This
paper introduces a novel SNN-based VAD model, referred to as sVAD, which
features an auditory encoder with an SNN-based attention mechanism.
Particularly, it provides effective auditory feature representation through
SincNet and 1D convolution, and improves noise robustness with attention
mechanisms. The classifier utilizes Spiking Recurrent Neural Networks (sRNN) to
exploit temporal speech information. Experimental results demonstrate that our
sVAD achieves remarkable noise robustness while maintaining low power
consumption and a small footprint, making it a promising solution for
real-world VAD applications.
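The paper does not publish code here, but the spiking principle behind an SNN-based VAD can be illustrated with a deliberately minimal sketch. This is not the sVAD model (which uses a SincNet + 1-D convolution auditory encoder, SNN-based attention, and an sRNN classifier); it is a toy in which a single leaky integrate-and-fire (LIF) neuron, driven by per-frame signal energy, emits a spike for speech-like frames. All names and parameter values (`frame_energy`, `lif_vad`, the threshold and decay constants) are illustrative assumptions.

```python
import math

def frame_energy(samples, frame_len=160):
    """Split a sample stream into frames and return each frame's mean-square energy."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [sum(x * x for x in f) / len(f) for f in frames]

def lif_vad(energies, threshold=0.1, decay=0.7):
    """Leaky integrate-and-fire detector: v <- decay*v + input; spike and reset at threshold."""
    v, spikes = 0.0, []
    for e in energies:
        v = decay * v + e          # leaky integration of the input
        if v >= threshold:
            spikes.append(1)       # spike = speech-like frame
            v = 0.0                # membrane potential resets after firing
        else:
            spikes.append(0)       # no spike = silence / noise
    return spikes

# Synthetic input: silence, a louder tone burst standing in for speech, silence.
sig = ([0.0] * 320
       + [0.5 * math.sin(2 * math.pi * 220 * t / 16000) for t in range(640)]
       + [0.0] * 320)
decisions = lif_vad(frame_energy(sig))
print(decisions)  # → [0, 0, 1, 1, 1, 1, 0, 0]
```

The spike train is the detector's output: energy-based features replace the paper's learned SincNet filterbank, and a single neuron replaces its attention and sRNN stages, but the event-driven fire-and-reset dynamics are what make SNN hardware implementations power-efficient.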
Related papers
- DPSNN: Spiking Neural Network for Low-Latency Streaming Speech Enhancement [3.409728296852651]
Speech enhancement improves communication in noisy environments, affecting areas such as automatic speech recognition, hearing aids, and telecommunications.
Neuromorphic algorithms in the form of spiking neural networks (SNNs) have great potential.
We develop a two-phase time-domain streaming SNN framework -- the Dual-Path Spiking Neural Network (DPSNN).
arXiv Detail & Related papers (2024-08-14T09:08:43Z)
- A Real-Time Voice Activity Detection Based On Lightweight Neural [4.589472292598182]
Voice activity detection (VAD) is the task of detecting speech in an audio stream.
Recent neural network-based VADs have alleviated the degradation of performance to some extent.
We propose a lightweight and real-time neural network called MagicNet, which utilizes causal and depthwise-separable 1-D convolutions and a GRU.
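The two building blocks named in that summary can be sketched in a few lines. This is a hypothetical illustration, not MagicNet's actual code: a causal 1-D convolution sees only inputs at or before time t (via left padding), and a depthwise-separable convolution splits a full convolution into a cheap per-channel depthwise pass followed by a 1x1 pointwise channel mix. All function names and weights are made up for the example.

```python
def causal_depthwise(x, kernels):
    """Depthwise pass: x is a list of channels (lists of samples), one kernel per channel."""
    out = []
    for ch, k in zip(x, kernels):
        pad = [0.0] * (len(k) - 1)        # left-pad only => output at t depends on inputs <= t
        padded = pad + ch
        out.append([sum(k[j] * padded[t + j] for j in range(len(k)))
                    for t in range(len(ch))])
    return out

def pointwise(x, weights):
    """1x1 conv: mix channels at each time step; weights[o][i] maps in-channel i to out-channel o."""
    T = len(x[0])
    return [[sum(w[i] * x[i][t] for i in range(len(x))) for t in range(T)]
            for w in weights]

# Two input channels, kernel size 3, one output channel.
x = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
dw = causal_depthwise(x, kernels=[[0.0, 0.0, 1.0],   # identity kernel
                                  [0.5, 0.0, 0.0]])  # scaled two-step delay
y = pointwise(dw, weights=[[1.0, 1.0]])
```

The separable factorization needs `channels * k + out * in` weights instead of `out * in * k`, which is the parameter saving that makes such models lightweight enough for real-time streaming.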
arXiv Detail & Related papers (2024-05-27T03:31:16Z)
- Spiking-LEAF: A Learnable Auditory front-end for Spiking Neural Networks [53.31894108974566]
Spiking-LEAF is a learnable auditory front-end meticulously designed for SNN-based speech processing.
On keyword spotting and speaker identification tasks, the proposed Spiking-LEAF outperforms SOTA spiking auditory front-ends.
arXiv Detail & Related papers (2023-09-18T04:03:05Z)
- Single Channel Speech Enhancement Using U-Net Spiking Neural Networks [2.436681150766912]
Speech enhancement (SE) is crucial for reliable communication devices or robust speech recognition systems.
We propose a novel approach to SE using a spiking neural network (SNN) based on a U-Net architecture.
SNNs are suitable for processing data with a temporal dimension, such as speech, and are known for their energy-efficient implementation on neuromorphic hardware.
arXiv Detail & Related papers (2023-07-26T19:10:29Z)
- A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning [57.28467469709369]
We investigate the design of a compact audio-visual wake word spotting (WWS) system by utilizing visual information.
We introduce a neural network pruning strategy via the lottery ticket hypothesis in an iterative fine-tuning manner (LTH-IF)
The proposed audio-visual system achieves significant performance improvements over the single-modality (audio-only or video-only) system under different noisy conditions.
arXiv Detail & Related papers (2022-02-17T08:26:25Z)
- Event Based Time-Vectors for auditory features extraction: a neuromorphic approach for low power audio recognition [4.206844212918807]
We present a neuromorphic architecture, capable of unsupervised auditory feature recognition.
We then validate the network on a subset of Google's Speech Commands dataset.
arXiv Detail & Related papers (2021-12-13T21:08:04Z)
- HASA-net: A non-intrusive hearing-aid speech assessment network [52.83357278948373]
We propose a DNN-based hearing aid speech assessment network (HASA-Net) to predict speech quality and intelligibility scores simultaneously.
To the best of our knowledge, HASA-Net is the first work to incorporate quality and intelligibility assessments utilizing a unified DNN-based non-intrusive model for hearing aids.
Experimental results show that the predicted speech quality and intelligibility scores of HASA-Net are highly correlated to two well-known intrusive hearing-aid evaluation metrics.
arXiv Detail & Related papers (2021-11-10T14:10:13Z)
- Robust Learning of Recurrent Neural Networks in Presence of Exogenous Noise [22.690064709532873]
We propose a tractable robustness analysis for RNN models subject to input noise.
The robustness measure can be estimated efficiently using linearization techniques.
Our proposed methodology significantly improves robustness of recurrent neural networks.
arXiv Detail & Related papers (2021-05-03T16:45:05Z)
- Deep Networks for Direction-of-Arrival Estimation in Low SNR [89.45026632977456]
We introduce a Convolutional Neural Network (CNN) that is trained on multi-channel data of the true array manifold matrix.
We train a CNN in the low-SNR regime to predict DoAs across all SNRs.
Our robust solution can be applied in several fields, ranging from wireless array sensors to acoustic microphones or sonars.
arXiv Detail & Related papers (2020-11-17T12:52:18Z)
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved strong performance under controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at two goals: a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reducing the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.