Filterbank Learning for Small-Footprint Keyword Spotting Robust to Noise
- URL: http://arxiv.org/abs/2211.10565v1
- Date: Sat, 19 Nov 2022 02:20:14 GMT
- Title: Filterbank Learning for Small-Footprint Keyword Spotting Robust to Noise
- Authors: Iván López-Espejo, Ram C. M. C. Shekar, Zheng-Hua Tan, Jesper Jensen, and John H. L. Hansen
- Abstract summary: Filterbank learning outperforms handcrafted speech features for KWS when the number of filterbank channels is severely decreased.
Switching from the typically used 40-channel log-Mel features to 8-channel learned features leads to a relative KWS accuracy loss of only 3.5%.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the context of keyword spotting (KWS), the replacement of handcrafted
speech features by learnable features has not yielded superior KWS performance.
In this study, we demonstrate that filterbank learning outperforms handcrafted
speech features for KWS whenever the number of filterbank channels is severely
decreased. Reducing the number of channels may cause some drop in KWS performance,
but it also brings a substantial reduction in energy consumption, which is key when
deploying always-on KWS on low-resource devices. Experimental results on
a noisy version of the Google Speech Commands Dataset show that filterbank
learning adapts to noise characteristics to provide a higher degree of
robustness to noise, especially when dropout is integrated. Thus, switching
from the typically used 40-channel log-Mel features to 8-channel learned features
leads to a relative KWS accuracy loss of only 3.5% while simultaneously
achieving a 6.3x energy consumption reduction.
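
As a rough illustration of the kind of front-end the abstract describes, here is a minimal, hypothetical PyTorch sketch of a learnable filterbank: a non-negative matrix maps the power spectrum to a small number of learned channels, with log compression and dropout. The softplus constraint, initialization, and dropout placement are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch of a learnable filterbank front-end (illustrative, not the
# authors' exact implementation). A non-negative matrix maps STFT power-spectrum
# bins to a small number of learned channels, followed by log compression;
# dropout on the learned features is one plausible way to integrate the
# regularization the abstract mentions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableFilterbank(nn.Module):
    def __init__(self, n_bins: int = 257, n_channels: int = 8, p_drop: float = 0.1):
        super().__init__()
        # Unconstrained parameters; softplus keeps the effective filterbank >= 0.
        self.weight = nn.Parameter(0.01 * torch.randn(n_bins, n_channels))
        self.dropout = nn.Dropout(p_drop)

    def forward(self, power_spec: torch.Tensor) -> torch.Tensor:
        # power_spec: (batch, frames, n_bins) power spectrogram
        fbank = F.softplus(self.weight)              # (n_bins, n_channels)
        energies = power_spec @ fbank                # (batch, frames, n_channels)
        return self.dropout(torch.log(energies + 1e-6))  # log compression, as in log-Mel

# Usage: feats = LearnableFilterbank()(torch.rand(4, 100, 257))
```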
Related papers
- Disentangled Training with Adversarial Examples For Robust Small-footprint Keyword Spotting [18.456711824241978]
We propose datasource-aware disentangled learning with adversarial examples to improve KWS robustness.
Experimental results demonstrate that the proposed learning strategy improves the false reject rate by 40.31% at a 1% false accept rate.
Our best-performing system achieves 98.06% accuracy on the Google Speech Commands V1 dataset.
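
The summary above does not detail the adversarial-example recipe; as a generic, hypothetical illustration (the paper's datasource-aware disentanglement is not reproduced here), a standard FGSM-style perturbation for a KWS classifier could look like this:

```python
# Generic FGSM-style adversarial example for a KWS classifier (illustrative only;
# the paper's actual adversarial-example generation may differ).
import torch
import torch.nn.functional as F

def fgsm_example(model: torch.nn.Module, x: torch.Tensor, y: torch.Tensor,
                 eps: float = 0.01) -> torch.Tensor:
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # loss w.r.t. the true keyword labels
    loss.backward()
    # Perturb the input in the direction that increases the loss.
    return (x + eps * x.grad.sign()).detach()
```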
arXiv Detail & Related papers (2024-08-23T20:03:51Z)
- SparseVSR: Lightweight and Noise Robust Visual Speech Recognition [100.43280310123784]
We generate a lightweight model that achieves higher performance than its dense equivalent.
Our results confirm that sparse networks are more resistant to noise than dense networks.
arXiv Detail & Related papers (2023-07-10T13:34:13Z)
- Audio-Visual Efficient Conformer for Robust Speech Recognition [91.3755431537592]
We propose to improve the noise robustness of the recently proposed Efficient Conformer Connectionist Temporal Classification (CTC) architecture by processing both audio and visual modalities.
Our experiments show that using both modalities allows speech to be recognized better in the presence of environmental noise and significantly accelerates training, reaching a lower WER with 4 times fewer training steps.
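
As a hypothetical sketch of audio-visual fusion (the actual Efficient Conformer fusion is more elaborate than this), frame-synchronous audio and visual features can be projected to a common dimension and summed before a shared encoder:

```python
# Hypothetical early-fusion front-end: project each modality to a shared width
# and combine per frame. All dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    def __init__(self, d_audio: int = 80, d_video: int = 512, d_model: int = 256):
        super().__init__()
        self.proj_a = nn.Linear(d_audio, d_model)
        self.proj_v = nn.Linear(d_video, d_model)

    def forward(self, a: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        # a: (batch, frames, d_audio), v: (batch, frames, d_video), frame-aligned
        return self.proj_a(a) + self.proj_v(v)
```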
arXiv Detail & Related papers (2023-01-04T05:36:56Z)
- Simple Pooling Front-ends For Efficient Audio Classification [56.59107110017436]
We show that eliminating the temporal redundancy in the input audio features could be an effective approach for efficient audio classification.
We propose a family of simple pooling front-ends (SimPFs) which use simple non-parametric pooling operations to reduce the redundant information.
SimPFs can reduce the number of floating-point operations of off-the-shelf audio neural networks by more than half.
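
A minimal sketch of the idea, under the assumption that "non-parametric pooling" means something like strided mean pooling over time (halving the frame rate roughly halves downstream FLOPs):

```python
# Sketch of a SimPF-style non-parametric pooling front-end: strided mean pooling
# over the time axis removes temporally redundant frames before the classifier.
# The stride value is an illustrative assumption.
import torch
import torch.nn.functional as F

def simpf_mean_pool(feats: torch.Tensor, stride: int = 2) -> torch.Tensor:
    # feats: (batch, frames, channels) -> (batch, frames // stride, channels)
    pooled = F.avg_pool1d(feats.transpose(1, 2), kernel_size=stride, stride=stride)
    return pooled.transpose(1, 2)
```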
arXiv Detail & Related papers (2022-10-03T14:00:41Z)
- Spiking Cochlea with System-level Local Automatic Gain Control [13.532394494130468]
We present an alternative system-level algorithm that implements channel-specific automatic gain control (AGC) in a silicon spiking cochlea.
Because this AGC mechanism only needs counting and adding operations, it can be implemented at low hardware cost in a future design.
We evaluate the impact of the local AGC algorithm on a classification task where the input signal varies over a 32 dB input range.
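
Hypothetically, a counting-and-adding AGC of this kind could amount to the following per-channel update rule (thresholds and step size are made-up constants, not from the paper):

```python
# Hypothetical channel-specific AGC using only counting and adding: the spike
# count of a channel over a fixed window is compared against two thresholds and
# the channel's attenuation level is nudged accordingly. Constants are illustrative.
def local_agc_step(spike_count: int, atten: int,
                   hi: int = 200, lo: int = 50, step: int = 1) -> int:
    if spike_count > hi:                    # channel too active: attenuate more
        atten += step
    elif spike_count < lo and atten > 0:    # too quiet: attenuate less
        atten -= step
    return atten
```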
arXiv Detail & Related papers (2022-02-14T13:58:13Z)
- Weight, Block or Unit? Exploring Sparsity Tradeoffs for Speech Enhancement on Tiny Neural Accelerators [4.1070979067056745]
We explore network sparsification strategies with the aim of compressing neural speech enhancement (SE) down to an optimal configuration for a new generation of low-power, microcontroller-based neural accelerators (microNPUs).
We examine three distinct sparsity structures: weight pruning, block pruning, and unit pruning, and discuss their benefits and drawbacks when applied to SE.
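
A small NumPy sketch of the three structures, under common magnitude-based interpretations (threshold choices, block size, and norms are illustrative assumptions, not the paper's criteria):

```python
# Illustrative magnitude-based versions of the three sparsity structures for a
# dense weight matrix w.
import numpy as np

def weight_prune(w: np.ndarray, frac: float) -> np.ndarray:
    # Zero the smallest-magnitude individual weights.
    t = np.quantile(np.abs(w), frac)
    return np.where(np.abs(w) < t, 0.0, w)

def block_prune(w: np.ndarray, frac: float, b: int = 4) -> np.ndarray:
    # Zero whole b x b sub-blocks with the smallest Frobenius norm.
    out = w.copy()
    R, C = w.shape[0] // b, w.shape[1] // b
    norms = np.array([[np.linalg.norm(w[i*b:(i+1)*b, j*b:(j+1)*b])
                       for j in range(C)] for i in range(R)])
    t = np.quantile(norms, frac)
    for i in range(R):
        for j in range(C):
            if norms[i, j] < t:
                out[i*b:(i+1)*b, j*b:(j+1)*b] = 0.0
    return out

def unit_prune(w: np.ndarray, frac: float) -> np.ndarray:
    # Zero entire output units (rows) with the smallest L2 norm.
    norms = np.linalg.norm(w, axis=1)
    t = np.quantile(norms, frac)
    return np.where((norms < t)[:, None], 0.0, w)
```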
arXiv Detail & Related papers (2021-11-03T17:06:36Z)
- CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application [63.2243126704342]
This study presents a deep learning-based speech signal-processing mobile application known as CITISEN.
CITISEN provides three functions: speech enhancement (SE), model adaptation (MA), and background noise conversion (BNC).
Compared with the noisy speech signals, the enhanced speech signals achieved improvements of about 6% and 33%.
arXiv Detail & Related papers (2020-08-21T02:04:12Z)
- Neural Network Virtual Sensors for Fuel Injection Quantities with Provable Performance Specifications [71.1911136637719]
We show how provable guarantees can be naturally applied to other real-world settings.
We show how specific intervals of fuel injection quantities can be targeted to maximize robustness for certain ranges.
arXiv Detail & Related papers (2020-06-30T23:33:17Z)
- Exploring Filterbank Learning for Keyword Spotting [27.319236923928205]
This paper explores filterbank learning for keyword spotting (KWS).
Two approaches are examined: filterbank matrix learning in the power spectral domain and parameter learning of a psychoacoustically motivated gammachirp filterbank (a sketch of the gammachirp form follows this entry).
Our experimental results reveal that, in general, there are no statistically significant differences, in terms of KWS accuracy, between using a learned filterbank and handcrafted speech features.
arXiv Detail & Related papers (2020-05-30T08:11:58Z)
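
For reference on the second approach, here is a hypothetical sketch of the standard gammachirp impulse response (in the Irino and Patterson form); which parameters the paper actually learns, and the constants used below, are assumptions rather than details taken from the abstract:

```python
# Standard gammachirp impulse response; in parameter learning, quantities such as
# the chirp factor c and the bandwidth scale b would be optimized by
# backpropagation. Constant values here are common textbook choices, not the paper's.
import numpy as np

def erb(f_hz: float) -> float:
    # Equivalent rectangular bandwidth of the auditory filter at f_hz.
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def gammachirp(t: np.ndarray, fr: float, n: int = 4, b: float = 1.019,
               c: float = -2.0, a: float = 1.0, phi: float = 0.0) -> np.ndarray:
    # t must be strictly positive (seconds) because of the log-time chirp term.
    env = a * t ** (n - 1) * np.exp(-2.0 * np.pi * b * erb(fr) * t)
    return env * np.cos(2.0 * np.pi * fr * t + c * np.log(t) + phi)

# Usage: h = gammachirp(np.arange(1, 401) / 16000.0, fr=1000.0)
```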