Exploring Filterbank Learning for Keyword Spotting
- URL: http://arxiv.org/abs/2006.00217v1
- Date: Sat, 30 May 2020 08:11:58 GMT
- Title: Exploring Filterbank Learning for Keyword Spotting
- Authors: Iván López-Espejo, Zheng-Hua Tan, and Jesper Jensen
- Abstract summary: This paper explores filterbank learning for keyword spotting (KWS).
Two approaches are examined: filterbank matrix learning in the power spectral domain and parameter learning of a psychoacoustically-motivated gammachirp filterbank.
Our experimental results reveal that, in general, there are no statistically significant differences, in terms of KWS accuracy, between using a learned filterbank and handcrafted speech features.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their great performance over the years, handcrafted speech features
are not necessarily optimal for any particular speech application.
Consequently, with greater or lesser success, optimal filterbank learning has
been studied for different speech processing tasks. In this paper, we fill in a
gap by exploring filterbank learning for keyword spotting (KWS). Two approaches
are examined: filterbank matrix learning in the power spectral domain and
parameter learning of a psychoacoustically-motivated gammachirp filterbank.
Filterbank parameters are optimized jointly with a modern deep residual neural
network-based KWS back-end. Our experimental results reveal that, in general,
there are no statistically significant differences, in terms of KWS accuracy,
between using a learned filterbank and handcrafted speech features. Thus, while
we conclude that the latter are still a wise choice when using modern KWS
back-ends, we also hypothesize that this could be a symptom of information
redundancy, which opens up new research possibilities in the field of
small-footprint KWS.
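To make the first approach concrete, below is a minimal illustrative sketch (not the authors' code) of filterbank matrix learning in the power spectral domain: a matrix W, initialized to mel-like triangular filters, is applied to a power spectrogram and log-compressed. In the paper, W would be a trainable parameter updated jointly with the deep residual KWS back-end by gradient descent; here only the forward pass is shown, and all names and sizes are assumptions.

```python
# Illustrative sketch of a learnable filterbank front-end in the power
# spectral domain. W starts as a mel-like triangular filterbank; during
# training it would be optimized jointly with the KWS back-end.
import numpy as np

def mel_init(num_channels: int, num_bins: int, sample_rate: int = 16000) -> np.ndarray:
    """Triangular mel-spaced filters used as the matrix's starting point."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), num_channels + 2)
    hz_points = mel_to_hz(mel_points)
    bins = np.floor((num_bins - 1) * hz_points / (sample_rate / 2)).astype(int)
    W = np.zeros((num_channels, num_bins))
    for ch in range(num_channels):
        left, center, right = bins[ch], bins[ch + 1], bins[ch + 2]
        for k in range(left, center):          # rising slope of the triangle
            W[ch, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope of the triangle
            W[ch, k] = (right - k) / max(right - center, 1)
    return W

def learned_features(power_spec: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Apply the (learnable) filterbank matrix and log-compress the result."""
    return np.log(W @ power_spec + 1e-6)       # shape: (channels, frames)

rng = np.random.default_rng(0)
power_spec = rng.random((257, 100))            # |STFT|^2: 257 bins, 100 frames
W = mel_init(num_channels=40, num_bins=257)    # 40 channels, as in log-Mel features
feats = learned_features(power_spec, W)
print(feats.shape)                             # (40, 100)
```

In a full training setup, W (or, for the second approach, the gammachirp parameters) would be exposed to the optimizer alongside the back-end weights, so the front-end and classifier are fitted to the KWS objective together.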
Related papers
- On filter design in deep convolutional neural network
Deep convolutional neural networks (DCNNs) have given promising results in computer vision.
Filters, or weights, are the critical elements responsible for learning in a DCNN.
Various studies have been done in the last decade on semi-supervised, self-supervised, and unsupervised methods.
arXiv Detail & Related papers (2024-10-29T01:13:22Z)
- Multitaper mel-spectrograms for keyword spotting
This paper investigates the use of the multitaper technique to create improved features for KWS.
Experiment results confirm the advantages of using the proposed improved features.
arXiv Detail & Related papers (2024-07-05T17:18:25Z)
- Filterbank Learning for Small-Footprint Keyword Spotting Robust to Noise
Filterbank learning outperforms handcrafted speech features for KWS when the number of filterbank channels is severely decreased.
Switching from the typically used 40-channel log-Mel features to 8-channel learned features leads to a relative KWS accuracy loss of only 3.5%.
arXiv Detail & Related papers (2022-11-19T02:20:14Z)
- Unrolled Compressed Blind-Deconvolution
Sparse multichannel blind deconvolution (S-MBD) arises frequently in many engineering applications such as radar/sonar/ultrasound imaging.
We propose a compression method that enables blind recovery from far fewer measurements relative to the full received signal in time.
arXiv Detail & Related papers (2022-09-28T15:16:58Z)
- Batch Normalization Tells You Which Filter is Important
We propose a simple yet effective filter pruning method by evaluating the importance of each filter based on the BN parameters of pre-trained CNNs.
The experimental results on CIFAR-10 and ImageNet demonstrate that the proposed method can achieve outstanding performance.
arXiv Detail & Related papers (2021-12-02T12:04:59Z)
- Learning Filterbanks for End-to-End Acoustic Beamforming
Recent work on monaural source separation has shown that performance can be increased by using fully learned filterbanks with short windows.
On the other hand, for conventional beamforming techniques, performance increases with long analysis windows.
In this work we try to bridge the gap between these two worlds and explore fully end-to-end hybrid neural beamforming.
arXiv Detail & Related papers (2021-11-08T16:36:34Z)
- Learning Versatile Convolution Filters for Efficient Visual Recognition
This paper introduces versatile filters to construct efficient convolutional neural networks.
We conduct a theoretical analysis of network complexity and introduce an efficient convolution scheme.
Experimental results on benchmark datasets and neural networks demonstrate that our versatile filters achieve accuracy comparable to that of the original filters.
arXiv Detail & Related papers (2021-09-20T06:07:14Z)
- Learning Sparse Analytic Filters for Piano Transcription
Filterbank learning has become an increasingly popular strategy for various audio-related machine learning tasks.
In this work, several variations of a filterbank learning module are investigated for piano transcription.
arXiv Detail & Related papers (2021-08-23T19:41:11Z)
- Training Interpretable Convolutional Neural Networks by Differentiating Class-specific Filters
Convolutional neural networks (CNNs) have been successfully used in a range of tasks.
However, CNNs are often viewed as "black boxes" and lack interpretability.
We propose a novel strategy to train interpretable CNNs by encouraging class-specific filters.
arXiv Detail & Related papers (2020-07-16T09:12:26Z)
- Dependency Aware Filter Pruning
Pruning a proportion of unimportant filters is an efficient way to mitigate the inference cost.
Previous work prunes filters according to their weight norms or the corresponding batch-norm scaling factors.
We propose a novel mechanism to dynamically control the sparsity-inducing regularization so as to achieve the desired sparsity.
arXiv Detail & Related papers (2020-05-06T07:41:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.