Optimize what matters: Training DNN-HMM Keyword Spotting Model Using End Metric
- URL: http://arxiv.org/abs/2011.01151v2
- Date: Fri, 26 Feb 2021 00:06:41 GMT
- Title: Optimize what matters: Training DNN-HMM Keyword Spotting Model Using End Metric
- Authors: Ashish Shrivastava, Arnav Kundu, Chandra Dhir, Devang Naik, Oncel Tuzel
- Abstract summary: Deep Neural Network--Hidden Markov Model (DNN-HMM) based methods have been successfully used for many always-on keyword spotting algorithms.
We present a novel end-to-end training strategy that learns the DNN parameters by optimizing for the detection score.
Our method does not require any change in the model architecture or the inference framework.
- Score: 21.581361079189563
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Network--Hidden Markov Model (DNN-HMM) based methods have been successfully used for many always-on keyword spotting algorithms that detect a wake word to trigger a device. The DNN predicts the state probabilities of a given speech frame, while the HMM decoder combines the DNN predictions of multiple speech frames to compute the keyword detection score. In prior methods, the DNN is trained independently of the HMM parameters to minimize the cross-entropy loss between the predicted and the ground-truth state probabilities. The mismatch between the DNN training loss (cross-entropy) and the end metric (detection score) is the main source of sub-optimal performance for the keyword spotting task. We address this loss-metric mismatch with a novel end-to-end training strategy that learns the DNN parameters by optimizing for the detection score. To this end, we make the HMM decoder (dynamic programming) differentiable and back-propagate through it to maximize the score for the keyword and minimize the scores for non-keyword speech segments. Our method does not require any change in the model architecture or the inference framework; therefore, there is no overhead in run-time memory or compute requirements. Moreover, we show a significant reduction in false rejection rate (FRR) at the same false trigger experience (> 70% over independent DNN training).
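As a concrete illustration of the idea, the sketch below (in PyTorch) replaces the hard max of Viterbi decoding with a smooth log-sum-exp so that the keyword detection score becomes differentiable in the DNN's per-frame outputs, and pairs it with a margin loss that pushes keyword scores above non-keyword scores. The function names, the forced start state, and the hinge-style loss are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def soft_viterbi_score(log_probs: torch.Tensor, log_trans: torch.Tensor) -> torch.Tensor:
    """Differentiable surrogate for the Viterbi keyword score.

    log_probs: (T, S) per-frame state log-probabilities from the DNN.
    log_trans: (S, S) log transition matrix of a left-to-right HMM
               (a large negative value marks disallowed transitions).
    """
    T, S = log_probs.shape
    start = torch.full((S,), -1e9)
    start[0] = 0.0                      # force the path to begin in state 0
    alpha = start + log_probs[0]
    for t in range(1, T):
        # logsumexp smoothly approximates Viterbi's hard max over predecessors,
        # keeping the recursion differentiable end to end.
        alpha = torch.logsumexp(alpha.unsqueeze(1) + log_trans, dim=0) + log_probs[t]
    return alpha[-1] / T                # score of finishing in the final state

def detection_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor,
                   margin: float = 1.0) -> torch.Tensor:
    # Push keyword-segment scores above non-keyword-segment scores by a margin.
    return torch.relu(margin - pos_scores.unsqueeze(1) + neg_scores.unsqueeze(0)).mean()
```

Because the scoring function is just differentiable tensor code, gradients flow from the detection score back into the DNN without touching the inference-time decoder.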
Related papers
- Return of the RNN: Residual Recurrent Networks for Invertible Sentence Embeddings [0.0]
This study presents a novel model for invertible sentence embeddings using a residual recurrent network trained on an unsupervised encoding task.
Rather than the probabilistic outputs common to neural machine translation models, our approach employs a regression-based output layer to reconstruct the input sequence's word vectors.
The model achieves high accuracy and fast training with the Adam optimizer, a significant finding given that RNNs typically require memory units, such as LSTMs, or second-order optimization methods.
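A minimal sketch of such a regression-based autoencoder follows, assuming pre-computed word vectors as input; the plain GRUs (residual connections omitted) and all sizes are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class InvertibleSentenceAE(nn.Module):
    """Encode a sentence's word vectors into one embedding, then decode by
    regressing the word vectors back (MSE) rather than predicting tokens
    with a softmax over a vocabulary."""
    def __init__(self, dim: int = 300, hidden: int = 512):
        super().__init__()
        self.encoder = nn.GRU(dim, hidden, batch_first=True)
        self.decoder = nn.GRU(dim, hidden, batch_first=True)
        self.project = nn.Linear(hidden, dim)    # regression head, no vocabulary

    def forward(self, word_vecs: torch.Tensor) -> torch.Tensor:  # (B, T, dim)
        _, h = self.encoder(word_vecs)           # h holds the sentence embedding
        out, _ = self.decoder(torch.zeros_like(word_vecs), h)
        recon = self.project(out)                # (B, T, dim) reconstructed vectors
        return nn.functional.mse_loss(recon, word_vecs)
```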
arXiv Detail & Related papers (2023-03-23T15:59:06Z)
- HEiMDaL: Highly Efficient Method for Detection and Localization of wake-words [8.518479417031775]
Streaming keyword spotting is a widely used solution for activating voice assistants.
We propose a low-footprint CNN model, called HEiMDaL, to detect and localize keywords in streaming conditions.
arXiv Detail & Related papers (2022-10-26T17:26:57Z)
- Automated machine learning for borehole resistivity measurements [0.0]
Deep neural networks (DNNs) offer a real-time solution for the inversion of borehole resistivity measurements.
It is possible to use extremely large DNNs to approximate the operators, but this demands considerable training time.
In this work, we propose a scoring function that accounts for the accuracy and size of the DNNs.
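A toy example of what such a scoring function could look like is sketched below; the functional form and the weight alpha are assumptions for illustration, since the paper's exact definition is not reproduced here.

```python
import math

def model_score(accuracy: float, num_params: int, alpha: float = 0.1) -> float:
    """One plausible shape for a score trading accuracy against network size:
    reward accuracy, penalize log parameter count."""
    return accuracy - alpha * math.log10(num_params)

# e.g., a 1M-parameter DNN at 0.92 accuracy beats a 100M-parameter DNN at
# 0.94 accuracy: 0.92 - 0.1*6 = 0.32 versus 0.94 - 0.1*8 = 0.14
```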
arXiv Detail & Related papers (2022-07-20T12:27:22Z)
- Adaptive Nearest Neighbor Machine Translation [60.97183408140499]
kNN-MT combines pre-trained neural machine translation with token-level k-nearest-neighbor retrieval.
The traditional kNN algorithm simply retrieves the same number of nearest neighbors for each target token.
We propose Adaptive kNN-MT to dynamically determine the number of neighbors k for each target token.
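One way such a per-token choice of k can be made differentiable is sketched below; the meta-network shape and the soft neighbor mask are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class AdaptiveK(nn.Module):
    """A light meta-network inspects the sorted retrieval distances and outputs
    a distribution over candidate values of k, yielding a soft per-neighbor
    usage mask instead of a fixed cutoff."""
    def __init__(self, k_max: int = 8):
        super().__init__()
        self.meta = nn.Sequential(nn.Linear(k_max, 32), nn.ReLU(),
                                  nn.Linear(32, k_max))  # logits over k = 1..k_max

    def forward(self, distances: torch.Tensor) -> torch.Tensor:  # (B, k_max), ascending
        k_weights = torch.softmax(self.meta(distances), dim=-1)  # P(use exactly k)
        # Neighbor j is used by every choice k >= j+1, so its usage weight is
        # the reversed cumulative sum of the k-distribution.
        return torch.flip(torch.cumsum(torch.flip(k_weights, [-1]), -1), [-1])
```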
arXiv Detail & Related papers (2021-05-27T09:27:42Z)
- On Addressing Practical Challenges for RNN-Transducer [72.72132048437751]
We adapt a well-trained RNN-T model to a new domain without collecting the audio data.
We obtain word-level confidence scores by utilizing several types of features calculated during decoding.
The proposed time-stamping method achieves an average word-timing difference of less than 50 ms.
arXiv Detail & Related papers (2021-04-27T23:31:43Z)
- A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings [50.524054820564395]
We propose a new unsupervised model for mapping a variable-duration speech segment to a fixed-dimensional representation.
The resulting acoustic word embeddings can form the basis of search, discovery, and indexing systems for low- and zero-resource languages.
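The interface this implies, mapping a variable-length input to a fixed-size vector, is sketched below with a plain GRU encoder; the correspondence-VAE training objective itself is omitted, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class AcousticWordEncoder(nn.Module):
    """Run an RNN over a variable-duration segment of acoustic features and
    keep the final hidden state as a fixed-dimensional embedding."""
    def __init__(self, feat: int = 40, dim: int = 128):
        super().__init__()
        self.rnn = nn.GRU(feat, dim, batch_first=True)

    def forward(self, segment: torch.Tensor) -> torch.Tensor:  # (B, T, feat)
        _, h = self.rnn(segment)
        return h[-1]                    # (B, dim), independent of segment length T
```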
arXiv Detail & Related papers (2020-12-03T19:24:42Z)
- Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning [60.20150317299749]
This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning.
To make full use of the training data, we propose a full data learning method for speech enhancement.
arXiv Detail & Related papers (2020-11-11T06:32:37Z)
- Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks [61.76338096980383]
A range of neural architecture search (NAS) techniques are used to automatically learn two types of hyperparameters of state-of-the-art factored time delay neural networks (TDNNs).
These include the DARTS method integrating architecture selection with lattice-free MMI (LF-MMI) TDNN training.
Experiments conducted on a 300-hour Switchboard corpus suggest the auto-configured systems consistently outperform the baseline LF-MMI TDNN systems.
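The core DARTS mechanism referenced here can be sketched as a softmax-weighted mixture of candidate operations; the snippet below shows that generic building block, not the paper's TDNN-specific search space.

```python
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """DARTS building block: candidate operations (e.g., TDNN layers with
    different context widths) are blended with softmax-weighted architecture
    parameters, making the discrete architecture choice differentiable."""
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        self.alpha = nn.Parameter(torch.zeros(len(ops)))  # architecture logits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))
```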
arXiv Detail & Related papers (2020-07-17T08:32:11Z)
- Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition [39.497407288772386]
The recurrent neural network transducer (RNN-T) architecture has become an emerging trend in end-to-end automatic speech recognition research.
In this work, we leverage external alignments to seed the RNN-T model.
Two different pre-training solutions are explored, referred to as encoder pre-training and whole-network pre-training, respectively.
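A minimal sketch of the encoder pre-training variant, assuming frame-level alignment labels are available as classification targets, is shown below; the feature dimensions and target inventory are illustrative, not the paper's setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Before transducer training, treat the encoder as a frame classifier on
# external alignment labels, then reuse it to initialize the RNN-T encoder.
encoder = nn.LSTM(input_size=80, hidden_size=512, num_layers=3, batch_first=True)
frame_head = nn.Linear(512, 4000)       # e.g., tied context-dependent targets

def pretrain_step(feats, align_labels, optimizer):
    """feats: (B, T, 80) acoustic features; align_labels: (B, T) frame targets."""
    out, _ = encoder(feats)                                   # (B, T, 512)
    logits = frame_head(out).transpose(1, 2)                  # (B, 4000, T)
    loss = F.cross_entropy(logits, align_labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```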
arXiv Detail & Related papers (2020-05-01T19:00:57Z)
- GraN: An Efficient Gradient-Norm Based Detector for Adversarial and Misclassified Examples [77.99182201815763]
Deep neural networks (DNNs) are vulnerable to adversarial examples and other data perturbations.
GraN is a time- and parameter-efficient method that is easily adaptable to any DNN.
GraN achieves state-of-the-art performance on numerous problem set-ups.
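The gradient-norm idea can be sketched as follows; GraN's actual recipe (which layers and features enter the norm) differs, so this only shows the core signal.

```python
import torch
import torch.nn.functional as F

def gradient_norm_score(model, x):
    """Take the loss of the model's own prediction and measure the gradient
    magnitude; unusually large gradients suggest adversarial or
    misclassified inputs."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    loss = F.cross_entropy(logits, logits.argmax(dim=-1))
    grad, = torch.autograd.grad(loss, x)
    return grad.flatten(1).norm(dim=1)   # larger norm -> more likely abnormal

# usage: scores = gradient_norm_score(net, batch); flag scores above a threshold
```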
arXiv Detail & Related papers (2020-04-20T10:09:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.