HEiMDaL: Highly Efficient Method for Detection and Localization of
wake-words
- URL: http://arxiv.org/abs/2210.15425v1
- Date: Wed, 26 Oct 2022 17:26:57 GMT
- Title: HEiMDaL: Highly Efficient Method for Detection and Localization of
wake-words
- Authors: Arnav Kundu, Mohammad Samragh Razlighi, Minsik Cho, Priyanka
Padmanabhan, Devang Naik
- Abstract summary: Streaming keyword spotting is a widely used solution for activating voice assistants.
We propose a low-footprint CNN model, called HEiMDaL, to detect and localize keywords in streaming conditions.
- Score: 8.518479417031775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Streaming keyword spotting is a widely used solution for activating voice
assistants. Deep Neural Networks with Hidden Markov Model (DNN-HMM) based
methods have proven to be efficient and widely adopted in this space, primarily
because of the ability to detect and identify the start and end of the wake-up
word at low compute cost. However, such hybrid systems suffer from loss-metric
mismatch when the DNN and HMM are trained independently. Sequence
discriminative training cannot fully mitigate the loss-metric mismatch due to
the inherent Markovian style of the operation. We propose a low-footprint CNN
model, called HEiMDaL, to detect and localize keywords in streaming conditions.
We introduce an alignment-based classification loss to detect the occurrence of
the keyword along with an offset loss to predict the start of the keyword.
HEiMDaL shows 73% reduction in detection metrics along with equivalent
localization accuracy and with the same memory footprint as existing DNN-HMM
style models for a given wake-word.
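The abstract names two training terms, a classification loss for detecting the keyword and an offset loss for predicting its start, but does not give their exact form. The following is only a minimal sketch of how such a combined objective could look, assuming per-frame binary cross-entropy for detection and an L1 penalty on the start offset applied to keyword frames only. All names (`detection_and_offset_loss`, `offset_weight`, and so on) are hypothetical and not taken from the paper.

```python
import math


def binary_cross_entropy(p, y, eps=1e-7):
    """Cross-entropy for one predicted probability p and a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def detection_and_offset_loss(probs, pred_offsets, labels, true_offsets,
                              offset_weight=1.0):
    """Illustrative combined loss (an assumption, not the paper's formula).

    probs        -- per-frame keyword probabilities in [0, 1]
    pred_offsets -- per-frame predicted distance (in frames) back to the keyword start
    labels       -- per-frame 0/1 targets, assumed to come from an alignment
    true_offsets -- per-frame reference distance back to the keyword start
    """
    # Detection term: mean binary cross-entropy over all frames.
    cls = sum(binary_cross_entropy(p, y) for p, y in zip(probs, labels)) / len(probs)

    # Offset term: mean absolute error, computed only on keyword frames.
    positives = [(po, to) for po, to, y in zip(pred_offsets, true_offsets, labels)
                 if y == 1]
    if positives:
        off = sum(abs(po - to) for po, to in positives) / len(positives)
    else:
        off = 0.0

    return cls + offset_weight * off


if __name__ == "__main__":
    loss = detection_and_offset_loss(
        probs=[0.5, 0.5],
        pred_offsets=[0.0, 3.0],
        labels=[0, 1],
        true_offsets=[0.0, 5.0],
    )
    print(loss)
```

Masking the offset term to keyword frames is a design choice of this sketch: frames without a keyword have no meaningful start position, so they contribute only to the detection term.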
Related papers
- Towards Faster k-Nearest-Neighbor Machine Translation [56.66038663128903]
k-nearest-neighbor machine translation approaches suffer from heavy retrieve overhead on the entire datastore when decoding each token.
We propose a simple yet effective multi-layer perceptron (MLP) network to predict whether a token should be translated jointly by the neural machine translation model and probabilities produced by the kNN.
arXiv Detail & Related papers (2023-12-12T16:41:29Z)
- Performance evaluation of Machine learning algorithms for Intrusion Detection System [0.40964539027092917]
This paper focuses on intrusion detection systems (IDSs) analysis using Machine Learning (ML) techniques.
We analyze the KDD CUP-'99' intrusion detection dataset used for training and validating ML models.
arXiv Detail & Related papers (2023-10-01T06:35:37Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt, LLMs can use their generative capability to correct even tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- Real-time Speech Emotion Recognition Based on Syllable-Level Feature Extraction [7.0019575386261375]
We present a speech emotion recognition system based on a reductionist approach of decomposing and analyzing syllable-level features.
A set of syllable-level formant features is extracted and fed into a single hidden layer neural network that makes predictions for each syllable.
Experiments show that the method achieves real-time latency while predicting with state-of-the-art cross-corpus unweighted accuracy of 47.6% for IE to MI and 56.2% for MI to IE.
arXiv Detail & Related papers (2022-04-25T00:20:28Z)
- DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows [52.31831255787147]
We introduce a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks (AA).
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution.
Our model can be trained on a single GPU making it compute efficient and deployable without requiring specialized accelerators.
arXiv Detail & Related papers (2021-05-30T22:07:13Z)
- A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings [50.524054820564395]
We propose a new unsupervised model for mapping a variable-duration speech segment to a fixed-dimensional representation.
The resulting acoustic word embeddings can form the basis of search, discovery, and indexing systems for low- and zero-resource languages.
arXiv Detail & Related papers (2020-12-03T19:24:42Z)
- Optimize what matters: Training DNN-HMM Keyword Spotting Model Using End Metric [21.581361079189563]
Deep Neural Network-Hidden Markov Model (DNN-HMM) based methods have been successfully used for many always-on keyword spotting algorithms.
We present a novel end-to-end training strategy that learns the DNN parameters by optimizing for the detection score.
Our method does not require any change in the model architecture or the inference framework.
arXiv Detail & Related papers (2020-11-02T17:47:21Z)
- Wake Word Detection with Alignment-Free Lattice-Free MMI [66.12175350462263]
Always-on spoken language interfaces, e.g. personal digital assistants, rely on a wake word to start processing spoken input.
We present novel methods to train a hybrid DNN/HMM wake word detection system from partially labeled training data.
We evaluate our methods on two real data sets, showing 50%-90% reduction in false rejection rates at pre-specified false alarm rates over the best previously published figures.
arXiv Detail & Related papers (2020-05-17T19:22:25Z)
- Small-Footprint Open-Vocabulary Keyword Spotting with Quantized LSTM Networks [3.8382752162527933]
In this paper, we focus on an open-vocabulary keyword spotting method, allowing the user to define their own keywords without having to retrain the whole model.
We describe the different design choices leading to a fast and small-footprint system, able to run on tiny devices, for any arbitrary set of user-defined keywords.
arXiv Detail & Related papers (2020-02-25T13:27:31Z)