Wake Word Detection with Alignment-Free Lattice-Free MMI
- URL: http://arxiv.org/abs/2005.08347v3
- Date: Tue, 28 Jul 2020 22:06:14 GMT
- Title: Wake Word Detection with Alignment-Free Lattice-Free MMI
- Authors: Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur
- Abstract summary: Always-on spoken language interfaces, e.g. personal digital assistants, rely on a wake word to start processing spoken input.
We present novel methods to train a hybrid DNN/HMM wake word detection system from partially labeled training data.
We evaluate our methods on two real data sets, showing 50%--90% reduction in false rejection rates at pre-specified false alarm rates over the best previously published figures.
- Score: 66.12175350462263
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Always-on spoken language interfaces, e.g. personal digital assistants, rely
on a wake word to start processing spoken input. We present novel methods to
train a hybrid DNN/HMM wake word detection system from partially labeled
training data, and to use it in on-line applications: (i) we remove the
prerequisite of frame-level alignments in the LF-MMI training algorithm,
permitting the use of un-transcribed training examples that are annotated only
for the presence/absence of the wake word; (ii) we show that the classical
keyword/filler model must be supplemented with an explicit non-speech (silence)
model for good performance; (iii) we present an FST-based decoder to perform
online detection. We evaluate our methods on two real data sets, showing
50%--90% reduction in false rejection rates at pre-specified false alarm rates
over the best previously published figures, and re-validate them on a third
(large) data set.
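To make the keyword/filler/silence idea concrete, here is a minimal frame-synchronous Viterbi sketch over a three-state graph. It is an illustration only: the paper's decoder runs on a compiled FST with HMM topologies, and the per-frame state log-posteriors, the transition set, and the `margin` threshold below are all assumptions of this sketch.

```python
import numpy as np

# Hypothetical three-state graph: the real decoder is an FST compiled from
# HMM topologies, so the states, arcs, and units here are simplifications.
SIL, FREETEXT, WAKE = 0, 1, 2
N_STATES = 3

# Allowed arcs (src -> dst); self-loops model duration.
ARCS = [(s, s) for s in range(N_STATES)] + [
    (SIL, FREETEXT), (SIL, WAKE),   # silence can precede speech
    (FREETEXT, SIL), (WAKE, SIL),   # speech returns to silence
]

def stream_detect(frame_log_posteriors, margin=5.0):
    """Frame-synchronous Viterbi over the keyword/filler/silence graph.

    frame_log_posteriors: iterable of length-3 arrays of per-state
    log-posteriors, assumed to come from the trained DNN. Yields the frame
    index whenever the wake-word path beats the best competing path by
    `margin` nats.
    """
    score = np.full(N_STATES, -np.inf)
    score[SIL] = 0.0                      # decoding starts in silence
    for t, log_post in enumerate(frame_log_posteriors):
        new_score = np.full(N_STATES, -np.inf)
        for src, dst in ARCS:
            cand = score[src] + log_post[dst]
            if cand > new_score[dst]:
                new_score[dst] = cand
        score = new_score
        if score[WAKE] - max(score[SIL], score[FREETEXT]) > margin:
            yield t                       # wake word detected at frame t
```

Note how the explicit silence state competes with the wake-word path throughout decoding, echoing finding (ii) that a keyword/filler model alone is not enough.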
Related papers
- Gibberish is All You Need for Membership Inference Detection in Contrastive Language-Audio Pretraining [3.7144455366570055]
Existing MIAs need audio as input, risking voiceprint exposure and requiring costly shadow models.
We first propose PRMID, a membership inference detector based on the probability ranking given by CLAP, which does not require training shadow models.
We then propose USMID, a textual unimodal speaker-level membership inference detector that queries the target model using only text data.
arXiv Detail & Related papers (2024-10-24T02:26:57Z)
- Detecting Pretraining Data from Large Language Models [90.12037980837738]
We study the pretraining data detection problem.
Given a piece of text and black-box access to an LLM without knowing the pretraining data, can we determine if the model was trained on the provided text?
We introduce a new detection method Min-K% Prob based on a simple hypothesis.
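Per the paper, the "simple hypothesis" is that text seen in pretraining tends to contain fewer low-probability outlier tokens. A minimal sketch of the Min-K% Prob score, assuming per-token log-probabilities have already been obtained from the target LM:

```python
import numpy as np

def min_k_percent_prob(token_log_probs, k=20.0):
    """Average log-probability of the k% least likely tokens.

    token_log_probs: per-token log-probabilities of the candidate text under
    the target LM (obtaining these from the model is assumed here).
    Higher scores suggest the text is more likely to be pretraining data.
    """
    lp = np.sort(np.asarray(token_log_probs, dtype=float))  # rarest tokens first
    n = max(1, int(len(lp) * k / 100.0))
    return float(lp[:n].mean())

# Usage sketch: declare membership when the score clears a tuned threshold.
# is_member = min_k_percent_prob(log_probs, k=20.0) > threshold
```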
arXiv Detail & Related papers (2023-10-25T17:21:23Z)
- Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation [56.57532238195446]
We propose a method named Ethicist for targeted training data extraction.
To elicit memorization, we tune soft prompt embeddings while keeping the model fixed.
We show that Ethicist significantly improves the extraction performance on a recently proposed public benchmark.
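A generic soft-prompt tuning loop in the spirit of this setup, assuming a HuggingFace-style causal LM; Ethicist's loss smoothing and calibrated confidence estimation are not reproduced here, and all names below are illustrative:

```python
import torch
import torch.nn.functional as F

def tune_soft_prompt(model, tokenizer, target_text, n_prompt=20, steps=100):
    """Tune a soft prompt while the LM stays frozen (generic prompt tuning
    only; not Ethicist's full method)."""
    model.requires_grad_(False)                      # keep the model fixed
    emb = model.get_input_embeddings()
    soft = torch.nn.Parameter(0.02 * torch.randn(1, n_prompt, emb.weight.size(1)))
    opt = torch.optim.Adam([soft], lr=1e-3)

    ids = tokenizer(target_text, return_tensors="pt").input_ids  # (1, L)
    tok_emb = emb(ids)                               # frozen token embeddings
    for _ in range(steps):
        inputs = torch.cat([soft, tok_emb], dim=1)   # prepend soft prompt
        logits = model(inputs_embeds=inputs).logits
        # Positions n_prompt-1 .. end-1 predict the L target tokens.
        pred = logits[:, n_prompt - 1:-1, :]
        loss = F.cross_entropy(pred.reshape(-1, pred.size(-1)), ids.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return soft                                      # memorization-eliciting prompt
```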
arXiv Detail & Related papers (2023-07-10T08:03:41Z)
- Three ways to improve feature alignment for open vocabulary detection [88.65076922242184]
The key problem in zero-shot open vocabulary detection is how to align visual and text features so that the detector performs well on unseen classes.
Previous approaches train the feature pyramid and detection head from scratch, which breaks the vision-text feature alignment established during pretraining.
We propose three methods to alleviate these issues. Firstly, a simple scheme is used to augment the text embeddings which prevents overfitting to a small number of classes seen during training.
Secondly, the feature pyramid network and the detection head are modified to include trainable shortcuts.
Finally, a self-training approach is used to leverage a larger corpus of...
arXiv Detail & Related papers (2023-03-23T17:59:53Z)
- M-Tuning: Prompt Tuning with Mitigated Label Bias in Open-Set Scenarios [103.6153593636399]
We propose a vision-language prompt tuning method with mitigated label bias (M-Tuning).
It introduces open words from WordNet to extend the prompt texts beyond the closed-set label words, so that prompts are tuned in a simulated open-set scenario, as sketched below.
Our method achieves the best performance on datasets with various scales, and extensive ablation studies also validate its effectiveness.
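A hedged sketch of the open-word extension step, using WordNet nouns via NLTK; the sampling scheme and names are illustrative assumptions, not M-Tuning's exact recipe:

```python
import random
from nltk.corpus import wordnet  # requires nltk.download("wordnet") once

def simulated_open_set_labels(closed_labels, n_open=100, seed=0):
    """Extend the closed label-word set with open words from WordNet so that
    prompts can be tuned in a simulated open-set scenario."""
    nouns = {lemma for syn in wordnet.all_synsets("n") for lemma in syn.lemma_names()}
    open_pool = sorted(nouns - set(closed_labels))   # candidate open words
    random.Random(seed).shuffle(open_pool)
    return list(closed_labels) + open_pool[:n_open]

# Usage sketch: prompt texts are then built over the extended word list,
# and only the prompt parameters are tuned.
```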
arXiv Detail & Related papers (2023-03-09T09:05:47Z)
- HEiMDaL: Highly Efficient Method for Detection and Localization of wake-words [8.518479417031775]
Streaming keyword spotting is a widely used solution for activating voice assistants.
We propose a low-footprint CNN model, called HEiMDaL, to detect and localize keywords in streaming conditions.
arXiv Detail & Related papers (2022-10-26T17:26:57Z)
- Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages [58.43299730989809]
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data.
We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task.
This process stands on its own, or can be applied as low-cost second-stage pre-training.
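A rough sketch of one common recipe for inducing such a pseudo language (cluster encoder features into discrete units, then collapse repeats); the feature extractor is assumed, and Wav2Seq's subword modeling step is omitted:

```python
import numpy as np
from sklearn.cluster import KMeans

def induce_pseudo_language(feature_matrices, n_units=500):
    """Cluster frame-level speech features into discrete units and collapse
    consecutive repeats into compact pseudo-token sequences.

    feature_matrices: list of (frames x dim) arrays from a pretrained speech
    encoder (the encoder is assumed here).
    """
    km = KMeans(n_clusters=n_units, n_init=10, random_state=0)
    km.fit(np.concatenate(feature_matrices, axis=0))
    pseudo_texts = []
    for feats in feature_matrices:
        units = km.predict(feats)
        # Collapse runs, e.g. [5, 5, 5, 12, 12] -> [5, 12]
        dedup = [int(u) for i, u in enumerate(units) if i == 0 or u != units[i - 1]]
        pseudo_texts.append(dedup)
    return pseudo_texts  # targets for the pseudo speech recognition task
```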
arXiv Detail & Related papers (2022-05-02T17:59:02Z)
- Device-Directed Speech Detection: Regularization via Distillation for Weakly-Supervised Models [13.456066434598155]
We address the problem of detecting device-directed speech that does not contain a specific wake-word.
Specifically, we focus on audio coming from a touch-based invocation.
arXiv Detail & Related papers (2022-03-30T01:27:39Z)
- Semi-Supervised Speech Recognition via Local Prior Matching [42.311823406287864]
Local prior matching is a semi-supervised objective that distills knowledge from a strong prior.
We demonstrate that LPM is theoretically well-motivated, simple to implement, and superior to existing knowledge distillation techniques.
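A minimal sketch of the matching objective, assuming hypotheses have already been beam-decoded and scored by both the ASR model and a language-model prior (helper names are illustrative):

```python
import torch

def local_prior_matching_loss(asr_beam_log_scores, lm_beam_log_scores):
    """LPM objective for one unlabeled utterance, sketched under assumptions.

    asr_beam_log_scores: the ASR model's log-scores for a beam of decoded
    hypotheses; lm_beam_log_scores: the language model's (strong prior's)
    log-scores for the same hypotheses. Beam decoding itself is not shown.
    """
    prior = torch.softmax(lm_beam_log_scores, dim=-1).detach()  # target over beam
    log_model = torch.log_softmax(asr_beam_log_scores, dim=-1)
    return -(prior * log_model).sum()  # cross-entropy against the prior
```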
arXiv Detail & Related papers (2020-02-24T16:07:11Z)
This list was automatically generated from the titles and abstracts of the papers listed on this site.
This site makes no guarantees about the quality of the information presented and accepts no responsibility for any consequences of its use.