GraphemeAug: A Systematic Approach to Synthesized Hard Negative Keyword Spotting Examples
- URL: http://arxiv.org/abs/2505.14814v2
- Date: Sun, 25 May 2025 03:06:52 GMT
- Title: GraphemeAug: A Systematic Approach to Synthesized Hard Negative Keyword Spotting Examples
- Authors: Harry Zhang, Kurt Partridge, Pai Zhu, Neng Chen, Hyun Jin Park, Dhruuv Agarwal, Quan Wang,
- Abstract summary: Spoken Keyword Spotting (KWS) is the task of distinguishing between the presence and absence of a keyword in audio. We propose a method to generate adversarial examples close to the decision boundary by making insertion/deletion/substitution edits on the keyword's graphemes. We show that the technique improves AUC on a dataset of synthetic hard negatives by 61% while maintaining quality on positives and ambient negative audio data.
- Score: 9.34501666048989
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spoken Keyword Spotting (KWS) is the task of distinguishing between the presence and absence of a keyword in audio. The accuracy of a KWS model hinges on its ability to correctly classify examples close to the keyword and non-keyword boundary. These boundary examples are often scarce in training data, limiting model performance. In this paper, we propose a method to systematically generate adversarial examples close to the decision boundary by making insertion/deletion/substitution edits on the keyword's graphemes. We evaluate this technique on held-out data for a popular keyword and show that the technique improves AUC on a dataset of synthetic hard negatives by 61% while maintaining quality on positives and ambient negative audio data.
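The insertion/deletion/substitution edits described in the abstract can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: each character is treated as one grapheme, the candidate alphabet is lowercase ASCII, and only single edits are generated. The function name `grapheme_edits` is hypothetical. Turning the resulting text variants into audio negatives (for example with a speech synthesizer) is not shown.

```python
import string


def grapheme_edits(keyword, alphabet=string.ascii_lowercase):
    """Return all distinct strings exactly one grapheme edit away from keyword.

    Assumption: one character == one grapheme (not the paper's definition).
    """
    variants = set()
    n = len(keyword)

    # Insertion: place each candidate grapheme at every position, including both ends.
    for i in range(n + 1):
        for ch in alphabet:
            variants.add(keyword[:i] + ch + keyword[i:])

    for i in range(n):
        # Deletion: drop the grapheme at position i.
        variants.add(keyword[:i] + keyword[i + 1:])
        # Substitution: swap the grapheme at position i for a different one.
        for ch in alphabet:
            if ch != keyword[i]:
                variants.add(keyword[:i] + ch + keyword[i + 1:])

    # The unedited keyword is a positive, never a hard negative.
    variants.discard(keyword)
    return sorted(variants)


if __name__ == "__main__":
    candidates = grapheme_edits("hello")
    print(len(candidates), candidates[:5])
```

Each variant differs from the keyword by one edit, so it is a candidate near-miss. Some variants may still sound identical to the keyword when spoken, so a real pipeline would likely need an additional filtering step; that step is outside this sketch.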
Related papers
- WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing [5.50485371072671]
We propose a method to improve recognition accuracy of rare words in CTC-based models without additional training or text-to-speech systems. For keyword detection, we adopt a wildcard CTC that is both fast and tolerant of ambiguous matches. In experiments on Japanese speech recognition, the proposed method achieved a 29% improvement in the F1 score for unknown words.
arXiv Detail & Related papers (2025-06-02T02:30:26Z) - CAST: Corpus-Aware Self-similarity Enhanced Topic modelling [16.562349140796115]
We introduce CAST: Corpus-Aware Self-similarity Enhanced Topic modelling, a novel topic modelling method. We find self-similarity to be an effective metric to prevent functional words from acting as candidate topic words. Our approach significantly enhances the coherence and diversity of generated topics, as well as the topic model's ability to handle noisy data.
arXiv Detail & Related papers (2024-10-19T15:27:11Z) - Open-vocabulary Keyword-spotting with Adaptive Instance Normalization [18.250276540068047]
We propose AdaKWS, a novel method for keyword spotting in which a text encoder is trained to output keyword-conditioned normalization parameters.
We show significant improvements over recent keyword spotting and ASR baselines.
arXiv Detail & Related papers (2023-09-13T13:49:42Z) - Automatic Counterfactual Augmentation for Robust Text Classification Based on Word-Group Search [12.894936637198471]
In general, a keyword is regarded as a shortcut if it creates a superficial association with the label, resulting in a false prediction.
We propose a new Word-Group mining approach, which captures the causal effect of any keyword combination and orders the combinations that most affect the prediction.
Our approach is based on effective post-hoc analysis and beam search, which ensures the mining effect and reduces the complexity.
arXiv Detail & Related papers (2023-07-01T02:26:34Z) - Towards preserving word order importance through Forced Invalidation [80.33036864442182]
We show that pre-trained language models are insensitive to word order.
We propose Forced Invalidation to help preserve the importance of word order.
Our experiments demonstrate that Forced Invalidation significantly improves the sensitivity of the models to word order.
arXiv Detail & Related papers (2023-04-11T13:42:10Z) - To Wake-up or Not to Wake-up: Reducing Keyword False Alarm by Successive Refinement [58.96644066571205]
We show that existing deep keyword spotting mechanisms can be improved by Successive Refinement.
We show that, across multiple models ranging in size from 13K to 2.41M parameters, the successive refinement technique reduces false alarms (FA) by up to a factor of 8.
Our proposed approach is "plug-and-play" and can be applied to any deep keyword spotting model.
arXiv Detail & Related papers (2023-04-06T23:49:29Z) - M-Tuning: Prompt Tuning with Mitigated Label Bias in Open-Set Scenarios [58.617025733655005]
We propose a vision-language prompt tuning method with mitigated label bias (M-Tuning). It introduces open words from WordNet to extend the range of words forming the prompt texts beyond the closed-set label words, so that prompts are tuned in a simulated open-set scenario. Our method achieves the best performance on datasets with various scales, and extensive ablation studies also validate its effectiveness.
arXiv Detail & Related papers (2023-03-09T09:05:47Z) - Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting [23.627625026135505]
We propose a novel end-to-end user-defined keyword spotting method.
Our method compares input queries with an enrolled text keyword sequence.
We introduce the LibriPhrase dataset for efficiently training keyword spotting models.
arXiv Detail & Related papers (2022-06-30T16:40:31Z) - Keywords and Instances: A Hierarchical Contrastive Learning Framework Unifying Hybrid Granularities for Text Generation [60.62039705180484]
We propose a hierarchical contrastive learning mechanism, which can unify the semantic meaning of hybrid granularities in the input text. Experiments demonstrate that our model outperforms competitive baselines on paraphrasing, dialogue generation, and storytelling tasks.
arXiv Detail & Related papers (2022-05-26T13:26:03Z) - Semantic-Preserving Adversarial Text Attacks [85.32186121859321]
We propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.
Our method achieves the highest attack success rates and semantics rates by changing the smallest number of words compared with existing methods.
arXiv Detail & Related papers (2021-08-23T09:05:18Z) - MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct the keywords from the rest of the words and make low-confidence predictions without enough context.
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
arXiv Detail & Related papers (2020-12-17T04:54:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.