N-gram Boosting: Improving Contextual Biasing with Normalized N-gram
Targets
- URL: http://arxiv.org/abs/2308.02092v1
- Date: Fri, 4 Aug 2023 00:23:14 GMT
- Title: N-gram Boosting: Improving Contextual Biasing with Normalized N-gram
Targets
- Authors: Wang Yau Li, Shreekantha Nadig, Karol Chang, Zafarullah Mahmood,
Riqiang Wang, Simon Vandieken, Jonas Robertson, Fred Mailhot
- Abstract summary: We present a two-step keyword boosting mechanism that works on normalized unigrams and n-grams rather than just single tokens.
This improves our keyword recognition rate by 26% relative on our proprietary in-domain dataset and 2% on LibriSpeech.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Accurate transcription of proper names and technical terms is particularly
important in speech-to-text applications for business conversations. These
words, which are essential to understanding the conversation, are often rare
and therefore likely to be under-represented in text and audio training data,
creating a significant challenge in this domain. We present a two-step keyword
boosting mechanism that successfully works on normalized unigrams and n-grams
rather than just single tokens, which eliminates missing hits issues with
boosting raw targets. In addition, we show how adjusting the boosting weight
logic avoids over-boosting multi-token keywords. This improves our keyword
recognition rate by 26% relative on our proprietary in-domain dataset and 2% on
LibriSpeech. This method is particularly useful on targets that involve
non-alphabetic characters or have non-standard pronunciations.
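The paper does not include code; as a rough sketch of the two-step idea (function and parameter names are ours, not the authors'), normalization followed by n-gram target generation might look like:

```python
import re

def normalize(phrase: str) -> str:
    """Lowercase and strip non-alphanumeric characters so that raw
    targets such as "O'Neill" or "C++" match their spoken forms."""
    return re.sub(r"[^a-z0-9 ]", "", phrase.lower()).strip()

def boosting_targets(keywords, max_order=3):
    """Step 1: normalize each keyword; step 2: emit every unigram and
    n-gram (up to max_order) as a boosting target, so that multi-token
    keywords are matched as units rather than as independent tokens."""
    targets = set()
    for kw in keywords:
        tokens = normalize(kw).split()
        for n in range(1, min(max_order, len(tokens)) + 1):
            for i in range(len(tokens) - n + 1):
                targets.add(" ".join(tokens[i:i + n]))
    return targets

print(sorted(boosting_targets(["O'Neill", "C++ runtime"])))
# -> ['c', 'c runtime', 'oneill', 'runtime']
```

Boosting the normalized n-grams rather than the raw strings is what avoids the missed hits on targets with non-alphabetic characters; the paper's adjusted weight logic for multi-token targets is not reproduced here.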
Related papers
- LM-assisted keyword biasing with Aho-Corasick algorithm for Transducer-based ASR
We propose a light on-the-fly method to improve automatic speech recognition performance.
We combine a bias list of named entities with a word-level n-gram language model with the shallow fusion approach based on the Aho-Corasick string matching algorithm.
We achieve up to 21.6% relative improvement in the general word error rate with no practical difference in the inverse real-time factor.
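A minimal stdlib sketch of Aho-Corasick matching over a bias list (our own illustration, not the authors' implementation, and without the n-gram LM shallow-fusion scoring or word-boundary handling):

```python
from collections import deque

class AhoCorasick:
    """Tiny Aho-Corasick automaton: matches every bias-list entity in a
    transcript in a single left-to-right pass, via failure links."""

    def __init__(self, entities):
        self.goto = [{}]   # per-state transition tables
        self.fail = [0]    # failure links
        self.out = [[]]    # entities recognized at each state
        for e in entities:
            self._add(e)
        self._build()

    def _add(self, word):
        state = 0
        for ch in word:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(word)

    def _build(self):
        # breadth-first construction of failure links from the root
        q = deque(self.goto[0].values())
        while q:
            r = q.popleft()
            for ch, s in self.goto[r].items():
                q.append(s)
                f = self.fail[r]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[s] = self.goto[f].get(ch, 0)
                self.out[s] += self.out[self.fail[s]]

    def matches(self, text):
        """Return (start_index, entity) pairs for every hit."""
        state, found = 0, []
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for w in self.out[state]:
                found.append((i - len(w) + 1, w))
        return found
```

In the paper this matching step identifies where bias-list entities occur so their scores can be boosted during shallow fusion; the single-pass automaton is what keeps the method cheap enough to leave the inverse real-time factor essentially unchanged.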
arXiv Detail & Related papers (2024-09-20T13:53:37Z)
- Open-vocabulary Keyword-spotting with Adaptive Instance Normalization
We propose AdaKWS, a novel method for keyword spotting in which a text encoder is trained to output keyword-conditioned normalization parameters.
We show significant improvements over recent keyword spotting and ASR baselines.
arXiv Detail & Related papers (2023-09-13T13:49:42Z)
- Assessing the Importance of Frequency versus Compositionality for Subword-based Tokenization in NMT
Subword tokenization is the de facto standard for tokenization in neural language models and machine translation systems.
Three advantages are frequently cited in favor of subwords: shorter encoding of frequent tokens, compositionality of subwords, and ability to deal with unknown words.
We propose a tokenization approach that enables us to separate frequency from compositionality.
arXiv Detail & Related papers (2023-06-02T09:39:36Z)
- Automatic dense annotation of large-vocabulary sign language videos
We propose a simple, scalable framework to vastly increase the density of automatic annotations.
We make these annotations publicly available to support the sign language research community.
arXiv Detail & Related papers (2022-08-04T17:55:09Z)
- NFLAT: Non-Flat-Lattice Transformer for Chinese Named Entity Recognition
We advocate a novel lexical enhancement method, InterFormer, that effectively reduces computational and memory costs.
Compared with FLAT, it reduces unnecessary attention calculations in the "word-character" and "word-word" interactions.
This reduces memory usage by about 50% and allows more extensive lexicons or larger batch sizes for network training.
arXiv Detail & Related papers (2022-05-12T01:55:37Z)
- Short-Term Word-Learning in a Dynamically Changing Environment
We show how to supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
We demonstrate significant improvements in the detection rate of new words with only a minor increase in false alarms.
arXiv Detail & Related papers (2022-03-29T10:05:39Z)
- Spell my name: keyword boosted speech recognition
Uncommon words such as names and technical terminology are important to understanding conversations in context.
We propose a simple but powerful ASR decoding method that can better recognise these uncommon keywords.
The method boosts the probabilities of given keywords in a beam search based on acoustic model predictions.
We demonstrate the effectiveness of our method on the LibriSpeech test sets and also internal data of real-world conversations.
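The boosting step can be illustrated with a toy beam search in which keyword tokens receive an additive log-probability bonus (a sketch with invented parameter names; the paper applies the boost inside ASR decoding on acoustic model predictions):

```python
def boosted_beam_search(step_logprobs, keywords, beam_size=3, bonus=2.0):
    """Toy beam search over a sequence of per-step token distributions.
    Emitting a token from the keyword set earns an extra log-probability
    'bonus', nudging rare names back into the beam."""
    beams = [([], 0.0)]                       # (token sequence, score)
    for dist in step_logprobs:                # dist: token -> log-prob
        candidates = []
        for seq, score in beams:
            for tok, lp in dist.items():
                boost = bonus if tok in keywords else 0.0
                candidates.append((seq + [tok], score + lp + boost))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]        # prune to the beam width
    return beams[0][0]                        # best hypothesis

steps = [{"my": -0.1, "a": -2.0},
         {"name": -0.2, "game": -0.3},
         {"is": -0.1, "was": -1.0},
         {"jon": -1.5, "john": -0.4}]
# unboosted decoding prefers the more frequent spelling "john";
# boosting "jon" flips the final step
print(boosted_beam_search(steps, {"jon"})[-1])   # -> jon
print(boosted_beam_search(steps, set())[-1])     # -> john
```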
arXiv Detail & Related papers (2021-10-06T14:16:57Z)
- LadRa-Net: Locally-Aware Dynamic Re-read Attention Net for Sentence Semantic Matching
We develop a novel Dynamic Re-read Network (DRr-Net) for sentence semantic matching.
We extend DRr-Net to a Locally-Aware Dynamic Re-read Attention Net (LadRa-Net).
Experiments on two popular sentence semantic matching tasks demonstrate that DRr-Net can significantly improve the performance of sentence semantic matching.
arXiv Detail & Related papers (2021-08-06T02:07:04Z)
- Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification
We focus on incorporating external knowledge into the verbalizer, forming a knowledgeable prompt-tuning (KPT) approach.
We expand the label word space of the verbalizer using external knowledge bases (KBs) and refine the expanded label word space with the PLM itself before predicting with the expanded label word space.
Experiments on zero and few-shot text classification tasks demonstrate the effectiveness of knowledgeable prompt-tuning.
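A highly simplified sketch of the expanded-verbalizer idea: score each class by pooling the PLM's mask-position logits over that class's label-word set (the KB-driven expansion and the refinement/calibration steps of KPT are omitted; all names here are ours):

```python
def verbalizer_predict(mask_logits, verbalizer):
    """mask_logits: token -> logit at the [MASK] position.
    verbalizer: class label -> expanded list of label words.
    Returns the class whose label words score highest on average."""
    scores = {
        label: sum(mask_logits[w] for w in words) / len(words)
        for label, words in verbalizer.items()
    }
    return max(scores, key=scores.get)

logits = {"science": 2.0, "physics": 1.5, "sports": 0.5, "football": 1.0}
verbalizer = {"SCIENCE": ["science", "physics"],
              "SPORTS": ["sports", "football"]}
print(verbalizer_predict(logits, verbalizer))  # -> SCIENCE
```

Averaging over many KB-derived label words, rather than relying on a single hand-picked word per class, is what gives the method its coverage in zero- and few-shot settings.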
arXiv Detail & Related papers (2021-08-04T13:00:16Z)
- UCPhrase: Unsupervised Context-aware Quality Phrase Tagging
UCPhrase is a novel unsupervised context-aware quality phrase tagger.
We induce high-quality phrase spans as silver labels from consistently co-occurring word sequences.
We show that our design is superior to state-of-the-art pre-trained, unsupervised, and distantly supervised methods.
arXiv Detail & Related papers (2021-05-28T19:44:24Z)
- R$^2$-Net: Relation of Relation Learning Network for Sentence Semantic Matching
We propose a Relation of Relation Learning Network (R2-Net) for sentence semantic matching.
We first employ BERT to encode the input sentences from a global perspective.
Then a CNN-based encoder is designed to capture keywords and phrase information from a local perspective.
To fully leverage labels for better relation information extraction, we introduce a self-supervised relation of relation classification task.
arXiv Detail & Related papers (2020-12-16T13:11:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.