A Neural Model for Contextual Biasing Score Learning and Filtering
- URL: http://arxiv.org/abs/2510.23849v1
- Date: Mon, 27 Oct 2025 20:41:52 GMT
- Title: A Neural Model for Contextual Biasing Score Learning and Filtering
- Authors: Wanting Huang, Weiran Wang
- Abstract summary: We use an attention-based biasing decoder to produce scores for candidate phrases based on acoustic information extracted by an ASR encoder. We introduce a per-token discriminative objective that encourages higher scores for ground-truth phrases while suppressing distractors. Our approach is modular and can be used with any ASR system, and the filtering mechanism can potentially boost performance of other biasing methods.
- Score: 11.862176451777286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contextual biasing improves automatic speech recognition (ASR) by integrating external knowledge, such as user-specific phrases or entities, during decoding. In this work, we use an attention-based biasing decoder to produce scores for candidate phrases based on acoustic information extracted by an ASR encoder; these scores can be used to filter out unlikely phrases and to compute bonuses for shallow-fusion biasing. We introduce a per-token discriminative objective that encourages higher scores for ground-truth phrases while suppressing distractors. Experiments on the Librispeech biasing benchmark show that our method effectively filters out the majority of candidate phrases and significantly improves recognition accuracy under different biasing conditions when the scores are used for shallow-fusion biasing. Our approach is modular and can be used with any ASR system, and the filtering mechanism can potentially boost the performance of other biasing methods.
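The abstract does not include implementation details; the following PyTorch sketch shows the general shape of such a biasing scorer and one plausible form of a per-token discriminative objective. All module names, dimensions, and the pooling and loss choices are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of an attention-based biasing scorer (assumed architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiasingScorer(nn.Module):
    def __init__(self, enc_dim=256, phrase_vocab=1000, emb_dim=256, n_heads=4):
        super().__init__()
        self.token_emb = nn.Embedding(phrase_vocab, emb_dim)
        # Cross-attention: phrase tokens (queries) attend to encoder frames.
        self.attn = nn.MultiheadAttention(emb_dim, n_heads, batch_first=True)
        self.enc_proj = nn.Linear(enc_dim, emb_dim)
        self.score_head = nn.Linear(emb_dim, 1)

    def forward(self, enc_out, phrase_tokens):
        """enc_out: (B, T, enc_dim) acoustic features from the ASR encoder.
        phrase_tokens: (B, N, L) token ids for N candidate phrases.
        Returns per-token scores (B, N, L) and per-phrase scores (B, N)."""
        B, N, L = phrase_tokens.shape
        keys = self.enc_proj(enc_out)                     # (B, T, emb_dim)
        q = self.token_emb(phrase_tokens).view(B, N * L, -1)
        ctx, _ = self.attn(q, keys, keys)                 # (B, N*L, emb_dim)
        tok_scores = self.score_head(ctx).view(B, N, L)   # per-token scores
        return tok_scores, tok_scores.mean(dim=-1)        # pool over tokens

def discriminative_loss(tok_scores, is_ground_truth):
    """One plausible per-token discriminative objective: a logistic loss that
    pushes ground-truth phrase scores up and distractor scores down.
    tok_scores: (B, N, L); is_ground_truth: (B, N) float in {0, 1}."""
    targets = is_ground_truth.unsqueeze(-1).expand_as(tok_scores)
    return F.binary_cross_entropy_with_logits(tok_scores, targets)

scorer = BiasingScorer()
enc_out = torch.randn(2, 100, 256)            # frames from the ASR encoder
phrases = torch.randint(0, 1000, (2, 8, 4))   # 8 candidate phrases, 4 tokens
tok_scores, phrase_scores = scorer(enc_out, phrases)
gt = torch.zeros(2, 8); gt[:, 0] = 1.0        # phrase 0 is the ground truth
loss = discriminative_loss(tok_scores, gt)
```

At decode time, phrases scoring below a threshold would be dropped from the biasing list, and the retained phrase scores could enter shallow fusion as an additive bonus on top of the ASR log-probabilities.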
Related papers
- Unbiased Sliced Wasserstein Kernels for High-Quality Audio Captioning [55.41070713054046]
We develop the temporal-similarity score by introducing the unbiased sliced Wasserstein RBF kernel. We also introduce an audio captioning framework based on the unbiased sliced Wasserstein kernel.
arXiv Detail & Related papers (2025-02-08T03:47:06Z)
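For intuition, here is a small NumPy sketch of a sliced Wasserstein distance with an RBF kernel on top, i.e. the standard Monte-Carlo construction this work builds on; the paper's contribution is an unbiased estimator, which this sketch does not reproduce.

```python
# Standard sliced Wasserstein distance + RBF kernel (illustrative only).
import numpy as np

def sliced_wasserstein(x, y, n_proj=64, rng=None):
    """x: (n, d), y: (m, d) feature sets; here n == m for simplicity."""
    rng = rng or np.random.default_rng(0)
    theta = rng.normal(size=(n_proj, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # unit directions
    px, py = x @ theta.T, y @ theta.T                      # (n, n_proj)
    # 1D Wasserstein-2 between sorted projections, averaged over directions.
    px, py = np.sort(px, axis=0), np.sort(py, axis=0)
    return np.sqrt(np.mean((px - py) ** 2))

def sw_rbf_kernel(x, y, sigma=1.0):
    """RBF kernel on top of the sliced Wasserstein distance."""
    d = sliced_wasserstein(x, y)
    return np.exp(-d ** 2 / (2 * sigma ** 2))

x, y = np.random.default_rng(1).normal(size=(2, 50, 8))
print(sw_rbf_kernel(x, y))
```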
- Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search [44.94458898538114]
This paper proposes an attention-based contextual biasing method that can be customized using an editable phrase list.
The proposed method can be trained effectively by combining a bias phrase index loss with special tokens that detect the bias phrases in the input speech.
arXiv Detail & Related papers (2024-01-19T01:36:07Z)
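A hedged PyTorch sketch of what a bias phrase index loss could look like: the decoder state attends over the editable phrase list plus a learned no-bias entry, and cross-entropy over phrase indices supervises the attention. The class name, shapes, and scoring are illustrative assumptions, not the paper's design.

```python
# Sketch of a bias phrase index loss (assumed formulation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiasPhraseIndexer(nn.Module):
    def __init__(self, dec_dim=256, phrase_dim=256):
        super().__init__()
        self.no_bias = nn.Parameter(torch.zeros(1, phrase_dim))  # "none" entry
        self.query = nn.Linear(dec_dim, phrase_dim)

    def forward(self, dec_state, phrase_embs):
        """dec_state: (B, dec_dim); phrase_embs: (B, N, phrase_dim).
        Returns logits over N phrases plus one no-bias index: (B, N + 1)."""
        B = dec_state.shape[0]
        table = torch.cat([self.no_bias.expand(B, 1, -1), phrase_embs], dim=1)
        q = self.query(dec_state).unsqueeze(-1)     # (B, phrase_dim, 1)
        return torch.bmm(table, q).squeeze(-1)      # (B, N + 1)

model = BiasPhraseIndexer()
logits = model(torch.randn(4, 256), torch.randn(4, 10, 256))
target = torch.tensor([0, 3, 0, 7])  # 0 = no bias phrase in this utterance
loss = F.cross_entropy(logits, target)
```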
- Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization [66.22007368434633]
We present the first large-scale public synthetic dataset for contextual spellchecking customization of automatic speech recognition (ASR).
The proposed approach allows creating millions of realistic examples of corrupted ASR hypotheses and simulating non-trivial biasing lists for the customization task.
We report experiments with training an open-source customization model on the proposed dataset and show that the injection of hard negative biasing phrases decreases WER and the number of false alarms.
arXiv Detail & Related papers (2023-09-29T14:18:59Z)
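As a toy illustration of the data-construction idea (not the released pipeline), this Python sketch corrupts a reference phrase to simulate an ASR error and mines similar phrases as hard negatives for the biasing list.

```python
# Toy synthetic-example builder with hard-negative mining (illustrative).
import difflib

def corrupt(phrase):
    """Toy hypothesis corruption: drop one character (real pipelines would
    model phonetic confusions instead)."""
    return phrase[:-1] if len(phrase) > 1 else phrase

def hard_negatives(target, vocabulary, k=3):
    """Pick the k phrases most string-similar to the target as distractors."""
    scored = sorted(vocabulary, reverse=True,
                    key=lambda p: difflib.SequenceMatcher(None, target, p).ratio())
    return [p for p in scored if p != target][:k]

vocab = ["john smith", "john smyth", "jon smith", "mary jones", "joan smith"]
target = "john smith"
example = {"hypothesis": corrupt(target), "reference": target,
           "biasing_list": [target] + hard_negatives(target, vocab)}
print(example)
```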
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with a reasonable prompt and their generative capability can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
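A minimal sketch, assuming a generic text-completion interface, of how an N-best list might be packed into an LLM prompt for error correction in the spirit of this benchmark; the prompt wording and the `complete()` call are placeholders, not the benchmark's actual setup.

```python
# Building an LLM prompt from an ASR N-best list (placeholder interface).
def build_prompt(nbest):
    lines = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest))
    return ("Below are N-best hypotheses from a speech recognizer. "
            "Output the most likely true transcript, correcting errors "
            "even if the right token appears in none of the hypotheses.\n"
            f"{lines}\nTranscript:")

nbest = ["i scream for ice cream", "i scream for i scream",
         "eye scream for ice cream"]
prompt = build_prompt(nbest)
# transcript = complete(prompt)  # placeholder for an actual LLM call
print(prompt)
```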
- Robust Acoustic and Semantic Contextual Biasing in Neural Transducers for Speech Recognition [14.744220870243932]
We propose to use lightweight character representations to encode fine-grained pronunciation features to improve contextual biasing.
We further integrate pretrained neural language model (NLM) based encoders to encode the utterance's semantic context.
Experiments using a Conformer Transducer model on the Librispeech dataset show a 4.62% - 9.26% relative WER improvement on different biasing list sizes.
arXiv Detail & Related papers (2023-05-09T08:51:44Z)
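The following PyTorch sketch illustrates the general idea of combining lightweight character representations with a semantic vector from a pretrained NLM encoder (stubbed here as a plain tensor); all names and dimensions are assumptions rather than the paper's configuration.

```python
# Sketch: character-level + semantic embedding of a biasing phrase.
import torch
import torch.nn as nn

class PhraseEncoder(nn.Module):
    def __init__(self, n_chars=64, char_dim=32, sem_dim=128, out_dim=256):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.char_cnn = nn.Conv1d(char_dim, char_dim, kernel_size=3, padding=1)
        self.proj = nn.Linear(char_dim + sem_dim, out_dim)

    def forward(self, char_ids, sem_vec):
        """char_ids: (B, L) character ids; sem_vec: (B, sem_dim) from a
        pretrained language-model encoder (stubbed in this sketch)."""
        c = self.char_emb(char_ids).transpose(1, 2)   # (B, char_dim, L)
        c = torch.relu(self.char_cnn(c)).mean(dim=2)  # pool over characters
        return self.proj(torch.cat([c, sem_vec], dim=1))

enc = PhraseEncoder()
out = enc(torch.randint(1, 64, (4, 12)), torch.randn(4, 128))
print(out.shape)  # torch.Size([4, 256])
```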
- Power of Explanations: Towards automatic debiasing in hate speech detection [19.26084350822197]
Hate speech detection is a common downstream application of natural language processing (NLP) in the real world.
We propose an automatic misuse detector (MiD) relying on an explanation method for detecting potential bias.
arXiv Detail & Related papers (2022-09-07T14:14:03Z)
- Filter-based Discriminative Autoencoders for Children Speech Recognition [25.279902171523233]
We propose a filter-based discriminative autoencoder for acoustic modeling.
In the training phase, the decoder uses the auxiliary information and the phonetic embedding extracted by the encoder.
The framework can make the phonetic embedding purer, resulting in more accurate senone (triphone-state) scores.
arXiv Detail & Related papers (2022-04-01T02:18:57Z)
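A rough, simplified PyTorch sketch of the discriminative-autoencoder idea: the decoder reconstructs the input from the phonetic embedding plus auxiliary information, so the phonetic embedding need not carry non-phonetic traits, while a senone head produces acoustic scores. This is an assumed reading, not the paper's exact architecture.

```python
# Simplified discriminative autoencoder with auxiliary-conditioned decoder.
import torch
import torch.nn as nn

class DiscriminativeAE(nn.Module):
    def __init__(self, feat_dim=80, phon_dim=64, aux_dim=16, n_senones=500):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                     nn.Linear(128, phon_dim))
        self.decoder = nn.Sequential(nn.Linear(phon_dim + aux_dim, 128),
                                     nn.ReLU(), nn.Linear(128, feat_dim))
        self.senone_head = nn.Linear(phon_dim, n_senones)  # acoustic scores

    def forward(self, feats, aux):
        phon = self.encoder(feats)
        recon = self.decoder(torch.cat([phon, aux], dim=-1))
        return recon, self.senone_head(phon)

model = DiscriminativeAE()
feats, aux = torch.randn(8, 80), torch.randn(8, 16)
recon, senone_logits = model(feats, aux)
# Training would combine a reconstruction loss on `recon` with a
# cross-entropy senone loss on `senone_logits`.
```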
- End-to-end contextual asr based on posterior distribution adaptation for hybrid ctc/attention system [61.148549738631814]
End-to-end (E2E) speech recognition architectures assemble all components of a traditional speech recognition system into a single model.
Although this simplifies the ASR system, it introduces a drawback for contextual ASR: the E2E model performs worse on utterances containing infrequent proper nouns.
We propose to add a contextual bias attention (CBA) module to an attention-based encoder-decoder (AED) model to improve its ability to recognize contextual phrases.
arXiv Detail & Related papers (2022-02-18T03:26:02Z)
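A hedged sketch of a contextual bias attention module: decoder states cross-attend to contextual phrase embeddings and the resulting context is fused back into the decoder state. The interface and the fusion choice are assumptions, not the paper's exact design.

```python
# Sketch of a contextual bias attention (CBA) module for an AED decoder.
import torch
import torch.nn as nn

class CBAModule(nn.Module):
    def __init__(self, dec_dim=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dec_dim, n_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dec_dim, dec_dim)

    def forward(self, dec_state, phrase_embs):
        """dec_state: (B, U, dec_dim); phrase_embs: (B, N, dec_dim)."""
        ctx, _ = self.attn(dec_state, phrase_embs, phrase_embs)
        return self.fuse(torch.cat([dec_state, ctx], dim=-1))

cba = CBAModule()
fused = cba(torch.randn(2, 5, 256), torch.randn(2, 20, 256))
print(fused.shape)  # torch.Size([2, 5, 256])
```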
- Discriminative Nearest Neighbor Few-Shot Intent Detection by Transferring Natural Language Inference [150.07326223077405]
Few-shot learning is attracting much attention as a way to mitigate data scarcity.
We present a discriminative nearest neighbor classification with deep self-attention.
We propose to boost the discriminative ability by transferring a natural language inference (NLI) model.
arXiv Detail & Related papers (2020-10-25T00:39:32Z)
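An illustrative sketch of the discriminative nearest-neighbor idea: score each (query, labeled example) pair with a model transferred from NLI and predict the label of the best-matching example. The `pair_score` network here is a stand-in for the paper's fine-tuned NLI encoder.

```python
# Nearest-neighbor intent detection with a pairwise scorer (stand-in model).
import torch
import torch.nn as nn

pair_score = nn.Sequential(nn.Linear(2 * 64, 64), nn.ReLU(),
                           nn.Linear(64, 1))  # stand-in for an NLI scorer

def predict(query_vec, example_vecs, example_labels):
    """query_vec: (64,); example_vecs: (M, 64); example_labels: list of M."""
    pairs = torch.cat([query_vec.expand(len(example_vecs), -1),
                       example_vecs], dim=1)
    scores = pair_score(pairs).squeeze(-1)   # (M,) pairwise match scores
    return example_labels[scores.argmax().item()]

labels = ["book_flight", "play_music", "set_alarm"]
print(predict(torch.randn(64), torch.randn(3, 64), labels))
```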