Related papers: Peeking Into The Future For Contextual Biasing

Peeking Into The Future For Contextual Biasing

URL: http://arxiv.org/abs/2512.17657v1
Date: Fri, 19 Dec 2025 14:56:28 GMT
Title: Peeking Into The Future For Contextual Biasing
Authors: Ramaneswaran Selvakumar, Cindy Tseng, Eesung Kim, Vijendra Raj Apsingekar, Yun Tang,
Abstract summary: We propose a contextual biasing method for attention based encoder decoder (AED) models.<n>We simultaneously predict multiple future tokens, enabling the model to "peek into the future" and score potential candidate entities.<n>Our approach achieves up to 50.34% relative improvement in named entity word error rate compared to the baseline AED model.
Score: 8.769657210925777
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While end-to-end (E2E) automatic speech recognition (ASR) models excel at general transcription, they struggle to recognize rare or unseen named entities (e.g., contact names, locations), which are critical for downstream applications like virtual assistants. In this paper, we propose a contextual biasing method for attention based encoder decoder (AED) models using a list of candidate named entities. Instead of predicting only the next token, we simultaneously predict multiple future tokens, enabling the model to "peek into the future" and score potential candidate entities in the entity list. Moreover, our approach leverages the multi-token prediction logits directly without requiring additional entity encoders or cross-attention layers, significantly reducing architectural complexity. Experiments on Librispeech demonstrate that our approach achieves up to 50.34% relative improvement in named entity word error rate compared to the baseline AED model.

Related papers

Continuous Autoregressive Language Models [56.49239051750678]
We introduce Continuous Autoregressive Language Models (CALM)<n>CALM uses a high-fidelity autoencoder to compress a chunk of K tokens into a single continuous vector.<n>We develop a comprehensive likelihood-free framework that enables robust training, evaluation, and controllable sampling.
arXiv Detail & Related papers (2025-10-31T17:58:11Z)
Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE [15.003006630308517]
Speculative decoding (SD) accelerates large language model inference by using a smaller draft model to predict multiple tokens.<n>We propose Jakiro, leveraging Mixture of Experts (MoE), where independent experts generate diverse predictions.<n>Our method significantly boosts prediction accuracy and achieves higher inference speedups.
arXiv Detail & Related papers (2025-02-10T09:24:06Z)
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely textithidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass. In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview [26.823126615724888]
End-to-end (E2E) approach is gradually replacing hybrid models for automatic speech recognition (ASR) tasks. We propose a novel approach, post-decoder biasing, which constructs a transform probability matrix based on the distribution of training transcriptions. In our experiments, for subsets of rare words appearing in the training speech between 10 and 20 times, the proposed method achieves a relative improvement of 9.3% and 5.1%, respectively.
arXiv Detail & Related papers (2024-03-01T08:53:52Z)
Multi-Candidate Speculative Decoding [82.05519287513444]
Large language models have shown impressive capabilities across a variety of NLP tasks, yet their generating text autoregressively is time-consuming. One way to speed them up is speculative decoding, which generates candidate segments from a fast draft model that is then verified in parallel by the target model. This paper proposes sampling multiple candidates from a draft model and then organising them in batches for verification. We design algorithms for efficient multi-candidate verification while maintaining the distribution of the target model.
arXiv Detail & Related papers (2024-01-12T17:15:23Z)
Object Recognition as Next Token Prediction [99.40793702627396]
We present an approach to pose object recognition as next token prediction. The idea is to apply a language decoder that auto-regressively predicts the text tokens from image embeddings to form labels.
arXiv Detail & Related papers (2023-12-04T18:58:40Z)
Offline Detection of Misspelled Handwritten Words by Convolving Recognition Model Features with Text Labels [0.0]
We introduce the task of comparing a handwriting image to text. Our model's classification head is trained entirely on synthetic data created using a state-of-the-art generative adversarial network. Such massive performance gains can lead to significant productivity increases in applications utilizing human-in-the-loop automation.
arXiv Detail & Related papers (2023-09-18T21:13:42Z)
Personalization of CTC Speech Recognition Models [15.470660345766445]
We propose a novel two-way approach that first biases the encoder with attention over a list of rare long-tail and out-of-vocabulary words. We evaluate our approach on open-source VoxPopuli and in-house medical datasets to showcase a 60% improvement in F1 score on domain-specific rare words.
arXiv Detail & Related papers (2022-10-18T01:08:21Z)
Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT [72.93855288283059]
We propose a non-autoregressive speech recognition model called LASO (Listen Attentively, and Spell Once) The model consists of an encoder, a decoder, and a position dependent summarizer (PDS)
arXiv Detail & Related papers (2021-02-15T15:18:59Z)
BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision [49.42215511723874]
We propose a new computational framework -- BOND -- to improve the prediction performance of NER models. Specifically, we propose a two-stage training algorithm: In the first stage, we adapt the pre-trained language model to the NER tasks using the distant labels. In the second stage, we drop the distant labels, and propose a self-training approach to further improve the model performance.
arXiv Detail & Related papers (2020-06-28T04:55:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.