Re-evaluating Minimum Bayes Risk Decoding for Automatic Speech Recognition
- URL: http://arxiv.org/abs/2510.19471v1
- Date: Wed, 22 Oct 2025 11:06:20 GMT
- Title: Re-evaluating Minimum Bayes Risk Decoding for Automatic Speech Recognition
- Authors: Yuu Jinnai
- Abstract summary: Minimum Bayes Risk (MBR) decoding is effective in text-to-text generation tasks, while beam search remains the current practice for speech-to-text tasks such as automatic speech recognition (ASR) and speech translation (ST).
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has shown that sample-based Minimum Bayes Risk (MBR) decoding outperforms beam search in text-to-text generation tasks, such as machine translation, text summarization, and image captioning. On the other hand, beam search is the current practice for speech-to-text tasks such as automatic speech recognition (ASR) and Speech Translation (ST). Given that MBR decoding is effective in text-to-text generation tasks, it is reasonable to expect it to also be effective for speech-to-text tasks. In this paper, we evaluate MBR decoding for ASR and ST tasks on English and Japanese using Whisper and its derivative models. We observe that the accuracy of MBR decoding outperforms that of beam search in most of the experimental settings we have evaluated. The results show that MBR decoding is a promising method for offline ASR and ST tasks that require high accuracy. The code is available at https://github.com/CyberAgentAILab/mbr-for-asr
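The core idea of sample-based MBR decoding — choose the sampled transcript with the highest expected utility against the other samples — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sample set and the negative-edit-distance utility are stand-ins (the paper uses Whisper-family models and task-appropriate metrics).

```python
def edit_distance(a: str, b: str) -> int:
    # Standard Levenshtein distance via a one-row dynamic program.
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # deletion
                        dp[j - 1] + 1,      # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
    return dp[n]

def mbr_decode(samples):
    # Each sample acts both as a candidate and as a pseudo-reference.
    # Utility here is negative edit distance (higher = more similar),
    # so the decoder picks the "consensus" transcript.
    def expected_utility(cand):
        return sum(-edit_distance(cand, ref) for ref in samples)
    return max(samples, key=expected_utility)

# Hypothetical transcripts sampled from an ASR model for one utterance.
samples = ["the cat sat", "the cat sad", "the cat sat", "a cat sat"]
print(mbr_decode(samples))  # → "the cat sat"
```

In contrast, beam search would return whichever single hypothesis the model scores highest, with no notion of agreement among samples.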
Related papers
- Minimum Bayes Risk Decoding for Error Span Detection in Reference-Free Automatic Machine Translation Evaluation [50.83502171176548]
State-of-the-art generative ESD methods typically decode using Maximum a Posteriori (MAP) decoding.
We address this issue by applying Minimum Bayes Risk (MBR) decoding to generative ESD models.
arXiv Detail & Related papers (2025-12-08T13:21:44Z)
- Case-Based Decision-Theoretic Decoding with Quality Memories [9.995028045771862]
Minimum Bayes risk (MBR) decoding is a decision rule for text generation.
It depends on sample texts drawn from the text generation model.
This makes it difficult to find a hypothesis that correctly captures out-of-domain knowledge or information.
arXiv Detail & Related papers (2025-09-16T05:01:05Z)
- Document-Level Text Generation with Minimum Bayes Risk Decoding using Optimal Transport [3.48097307252416]
We investigate the adaptation of Minimum Bayes Risk (MBR) decoding for document-level text generation tasks.
MBR decoding uses a utility function to estimate the output with the highest expected utility from a set of candidate outputs.
MBR-OT, a variant of MBR decoding using Wasserstein distance, computes the utility of a document using a sentence-level utility function.
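To make concrete how a document-level utility can be composed from a sentence-level one, here is a rough stand-in that greedily matches sentences one-to-one. This is an illustration only: MBR-OT computes a proper Wasserstein distance between sentence distributions, which this greedy matching does not.

```python
def doc_utility(doc_a, doc_b, sent_utility):
    # Greedily match each sentence of doc_a to its best remaining sentence
    # in doc_b, sum the sentence-level utilities, and normalize by the
    # longer document so length mismatches are penalized.
    remaining = list(doc_b)
    total = 0.0
    for sa in doc_a:
        if not remaining:
            break
        best = max(remaining, key=lambda sb: sent_utility(sa, sb))
        total += sent_utility(sa, best)
        remaining.remove(best)
    return total / max(len(doc_a), len(doc_b))

# Hypothetical sentence-level utility: exact match.
exact = lambda x, y: 1.0 if x == y else 0.0
print(doc_utility(["a", "b"], ["b", "a"], exact))  # → 1.0 (order-insensitive)
```

The point of the construction is that any sentence-level metric can be lifted to documents; the choice of matching (greedy here, optimal transport in MBR-OT) determines how sentence alignments are resolved.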
arXiv Detail & Related papers (2025-05-29T04:34:04Z)
- mbrs: A Library for Minimum Bayes Risk Decoding [27.207891251898904]
mbrs is a library for Minimum Bayes Risk (MBR) decoding.
MBR is a decision rule for text generation tasks that outperforms conventional maximum a posteriori (MAP) decoding.
We published our mbrs as an MIT-licensed open-source project, and the code is available on GitHub.
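The contrast between MAP and MBR can be shown with a toy sketch. The hypothesis set, log-probabilities, and unigram-F1 utility below are made up for illustration and are not the mbrs API, which provides its own utilities and estimators.

```python
import math

def map_decode(hypotheses):
    # MAP: pick the single highest-probability hypothesis.
    return max(hypotheses, key=lambda h: h["logprob"])["text"]

def mbr_decode(hypotheses, utility):
    # MBR: pick the hypothesis with the highest expected utility under the
    # model distribution, here approximated by the sampled hypothesis set.
    probs = [math.exp(h["logprob"]) for h in hypotheses]
    z = sum(probs)
    def expected_utility(cand):
        return sum(p / z * utility(cand["text"], ref["text"])
                   for p, ref in zip(probs, hypotheses))
    return max(hypotheses, key=expected_utility)["text"]

def unigram_f1(a, b):
    # Toy utility: F1 over unigram sets.
    sa, sb = set(a.split()), set(b.split())
    return 2 * len(sa & sb) / (len(sa) + len(sb)) if (sa or sb) else 1.0

hyps = [
    {"text": "he said hello", "logprob": -1.0},
    {"text": "he said hello there", "logprob": -1.2},
    {"text": "we say hollow", "logprob": -0.9},
]
print(map_decode(hyps))              # → "we say hollow" (most probable alone)
print(mbr_decode(hyps, unigram_f1))  # → "he said hello" (supported by peers)
```

MAP can select an outlier that happens to score well in isolation, while MBR favors a hypothesis that agrees with the rest of the probability mass.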
arXiv Detail & Related papers (2024-08-08T02:28:32Z)
- Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation [67.89838237013078]
Named entity recognition (NER) models often struggle with noisy inputs.
We propose a more realistic setting in which only noisy text and its NER labels are available.
We employ a multi-view training framework that improves robust NER without retrieving text during inference.
arXiv Detail & Related papers (2024-07-26T07:30:41Z)
- Linear-time Minimum Bayes Risk Decoding with Reference Aggregation [52.1701152610258]
Minimum Bayes Risk (MBR) decoding is a text generation technique that has been shown to improve the quality of machine translations.
It requires the pairwise calculation of a utility metric, which has quadratic complexity.
We propose to approximate pairwise metric scores with scores calculated against aggregated reference representations.
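The idea of replacing O(n^2) pairwise utility calls with a single score per candidate against an aggregated reference can be sketched like this. The mean bag-of-words vector and cosine similarity are stand-ins for the paper's aggregated metric representations.

```python
import math
from collections import Counter

def bow(text):
    # Bag-of-words vector as a word -> count mapping.
    return Counter(text.split())

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def mbr_reference_aggregation(samples):
    # Aggregate all pseudo-references into one mean vector, then score each
    # candidate once against it: O(n) utility calls instead of the O(n^2)
    # pairwise computation of vanilla MBR.
    vecs = [bow(s) for s in samples]
    agg = Counter()
    for v in vecs:
        for w, c in v.items():
            agg[w] += c / len(vecs)
    return max(samples, key=lambda s: cosine(bow(s), agg))

print(mbr_reference_aggregation(["a b c", "a b c", "a b d", "x y z"]))
```

Whether the aggregated score ranks candidates the same way as the full pairwise sum depends on the utility metric; the paper studies this approximation for real translation metrics.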
arXiv Detail & Related papers (2024-02-06T18:59:30Z)
- Hyperparameter-Free Approach for Faster Minimum Bayes Risk Decoding [5.639904484784127]
Minimum Bayes-Risk (MBR) decoding is a powerful alternative to beam search decoding for a wide range of text generation tasks.
MBR requires a huge amount of time for inference to compute the objective.
Confidence-based pruning (CBP) has recently been proposed to reduce the inference time in machine translation tasks.
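A simplified sketch of the pruning idea: score surviving candidates on a growing sample of pseudo-references and drop the weakest at each round, so that few candidates ever see the full reference set. The actual CBP method uses bootstrap confidence estimates to decide what to prune; the fixed keep fraction and schedule below are illustrative only.

```python
import random

def cbp_mbr(candidates, references, utility, schedule=(2, 4, 8), keep=0.5, seed=0):
    # Each round scores the surviving candidates against a larger sample of
    # references and keeps only the top fraction; the final exact MBR step
    # runs over the few survivors.
    rng = random.Random(seed)
    survivors = list(candidates)
    for n in schedule:
        refs = rng.sample(references, min(n, len(references)))
        scored = sorted(survivors,
                        key=lambda c: sum(utility(c, r) for r in refs),
                        reverse=True)
        survivors = scored[:max(1, int(len(scored) * keep))]
    return max(survivors, key=lambda c: sum(utility(c, r) for r in references))

# Toy example with numeric "hypotheses" and negative-distance utility.
print(cbp_mbr([5, 0, 9, 7], [5, 5, 5, 6, 5, 4, 5, 5], lambda a, b: -abs(a - b)))
```

Aggressive pruning can discard the true MBR winner; CBP's contribution is bounding that risk with confidence estimates rather than a hand-tuned fraction.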
arXiv Detail & Related papers (2024-01-05T11:02:08Z)
- Context Perception Parallel Decoder for Scene Text Recognition [52.620841341333524]
Scene text recognition methods have struggled to attain high accuracy and fast inference speed.
We present an empirical study of AR decoding in STR, and discover that the AR decoder not only models linguistic context, but also provides guidance on visual context perception.
We construct a series of CPPD models and also plug the proposed modules into existing STR decoders. Experiments on both English and Chinese benchmarks demonstrate that the CPPD models achieve highly competitive accuracy while running approximately 8x faster than their AR-based counterparts.
arXiv Detail & Related papers (2023-07-23T09:04:13Z)
- SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training [106.34112664893622]
We propose a unified-modal speech-unit-text pre-training model, SpeechUT, to connect the representations of a speech encoder and a text decoder with a shared unit encoder.
Our proposed SpeechUT is fine-tuned and evaluated on automatic speech recognition (ASR) and speech translation (ST) tasks.
arXiv Detail & Related papers (2022-10-07T17:57:45Z)
- MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding [70.16678926775475]
MMOCR is an open-source toolbox for text detection and recognition.
It implements 14 state-of-the-art algorithms, more than any other open-source OCR project we are aware of to date.
arXiv Detail & Related papers (2021-08-14T14:10:23Z)
- Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation [26.33252528975464]
Neural Machine Translation (NMT) currently exhibits biases such as producing translations that are too short and overgenerating frequent words.
Recent work has tied these shortcomings to beam search.
Eikema & Aziz (2020) propose to use Minimum Bayes Risk (MBR) decoding on unbiased samples instead.
arXiv Detail & Related papers (2021-05-18T13:31:05Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.