Minimum Bayes Risk Decoding for Error Span Detection in Reference-Free Automatic Machine Translation Evaluation
- URL: http://arxiv.org/abs/2512.07540v1
- Date: Mon, 08 Dec 2025 13:21:44 GMT
- Title: Minimum Bayes Risk Decoding for Error Span Detection in Reference-Free Automatic Machine Translation Evaluation
- Authors: Boxuan Lyu, Haiyue Song, Hidetaka Kamigaito, Chenchen Ding, Hideki Tanaka, Masao Utiyama, Kotaro Funakoshi, Manabu Okumura
- Abstract summary: State-of-the-art generative ESD methods typically decode using Maximum a Posteriori (MAP). We address this issue by applying Minimum Bayes Risk (MBR) decoding to generative ESD models.
- Score: 50.83502171176548
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Error Span Detection (ESD) is a subtask of automatic machine translation evaluation that localizes error spans in translations and labels their severity. State-of-the-art generative ESD methods typically decode using Maximum a Posteriori (MAP), assuming that model-estimated probabilities are perfectly correlated with similarity to human annotation. However, we observed that annotations dissimilar to the human annotation could achieve a higher model likelihood than the human annotation. We address this issue by applying Minimum Bayes Risk (MBR) decoding to generative ESD models. Specifically, we employ sentence- and span-level similarity metrics as utility functions to select candidate hypotheses based on their approximate similarity to the human annotation. Extensive experimental results show that our MBR decoding outperforms the MAP baseline at the system, sentence, and span levels. Furthermore, to mitigate the computational cost of MBR decoding, we demonstrate that applying MBR distillation enables a standard greedy model to match MBR decoding performance, effectively eliminating the inference-time latency bottleneck.
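The selection step described in the abstract can be illustrated with a minimal sketch. It assumes that each candidate annotation is a list of `(start, end, severity)` character spans sampled from the ESD model, and that the other candidates serve as pseudo-references. The `char_f1` utility below is a simple stand-in chosen for illustration, not the sentence- or span-level metric used in the paper, and all names (`Span`, `error_chars`, `char_f1`, `mbr_select`) are hypothetical.

```python
from typing import Callable, List, Set, Tuple

# Assumed representation: (start, end_exclusive, severity) over target-side characters.
Span = Tuple[int, int, str]
Annotation = List[Span]


def error_chars(annotation: Annotation) -> Set[Tuple[int, str]]:
    """Expand spans into a set of (character index, severity) pairs."""
    return {(i, sev) for start, end, sev in annotation for i in range(start, end)}


def char_f1(hyp: Annotation, ref: Annotation) -> float:
    """Illustrative utility: character-level F1 between two annotations.

    This is an assumption for the sketch, not the paper's utility function.
    """
    h, r = error_chars(hyp), error_chars(ref)
    if not h and not r:
        return 1.0  # both annotations say "no errors"
    overlap = len(h & r)
    if overlap == 0:
        return 0.0
    precision = overlap / len(h)
    recall = overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def mbr_select(
    candidates: List[Annotation],
    utility: Callable[[Annotation, Annotation], float] = char_f1,
) -> Annotation:
    """Return the candidate with the highest average utility against the others."""
    if not candidates:
        raise ValueError("need at least one candidate annotation")

    def expected_utility(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        if not others:
            return 0.0
        return sum(utility(candidates[i], c) for c in others) / len(others)

    best_index = max(range(len(candidates)), key=expected_utility)
    return candidates[best_index]


# Hypothetical sampled annotations for one translation.
candidates = [
    [(0, 5, "major")],
    [(0, 5, "major")],
    [(0, 4, "major")],
    [(10, 12, "minor")],
    [],
]
selected = mbr_select(candidates)
```

In this toy example the annotation that most of the samples agree on is selected, even if an outlier such as `[(10, 12, "minor")]` had the highest model likelihood. The pairwise comparison costs a quadratic number of utility calls in the number of candidates, which is the overhead the abstract's distillation step is meant to avoid.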
Related papers
- Agreement-Constrained Probabilistic Minimum Bayes Risk Decoding [51.82883249233765]
We propose agreement-constrained PMBR decoding, which leverages a knowledge distilled model to guide the completion of the score matrix. Our AC-PMBR decoding improved approximation errors of matrix completion by up to 3 times and achieved higher translation quality compared with PMBR decoding.
arXiv Detail & Related papers (2025-12-01T06:16:47Z) - Every Step Counts: Decoding Trajectories as Authorship Fingerprints of dLLMs [63.82840470917859]
We show that the decoding mechanism of dLLMs can be used as a powerful tool for model attribution. We propose a novel information extraction scheme called the Directed Decoding Map (DDM), which captures structural relationships between decoding steps and better reveals model-specific behaviors.
arXiv Detail & Related papers (2025-10-02T06:25:10Z) - Randomized Smoothing Meets Vision-Language Models [6.224335082856828]
Randomized smoothing (RS) is used to ensure correctness of machine learning models. We show that RS can still be enabled for generative models. We derive improved scaling laws analytically relating the certified radius and accuracy to the number of samples. These advances make robustness certification both well-defined and computationally feasible for state-of-the-art VLMs.
arXiv Detail & Related papers (2025-09-19T15:33:22Z) - Linear-time Minimum Bayes Risk Decoding with Reference Aggregation [52.1701152610258]
Minimum Bayes Risk (MBR) decoding is a text generation technique that has been shown to improve the quality of machine translations.
It requires the pairwise calculation of a utility metric, which has quadratic complexity.
We propose to approximate pairwise metric scores with scores calculated against aggregated reference representations.
arXiv Detail & Related papers (2024-02-06T18:59:30Z) - Hyperparameter-Free Approach for Faster Minimum Bayes Risk Decoding [5.639904484784127]
Minimum Bayes-Risk (MBR) decoding is a powerful alternative to beam search decoding for a wide range of text generation tasks.
MBR requires a huge amount of time for inference to compute the objective.
Confidence-based pruning (CBP) has recently been proposed to reduce the inference time in machine translation tasks.
arXiv Detail & Related papers (2024-01-05T11:02:08Z) - Faster Minimum Bayes Risk Decoding with Confidence-based Pruning [8.709382540743391]
We describe an algorithm for Minimum Bayes risk (MBR) decoding which gradually grows the number of samples used to estimate the utility.
Our method requires fewer samples and drastically reduces the number of calls to the utility function compared to standard MBR.
We demonstrate the effectiveness of our approach in experiments on three language pairs, using chrF++ and COMET as utility/evaluation metrics.
arXiv Detail & Related papers (2023-11-25T03:38:14Z) - Quality-Aware Translation Models: Efficient Generation and Quality Estimation in a Single Model [77.19693792957614]
We propose to make neural machine translation (NMT) models quality-aware by training them to estimate the quality of their own output.
We obtain quality gains similar or even superior to quality reranking approaches, but with the efficiency of single pass decoding.
arXiv Detail & Related papers (2023-10-10T15:33:51Z) - The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z) - Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation [26.33252528975464]
Neural Machine Translation (NMT) currently exhibits biases such as producing translations that are too short and overgenerating frequent words.
Recent work has tied these shortcomings to beam search.
Eikema & Aziz (2020) propose to use Minimum Bayes Risk (MBR) decoding on unbiased samples instead.
arXiv Detail & Related papers (2021-05-18T13:31:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.