Quality-Aware Translation Models: Efficient Generation and Quality Estimation in a Single Model
- URL: http://arxiv.org/abs/2310.06707v4
- Date: Thu, 11 Jul 2024 12:25:06 GMT
- Title: Quality-Aware Translation Models: Efficient Generation and Quality Estimation in a Single Model
- Authors: Christian Tomani, David Vilar, Markus Freitag, Colin Cherry, Subhajit Naskar, Mara Finkelstein, Xavier Garcia, Daniel Cremers
- Abstract summary: We propose to make neural machine translation (NMT) models quality-aware by training them to estimate the quality of their own output.
We obtain quality gains similar or even superior to quality reranking approaches, but with the efficiency of single pass decoding.
- Score: 77.19693792957614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Maximum-a-posteriori (MAP) decoding is the most widely used decoding strategy for neural machine translation (NMT) models. The underlying assumption is that model probability correlates well with human judgment, with better translations getting assigned a higher score by the model. However, research has shown that this assumption does not always hold, and generation quality can be improved by decoding to optimize a utility function backed by a metric or quality-estimation signal, as is done by Minimum Bayes Risk (MBR) or quality-aware decoding. The main disadvantage of these approaches is that they require an additional model to calculate the utility function during decoding, significantly increasing the computational cost. In this paper, we propose to make the NMT models themselves quality-aware by training them to estimate the quality of their own output. Using this approach for MBR decoding we can drastically reduce the size of the candidate list, resulting in a speed-up of two orders of magnitude. When applying our method to MAP decoding we obtain quality gains similar or even superior to quality reranking approaches, but with the efficiency of single pass decoding.
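The two decoding strategies the abstract contrasts with MAP can be illustrated with a toy sketch. This is not the paper's implementation: the `token_f1` utility stands in for a learned metric, the `qe_scores` values are made up, and all function names are hypothetical. It only shows why MBR needs a utility call for every candidate pair, while reranking needs one quality score per candidate.

```python
from typing import Callable, Sequence


def token_f1(hyp: str, ref: str) -> float:
    """Toy utility: F1 over unique tokens (a stand-in for a learned metric)."""
    h, r = set(hyp.split()), set(ref.split())
    overlap = len(h & r)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def mbr_decode(
    candidates: Sequence[str],
    utility: Callable[[str, str], float] = token_f1,
) -> str:
    """Return the candidate with the highest mean utility against the
    other candidates, which act as pseudo-references."""
    if len(candidates) == 1:
        return candidates[0]

    def expected_utility(i: int) -> float:
        refs = [c for j, c in enumerate(candidates) if j != i]
        return sum(utility(candidates[i], r) for r in refs) / len(refs)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


def qe_rerank(candidates: Sequence[str], qe_score: Callable[[str], float]) -> str:
    """Return the candidate with the highest reference-free quality score."""
    return max(candidates, key=qe_score)


candidates = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a dog stood there",
]

# Hypothetical quality-estimation scores, invented for illustration only.
qe_scores = {candidates[0]: 0.9, candidates[1]: 0.7, candidates[2]: 0.2}

mbr_choice = mbr_decode(candidates)
rerank_choice = qe_rerank(candidates, qe_scores.get)
print(mbr_choice)
print(rerank_choice)
```

With n candidates, `mbr_decode` makes n·(n−1) utility calls, so shrinking the candidate list, as the abstract describes, reduces cost quadratically in this sketch.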
Related papers
- DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs [56.24431208419858]
We introduce Direct Preference Learning with Only Self-Generated Tests and Code (DSTC).
DSTC uses only self-generated code snippets and tests to construct reliable preference pairs.
arXiv Detail & Related papers (2024-11-20T02:03:16Z) - Unveiling the Power of Source: Source-based Minimum Bayes Risk Decoding for Neural Machine Translation [30.323103270892734]
Maximum a posteriori decoding, a commonly used method for neural machine translation (NMT), aims to maximize the estimated posterior probability.
Minimum Bayes Risk (MBR) decoding offers an alternative by seeking hypotheses with the highest expected utility.
arXiv Detail & Related papers (2024-06-17T15:13:52Z) - MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods [13.56549575939123]
We propose MBR finetuning and QE finetuning to mitigate the model-perplexity-vs-quality mismatch.
We show that even with self-training, these finetuning methods significantly outperform the base model.
These findings suggest new ways to leverage monolingual data to achieve improvements in model quality that are on par with, or even exceed, improvements from human-curated data.
arXiv Detail & Related papers (2023-09-19T23:39:07Z) - The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z) - Quality-Aware Decoding for Neural Machine Translation [64.24934199944875]
We propose quality-aware decoding for neural machine translation (NMT).
We leverage recent breakthroughs in reference-free and reference-based MT evaluation through various inference methods.
We find that quality-aware decoding consistently outperforms MAP-based decoding according both to state-of-the-art automatic metrics and to human assessments.
arXiv Detail & Related papers (2022-05-02T15:26:28Z) - Efficient Decoding of Surface Code Syndromes for Error Correction in Quantum Computing [0.09236074230806578]
We propose a two-level (low and high) ML-based decoding scheme, where the first level corrects errors on physical qubits and the second one corrects any existing logical errors.
Our results show that our proposed decoding method achieves $\sim 10\times$ and $\sim 2\times$ higher values of pseudo-threshold and threshold respectively.
We show that using more sophisticated ML models with higher training/testing time does not provide significant improvement in decoder performance.
arXiv Detail & Related papers (2021-10-21T04:54:44Z) - Model Selection for Bayesian Autoencoders [25.619565817793422]
We propose to optimize the distributional sliced-Wasserstein distance between the output of the autoencoder and the empirical data distribution.
We turn our BAE into a generative model by fitting a flexible Dirichlet mixture model in the latent space.
We evaluate our approach qualitatively and quantitatively using a vast experimental campaign on a number of unsupervised learning tasks and show that, in small-data regimes where priors matter, our approach provides state-of-the-art results.
arXiv Detail & Related papers (2021-06-11T08:55:00Z) - Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation [101.22379613810881]
We consider data-driven optimization problems where one must maximize a function given only queries at a fixed set of points.
This problem setting emerges in many domains where function evaluation is a complex and expensive process.
We propose a tractable approximation that allows us to scale our method to high-capacity neural network models.
arXiv Detail & Related papers (2021-02-16T06:04:27Z) - Discovering Representations for Black-box Optimization [73.59962178534361]
We show that black-box optimization encodings can be automatically learned, rather than hand designed.
We show that learned representations make it possible to solve high-dimensional problems with orders of magnitude fewer evaluations than the standard MAP-Elites.
arXiv Detail & Related papers (2020-03-09T20:06:20Z)