Reward Optimization for Neural Machine Translation with Learned Metrics
- URL: http://arxiv.org/abs/2104.07541v1
- Date: Thu, 15 Apr 2021 15:53:31 GMT
- Title: Reward Optimization for Neural Machine Translation with Learned Metrics
- Authors: Raphael Shu, Kang Min Yoo, Jung-Woo Ha
- Abstract summary: We investigate whether it is beneficial to optimize neural machine translation (NMT) models with the state-of-the-art model-based metric, BLEURT.
Results show that the reward optimization with BLEURT is able to increase the metric scores by a large margin, in contrast to limited gain when training with smoothed BLEU.
- Score: 18.633477083783248
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural machine translation (NMT) models are conventionally trained with
token-level negative log-likelihood (NLL), which does not guarantee that the
generated translations will be optimized for a selected sequence-level
evaluation metric. Multiple approaches have been proposed to train NMT with
BLEU as the reward in order to directly improve the metric. However, it has
been reported that gains in BLEU do not translate into real quality
improvements, limiting its application in industry. Recently, it became clear
to the community that BLEU correlates poorly with human judgment when dealing
with state-of-the-art models. This has led to the emergence of model-based
evaluation metrics, which show a much higher correlation with human judgment.
In this paper, we investigate whether it is beneficial to optimize NMT models
with the state-of-the-art model-based metric, BLEURT. We propose a
contrastive-margin loss for fast and stable reward optimization suitable for
large NMT models. In experiments, we perform automatic and human evaluations to
compare models trained with smoothed BLEU and BLEURT to the baseline models.
Results show that the reward optimization with BLEURT is able to increase the
metric scores by a large margin, in contrast to limited gain when training with
smoothed BLEU. The human evaluation shows that models trained with BLEURT
improve adequacy and coverage of translations. Code is available via
https://github.com/naver-ai/MetricMT.
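The abstract does not spell out the contrastive-margin loss itself; the sketch below shows one plausible hinge-style formulation, not the authors' implementation (the repository above contains that). The tensor shapes, the margin value, and the choice of contrasting the best- and worst-scoring samples are assumptions for illustration.

```python
import torch

def contrastive_margin_loss(log_probs: torch.Tensor,
                            bleurt_scores: torch.Tensor,
                            margin: float = 1.0) -> torch.Tensor:
    """Hedged sketch of a contrastive-margin reward-optimization loss.

    log_probs:     (batch, num_candidates) sequence log-probabilities of sampled
                   translations under the NMT model (requires grad).
    bleurt_scores: (batch, num_candidates) BLEURT rewards for the same candidates
                   (computed with a frozen metric model, no grad).

    The candidate with the highest BLEURT score should be at least `margin`
    more likely (in log-probability) than the lowest-scoring candidate.
    """
    batch_idx = torch.arange(log_probs.size(0))
    best = bleurt_scores.argmax(dim=-1)
    worst = bleurt_scores.argmin(dim=-1)
    gap = log_probs[batch_idx, best] - log_probs[batch_idx, worst]
    return torch.clamp(margin - gap, min=0.0).mean()  # hinge on the log-prob gap
```

In use, one would sample a handful of candidates per source sentence, score them with a frozen BLEURT model, and backpropagate only through `log_probs`; contrasting a single pair per sentence keeps each update cheap, which is presumably what makes this style of objective fast and stable for large NMT models.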
Related papers
- Offline Model-Based Optimization by Learning to Rank [26.21886715050762]
We argue that regression models trained with mean squared error (MSE) are not well-aligned with the primary goal of offline model-based optimization.
We propose learning a ranking-based model that leverages learning to rank techniques to prioritize promising designs based on their relative scores.
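As an illustration of the ranking idea (not necessarily the paper's exact objective), a RankNet-style pairwise logistic loss only asks the surrogate to order designs correctly, whereas MSE asks it to reproduce their scores:

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """pred, target: (n,) surrogate predictions and ground-truth scores for n designs.
    For every pair where target[i] > target[j], encourage pred[i] > pred[j]."""
    diff_pred = pred.unsqueeze(1) - pred.unsqueeze(0)      # [i, j] = pred[i] - pred[j]
    diff_true = target.unsqueeze(1) - target.unsqueeze(0)  # [i, j] = target[i] - target[j]
    mask = (diff_true > 0).float()                         # pairs with a definite ordering
    loss = F.softplus(-diff_pred) * mask                   # log(1 + exp(-(pred_i - pred_j)))
    return loss.sum() / mask.sum().clamp(min=1.0)
```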
arXiv Detail & Related papers (2024-10-15T11:15:03Z)
- Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback [64.67540769692074]
Large language models (LLMs) fine-tuned with alignment techniques, such as reinforcement learning from human feedback, have been instrumental in developing some of the most capable AI systems to date.
We introduce an approach called Margin Matching Preference Optimization (MMPO), which incorporates relative quality margins into optimization, leading to improved LLM policies and reward models.
Experiments with both human and AI feedback data demonstrate that MMPO consistently outperforms baseline methods, often by a substantial margin, on popular benchmarks including MT-bench and RewardBench.
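One plausible way to fold such quality margins into a DPO-style objective, sketched below, is to replace the hard preference label with a soft target derived from the score gap; whether MMPO uses exactly this form (a sigmoid of the raw margin, a fixed beta) is an assumption here, not a statement of the paper's formulation.

```python
import torch
import torch.nn.functional as F

def margin_aware_preference_loss(logp_chosen, logp_rejected,
                                 ref_logp_chosen, ref_logp_rejected,
                                 quality_margin, beta: float = 0.1):
    """All log-prob arguments: (batch,) sequence log-probabilities of the chosen /
    rejected responses under the policy and a frozen reference model.
    quality_margin: (batch,) chosen_score - rejected_score from granular feedback."""
    # Implicit reward difference, as in standard DPO.
    logits = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Soft target: a large margin pushes the target toward 1,
    # a tiny margin keeps it near 0.5 instead of forcing a hard preference.
    target = torch.sigmoid(quality_margin)
    return F.binary_cross_entropy_with_logits(logits, target)
```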
arXiv Detail & Related papers (2024-10-04T04:56:11Z)
- Self-Exploring Language Models: Active Preference Elicitation for Online Alignment [88.56809269990625]
We propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions.
Our experimental results demonstrate that when fine-tuned from Zephyr-7B-SFT and Llama-3-8B-Instruct, Self-Exploring Language Models (SELM) significantly boost performance on instruction-following benchmarks.
arXiv Detail & Related papers (2024-05-29T17:59:07Z)
- Human Evaluation of English--Irish Transformer-Based NMT [2.648836772989769]
The best-performing Transformer system significantly reduces both accuracy and fluency errors when compared with an RNN-based model.
When benchmarked against Google Translate, our translation engines demonstrated significant improvements.
arXiv Detail & Related papers (2024-03-04T11:45:46Z)
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights improves the performance of the Llama 2 model by up to 15% relative.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
- Better Datastore, Better Translation: Generating Datastores from Pre-Trained Models for Nearest Neural Machine Translation [48.58899349349702]
Nearest Neighbor Machine Translation (kNN-MT) is a simple and effective method of augmenting neural machine translation (NMT) with a token-level nearest neighbor retrieval mechanism.
In this paper, we propose PRED, a framework that leverages Pre-trained models for Datastores in kNN-MT.
arXiv Detail & Related papers (2022-12-17T08:34:20Z)
- Towards Robust k-Nearest-Neighbor Machine Translation [72.9252395037097]
k-Nearest-Neighbor Machine Translation (kNN-MT) has become an important research direction in NMT in recent years.
Its main idea is to retrieve useful key-value pairs from an additional datastore to modify translations without updating the NMT model.
However, noisy retrieved pairs can dramatically deteriorate model performance.
We propose a confidence-enhanced kNN-MT model with robust training to alleviate the impact of noise.
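Both kNN-MT entries above build on the same token-level mechanism: at each decoding step the decoder hidden state queries a datastore of (context representation, target token) pairs, and the retrieved neighbors form a distribution that is interpolated with the base NMT distribution. A minimal sketch follows; the interpolation weight, temperature, and k are assumed hyperparameters, and robustness tricks such as confidence gating are omitted.

```python
import torch
import torch.nn.functional as F

def knn_mt_next_token_probs(query: torch.Tensor,        # (d,) decoder hidden state
                            keys: torch.Tensor,         # (N, d) datastore context vectors
                            values: torch.Tensor,       # (N,) LongTensor of target-token ids
                            nmt_probs: torch.Tensor,    # (vocab,) base model distribution
                            k: int = 8,
                            temperature: float = 10.0,
                            lam: float = 0.25) -> torch.Tensor:
    """Return the kNN-interpolated next-token distribution."""
    dists = torch.cdist(query.unsqueeze(0), keys).squeeze(0)   # (N,) L2 distances
    knn_dists, idx = dists.topk(k, largest=False)              # k nearest neighbors
    weights = F.softmax(-knn_dists / temperature, dim=0)       # closer neighbors weigh more
    knn_probs = torch.zeros_like(nmt_probs)
    knn_probs.scatter_add_(0, values[idx], weights)            # aggregate weight per token id
    return lam * knn_probs + (1.0 - lam) * nmt_probs
```

A confidence-enhanced variant would, for example, make the fixed weight `lam` depend on how trustworthy the retrieved neighbors look, rather than treating all retrievals equally.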
arXiv Detail & Related papers (2022-10-17T07:43:39Z)
- End-to-End Training for Back-Translation with Categorical Reparameterization Trick [0.0]
Back-translation is an effective semi-supervised learning framework in neural machine translation (NMT).
A pre-trained NMT model translates monolingual sentences and makes synthetic bilingual sentence pairs for the training of the other NMT model.
The discrete property of translated sentences prevents gradient information from flowing between the two NMT models.
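The blockage is the sampling/argmax step that turns the backward model's output distribution into discrete tokens. The categorical reparameterization trick named in the title is commonly implemented as Gumbel-softmax with a straight-through estimator; a minimal sketch (the temperature and its use inside back-translation are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def straight_through_gumbel_softmax(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """logits: (..., vocab) scores over target tokens from the backward NMT model.
    Forward pass emits one-hot token vectors; backward pass uses the soft
    Gumbel-softmax sample, so gradients reach the model that produced `logits`."""
    u = torch.rand_like(logits).clamp_min(1e-20)                 # uniform noise
    gumbel = -torch.log((-torch.log(u)).clamp_min(1e-20))        # Gumbel(0, 1) noise
    soft = F.softmax((logits + gumbel) / tau, dim=-1)            # differentiable sample
    hard = F.one_hot(soft.argmax(dim=-1), logits.size(-1)).to(soft.dtype)
    return hard + (soft - soft.detach())                         # straight-through estimator
```

PyTorch also ships this as torch.nn.functional.gumbel_softmax(logits, tau=tau, hard=True); the manual version is shown only to make the trick explicit.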
arXiv Detail & Related papers (2022-02-17T06:31:03Z)
- Energy-Based Reranking: Improving Neural Machine Translation Using Energy-Based Models [59.039592890187144]
We study the discrepancy between maximum likelihood estimation (MLE) and task measures such as BLEU score for autoregressive neural machine translation (NMT).
Samples drawn from an MLE-trained NMT model do cover the desired distribution: among them are samples with much higher BLEU scores than the beam-decoding output.
We use both marginal energy models (over target sentence) and joint energy models (over both source and target sentences) to improve our algorithm.
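Given that better samples already exist in the model's own candidate pool, the reranking step itself is simple: generate several candidates and keep the one the trained energy model scores best. A sketch under assumed interfaces (a lower energy is taken to mean a better translation; a joint energy model would also receive the source sentence):

```python
from typing import Callable, List, Tuple

def energy_rerank(candidates: List[str],
                  energy_fn: Callable[[str], float]) -> Tuple[str, float]:
    """Pick the candidate translation with the lowest energy.
    `candidates` come from beam search or sampling with the base NMT model;
    `energy_fn` is an assumed wrapper around the trained energy model."""
    scored = [(energy_fn(hyp), hyp) for hyp in candidates]
    best_energy, best_hyp = min(scored, key=lambda pair: pair[0])
    return best_hyp, best_energy
```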
arXiv Detail & Related papers (2020-09-20T02:50:52Z)