Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings
- URL: http://arxiv.org/abs/2411.05986v1
- Date: Fri, 08 Nov 2024 21:55:37 GMT
- Title: Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings
- Authors: Miguel Moura Ramos, Tomás Almeida, Daniel Vareta, Filipe Azevedo, Sweta Agrawal, Patrick Fernandes, André F. T. Martins,
- Abstract summary: Reinforcement learning (RL) has been proven to be an effective and robust method for training neural machine translation systems.
We introduce a novel approach that leverages fine-grained token-level reward mechanisms with RL methods.
We conduct experiments on small and large translation datasets to compare the impact of sentence-level versus fine-grained reward signals on translation quality.
- Score: 25.851419860597407
- License:
- Abstract: Reinforcement learning (RL) has been proven to be an effective and robust method for training neural machine translation systems, especially when paired with powerful reward models that accurately assess translation quality. However, most research has focused on RL methods that use sentence-level feedback, which leads to inefficient learning signals due to the reward sparsity problem -- the model receives a single score for the entire sentence. To address this, we introduce a novel approach that leverages fine-grained token-level reward mechanisms with RL methods. We use xCOMET, a state-of-the-art quality estimation system as our token-level reward model. xCOMET provides detailed feedback by predicting fine-grained error spans and their severity given source-translation pairs. We conduct experiments on small and large translation datasets to compare the impact of sentence-level versus fine-grained reward signals on translation quality. Our results show that training with token-level rewards improves translation quality across language pairs over baselines according to automatic and human evaluation. Furthermore, token-level reward optimization also improves training stability, evidenced by a steady increase in mean rewards over training epochs.
Related papers
- R3HF: Reward Redistribution for Enhancing Reinforcement Learning from Human Feedback [25.27230140274847]
Reinforcement learning from human feedback (RLHF) provides a paradigm for aligning large language models (LLMs) with human preferences.
This paper proposes a novel reward redistribution method called R3HF, which facilitates a more fine-grained, token-level reward allocation.
arXiv Detail & Related papers (2024-11-13T02:45:21Z) - Advancing Translation Preference Modeling with RLHF: A Step Towards
Cost-Effective Solution [57.42593422091653]
We explore leveraging reinforcement learning with human feedback to improve translation quality.
A reward model with strong language capabilities can more sensitively learn the subtle differences in translation quality.
arXiv Detail & Related papers (2024-02-18T09:51:49Z) - Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble [67.4269821365504]
Reinforcement Learning from Human Feedback (RLHF) is a widely adopted approach for aligning large language models with human values.
However, RLHF relies on a reward model that is trained with a limited amount of human preference data.
We contribute a reward ensemble method that allows the reward model to make more accurate predictions.
arXiv Detail & Related papers (2024-01-30T00:17:37Z) - Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model [75.66013048128302]
In this work, we investigate the potential of employing the QE model as the reward model to predict human preferences for feedback training.
We first identify the overoptimization problem during QE-based feedback training, manifested as an increase in reward while translation quality declines.
To address the problem, we adopt a simple yet effective method that uses rules to detect the incorrect translations and assigns a penalty term to the reward scores of them.
arXiv Detail & Related papers (2024-01-23T16:07:43Z) - Aligning Neural Machine Translation Models: Human Feedback in Training and Inference [27.84975767573212]
Reinforcement learning from human feedback (RLHF) is a technique to improve the quality of the text generated by a language model.
In machine translation (MT), where metrics trained from human annotations can readily be used as reward models, methods using minimum Bayes risk decoding and reranking have succeeded in improving the final quality of translation.
arXiv Detail & Related papers (2023-11-15T17:21:58Z) - Semi-supervised Neural Machine Translation with Consistency
Regularization for Low-Resource Languages [3.475371300689165]
This paper presents a simple yet effective method to tackle the problem for low-resource languages by augmenting high-quality sentence pairs and training NMT models in a semi-supervised manner.
Specifically, our approach combines the cross-entropy loss for supervised learning with KL Divergence for unsupervised fashion given pseudo and augmented target sentences.
Experimental results show that our approach significantly improves NMT baselines, especially on low-resource datasets with 0.46--2.03 BLEU scores.
arXiv Detail & Related papers (2023-04-02T15:24:08Z) - Non-Parametric Online Learning from Human Feedback for Neural Machine
Translation [54.96594148572804]
We study the problem of online learning with human feedback in the human-in-the-loop machine translation.
Previous methods require online model updating or additional translation memory networks to achieve high-quality performance.
We propose a novel non-parametric online learning method without changing the model structure.
arXiv Detail & Related papers (2021-09-23T04:26:15Z) - Improving Multilingual Translation by Representation and Gradient
Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z) - Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for Multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv Detail & Related papers (2021-09-09T03:48:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.