Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution
- URL: http://arxiv.org/abs/2402.11525v3
- Date: Tue, 27 Feb 2024 17:12:38 GMT
- Title: Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution
- Authors: Nuo Xu, Jun Zhao, Can Zu, Sixian Li, Lu Chen, Zhihao Zhang, Rui Zheng, Shihan Dou, Wenjuan Qin, Tao Gui, Qi Zhang, Xuanjing Huang
- Abstract summary: We explore leveraging reinforcement learning with human feedback to improve translation quality.
A reward model with strong language capabilities can more sensitively learn the subtle differences in translation quality.
- Score: 57.42593422091653
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Faithfulness, expressiveness, and elegance are the constant pursuits in machine
translation. However, traditional metrics like \textit{BLEU} do not strictly
align with human preferences for translation quality. In this paper, we explore
leveraging reinforcement learning with human feedback (\textit{RLHF}) to
improve translation quality. It is non-trivial to collect a large, high-quality
dataset of human comparisons between translations, especially for low-resource
languages. To address this issue, we propose a cost-effective preference
learning strategy, optimizing reward models by distinguishing between human and
machine translations. In this manner, the reward model learns the deficiencies
of machine translation relative to human translation and guides subsequent
improvements in machine translation. Experimental results demonstrate that
\textit{RLHF} can effectively enhance translation quality, and this improvement
carries over to other translation directions not trained with \textit{RLHF}. Further analysis
indicates that the model's language capabilities play a crucial role in
preference learning. A reward model with strong language capabilities can more
sensitively learn the subtle differences in translation quality and align
better with real human translation preferences.
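To make the strategy concrete, below is a minimal sketch of the core idea: training a reward model to score human translations above machine translations with a pairwise ranking loss. The `RewardModel` class, the Bradley-Terry-style loss, and the HuggingFace-style encoder interface are common RLHF conventions assumed here, not the authors' exact implementation.

```python
# Minimal sketch of the paper's preference-learning idea: train a reward
# model to score human translations above machine translations.
# The scalar value head and Bradley-Terry pairwise loss follow common
# RLHF practice and are assumptions, not the authors' exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, encoder, hidden_size):
        super().__init__()
        self.encoder = encoder                  # any pretrained LM encoder
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids, attention_mask=attention_mask)
        # score the sequence from the final token's hidden state
        return self.value_head(out.last_hidden_state[:, -1]).squeeze(-1)

def preference_loss(reward_model, human_batch, machine_batch):
    """Human translation is the preferred ("chosen") sample in each pair."""
    r_human = reward_model(**human_batch)       # scores for human translations
    r_machine = reward_model(**machine_batch)   # scores for machine translations
    # pairwise ranking loss: push r_human above r_machine
    return -F.logsigmoid(r_human - r_machine).mean()
```

In a full RLHF pipeline, the resulting scalar rewards would then drive policy optimization (e.g., PPO) of the translation model.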
Related papers
- Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings [25.851419860597407]
Reinforcement learning (RL) has been proven to be an effective and robust method for training neural machine translation systems.
We introduce a novel approach that leverages fine-grained token-level reward mechanisms with RL methods.
We conduct experiments on small and large translation datasets to compare the impact of sentence-level versus fine-grained reward signals on translation quality.
arXiv Detail & Related papers (2024-11-08T21:55:37Z)
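As a toy illustration of the fine-grained reward idea above, per-token rewards can be derived by mapping annotated error spans to severity-weighted penalties. The severity weights and the `(start, end, severity)` span format are invented for illustration and are not taken from the paper.

```python
# Toy illustration of severity-weighted token-level rewards; the weights
# and span format are hypothetical, not taken from the paper.
SEVERITY_REWARD = {"minor": -0.1, "major": -0.5, "critical": -1.0}

def token_rewards(num_tokens, error_spans):
    """error_spans: list of (start, end, severity) over token indices."""
    rewards = [0.0] * num_tokens                # unflagged tokens stay neutral
    for start, end, severity in error_spans:
        for i in range(start, end):
            rewards[i] = SEVERITY_REWARD[severity]  # penalize flagged tokens
    return rewards

# a 6-token hypothesis with one "major" error spanning tokens 2-3
print(token_rewards(6, [(2, 4, "major")]))
# -> [0.0, 0.0, -0.5, -0.5, 0.0, 0.0]
```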
- Iterative Translation Refinement with Large Language Models [25.90607157524168]
We propose iteratively prompting a large language model to self-correct a translation.
We also discuss the challenges of evaluation and the relation to human performance and translationese.
arXiv Detail & Related papers (2023-06-06T16:51:03Z)
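The iterative refinement loop described above can be sketched roughly as follows; the prompt wording, the `generate` callable, and the fixed-point stopping rule are assumptions for illustration.

```python
# Rough sketch of iterative LLM self-correction for translation.
# `generate` stands in for any text-completion call; prompts and the
# fixed-point stopping rule are illustrative assumptions.
def iterative_refine(generate, source, src_lang, tgt_lang, max_rounds=4):
    translation = generate(
        f"Translate the following {src_lang} text into {tgt_lang}:\n{source}")
    for _ in range(max_rounds):
        revised = generate(
            f"Source ({src_lang}): {source}\n"
            f"Current {tgt_lang} translation: {translation}\n"
            "Improve this translation. Return only the improved translation.")
        if revised.strip() == translation.strip():
            break                               # no further edits: stop early
        translation = revised
    return translation
```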
- The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
Careful utterance selection can significantly reduce error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
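One plausible reading of the utterance-selection step, sketched below: send the utterances the model is least sure about to human translators and machine-translate the rest. The uncertainty scoring and fixed budget are illustrative assumptions, not the paper's actual criterion.

```python
# Illustrative uncertainty-based split between human and machine
# translation; the scoring function and budget are assumed, not the
# paper's actual selection criterion.
def split_by_uncertainty(utterances, uncertainty, human_budget):
    """Route the most uncertain utterances to human translators."""
    ranked = sorted(utterances, key=uncertainty, reverse=True)
    to_human = ranked[:human_budget]            # model is least confident here
    to_machine = ranked[human_budget:]          # machine translation suffices
    return to_human, to_machine
```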
- HanoiT: Enhancing Context-aware Translation via Selective Context [95.93730812799798]
Context-aware neural machine translation aims to use document-level context to improve translation quality.
Irrelevant or trivial words may introduce noise and distract the model from learning the relationship between the current sentence and the auxiliary context.
We propose a novel end-to-end encoder-decoder model with a layer-wise selection mechanism to sift and refine the long document context.
arXiv Detail & Related papers (2023-01-17T12:07:13Z)
- Non-Parametric Online Learning from Human Feedback for Neural Machine Translation [54.96594148572804]
We study the problem of online learning with human feedback in human-in-the-loop machine translation.
Previous methods require online model updating or additional translation memory networks to achieve high-quality performance.
We propose a novel non-parametric online learning method without changing the model structure.
arXiv Detail & Related papers (2021-09-23T04:26:15Z)
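Non-parametric online learning of this kind is often realized as k-nearest-neighbor retrieval over a cache of feedback, interpolated into the model's next-token distribution. The kNN-MT-style sketch below illustrates that general recipe under assumed datastore and kernel choices; the paper's exact method may differ.

```python
# kNN-MT-style sketch: interpolate a retrieval distribution built from a
# cache of (hidden state, corrected token) pairs into the model's
# next-token distribution. Datastore layout, kernel, and lambda are
# illustrative assumptions.
import numpy as np

def knn_interpolate(p_model, query, keys, values, vocab_size,
                    k=8, temperature=10.0, lam=0.3):
    """keys: cached decoder states; values: the token ids they produced."""
    dists = np.linalg.norm(keys - query, axis=1)    # L2 distance to each key
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest] / temperature) # softmax-style kernel
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for w, token_id in zip(weights, values[nearest]):
        p_knn[token_id] += w                        # mass on retrieved tokens
    return (1 - lam) * p_model + lam * p_knn        # interpolated distribution
```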
- ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and the endangered language Cherokee.
It supports both statistical and neural translation models and provides quality estimation to inform users of reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z)
- On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z)
- Detecting over/under-translation errors for determining adequacy in human translations [0.0]
We present a novel approach to detecting over- and under-translations (OT/UT) as part of adequacy error checks in translation evaluation.
We do not restrict ourselves to machine translation (MT) outputs and specifically target applications with a human-generated translation pipeline.
The goal of our system is to identify OT/UT errors from human translated video subtitles with high error recall.
arXiv Detail & Related papers (2021-04-01T06:06:36Z)
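As a deliberately simple, hypothetical baseline for the OT/UT checks above, one can flag hypotheses whose target/source length ratio falls outside calibrated bounds. The bounds below are invented for illustration, and the paper's system targets much higher recall than such a heuristic.

```python
# Hypothetical length-ratio baseline for flagging over/under-translation
# (OT/UT); the ratio bounds are invented for illustration only.
def flag_ot_ut(src_tokens, tgt_tokens, lo=0.6, hi=1.6):
    ratio = len(tgt_tokens) / max(len(src_tokens), 1)
    if ratio > hi:
        return "over-translation"       # target far longer than expected
    if ratio < lo:
        return "under-translation"      # target far shorter than expected
    return "ok"
```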
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.