Advancing Translation Preference Modeling with RLHF: A Step Towards
Cost-Effective Solution
- URL: http://arxiv.org/abs/2402.11525v3
- Date: Tue, 27 Feb 2024 17:12:38 GMT
- Title: Advancing Translation Preference Modeling with RLHF: A Step Towards
Cost-Effective Solution
- Authors: Nuo Xu, Jun Zhao, Can Zu, Sixian Li, Lu Chen, Zhihao Zhang, Rui Zheng,
Shihan Dou, Wenjuan Qin, Tao Gui, Qi Zhang, Xuanjing Huang
- Abstract summary: We explore leveraging reinforcement learning with human feedback to improve translation quality.
A reward model with strong language capabilities can more sensitively learn the subtle differences in translation quality.
- Score: 57.42593422091653
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Faithfulness, expressiveness, and elegance are the constant pursuits of
machine translation. However, traditional metrics like \textit{BLEU} do not strictly
align with human preferences for translation quality. In this paper, we explore
leveraging reinforcement learning with human feedback (\textit{RLHF}) to
improve translation quality. It is non-trivial to collect a large high-quality
dataset of human comparisons between translations, especially for low-resource
languages. To address this issue, we propose a cost-effective preference
learning strategy, optimizing reward models by distinguishing between human and
machine translations. In this manner, the reward model learns the deficiencies
of machine translation compared to human translation and guides subsequent
improvements in machine translation. Experimental results demonstrate that
\textit{RLHF} can effectively enhance translation quality, and this improvement benefits other
translation directions not trained with \textit{RLHF}. Further analysis
indicates that the model's language capabilities play a crucial role in
preference learning. A reward model with strong language capabilities can more
sensitively learn the subtle differences in translation quality and align
better with real human translation preferences.
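The cost-effective preference objective described above can be sketched compactly: the reward model is trained to score a human translation of a source sentence above a machine translation of the same sentence, using a standard pairwise (Bradley-Terry) loss. The sketch below is a minimal illustration of that idea; the encoder choice (xlm-roberta-base), first-token pooling, hyperparameters, and the example pair are assumptions made for this sketch, not details taken from the paper.
```python
# Minimal sketch of the preference objective: train a reward model to score a
# human translation above a machine translation of the same source sentence.
# The encoder (xlm-roberta-base), first-token pooling, and the example pair
# are illustrative assumptions, not details taken from the paper.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

class TranslationRewardModel(torch.nn.Module):
    def __init__(self, name: str = "xlm-roberta-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(name)
        self.score_head = torch.nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        pooled = hidden[:, 0]                        # first-token representation
        return self.score_head(pooled).squeeze(-1)   # one scalar reward per example

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
reward_model = TranslationRewardModel()
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-5)

def preference_step(source: str, human_mt: str, machine_mt: str) -> float:
    """One update: push reward(source, human) above reward(source, machine)."""
    batch = tokenizer(
        [source, source], [human_mt, machine_mt],
        padding=True, truncation=True, return_tensors="pt",
    )
    rewards = reward_model(batch["input_ids"], batch["attention_mask"])
    # Bradley-Terry pairwise loss: the human translation is the preferred sample.
    loss = -F.logsigmoid(rewards[0] - rewards[1])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Illustrative training pair (made-up data):
preference_step(
    source="机器翻译追求信、达、雅。",
    human_mt="Machine translation strives for faithfulness, expressiveness, and elegance.",
    machine_mt="Machine translation pursue faithful, expressive and elegant.",
)
```
In the RLHF stage, the scalar score from such a reward model would then serve as the reward signal for policy optimization (e.g., PPO) of the translation model, which is the step the abstract credits with the quality gains.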
Related papers
- Aligning Neural Machine Translation Models: Human Feedback in Training and Inference [27.84975767573212]
Reinforcement learning from human feedback (RLHF) is a technique to improve the quality of the text generated by a language model.
In machine translation (MT), where metrics trained from human annotations can readily be used as reward models, methods using minimum Bayes risk decoding and reranking have succeeded in improving the final quality of translation.
arXiv Detail & Related papers (2023-11-15T17:21:58Z)
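The minimum Bayes risk decoding mentioned in the entry above can be summarized in a few lines: sample several candidate translations, use a learned quality metric as the utility, score each candidate against the other samples as pseudo-references, and return the candidate with the highest expected utility. The sketch below is illustrative; the `metric(hypothesis, reference)` callable stands in for a trained metric, and the toy overlap function is only for the usage example.
```python
# Illustrative sketch of minimum Bayes risk (MBR) decoding with a learned
# metric as the utility. `metric(hypothesis, reference) -> float` is assumed
# to be a trained quality metric (higher is better).
from typing import Callable, List

def mbr_decode(candidates: List[str], metric: Callable[[str, str], float]) -> str:
    """Return the candidate with the highest expected utility, treating the
    other sampled candidates as pseudo-references."""
    best, best_score = candidates[0], float("-inf")
    for hyp in candidates:
        score = sum(metric(hyp, ref) for ref in candidates if ref is not hyp)
        score /= max(len(candidates) - 1, 1)
        if score > best_score:
            best, best_score = hyp, score
    return best

def toy_metric(hyp: str, ref: str) -> float:
    """Token-overlap stand-in for a learned metric (demonstration only)."""
    h, r = set(hyp.split()), set(ref.split())
    return len(h & r) / max(len(h | r), 1)

samples = [
    "The cat sits on the mat.",
    "A cat is sitting on the mat.",
    "The cat sat on a mat.",
]
print(mbr_decode(samples, toy_metric))
```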
- Iterative Translation Refinement with Large Language Models [25.90607157524168]
We propose iteratively prompting a large language model to self-correct a translation.
We also discuss the challenges in evaluation and relation to human performance and translationese.
arXiv Detail & Related papers (2023-06-06T16:51:03Z)
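The self-correction loop proposed in the entry above (Iterative Translation Refinement) amounts to repeatedly asking an LLM to revise its own translation. The sketch below assumes a generic `complete(prompt) -> str` interface to an instruction-following model; the prompt wording, round count, and stopping rule are illustrative assumptions, not the paper's exact procedure.
```python
# Illustrative sketch of iterative self-refinement prompting for translation.
# `complete(prompt) -> str` is an assumed interface to any instruction-following
# LLM; prompts and the stopping rule are illustrative only.
from typing import Callable

def iterative_refine(source: str, complete: Callable[[str], str], rounds: int = 3) -> str:
    translation = complete(f"Translate into English:\n{source}")
    for _ in range(rounds):
        revised = complete(
            "Improve the following English translation of the source sentence. "
            "Return only the improved translation.\n"
            f"Source: {source}\nCurrent translation: {translation}"
        )
        if revised.strip() == translation.strip():  # no further change: stop early
            break
        translation = revised
    return translation

# Toy stand-in for demonstration only; replace with a real LLM client.
print(iterative_refine("Bonjour le monde", complete=lambda prompt: "Hello, world"))
```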
- The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
- HanoiT: Enhancing Context-aware Translation via Selective Context [95.93730812799798]
Context-aware neural machine translation aims to use the document-level context to improve translation quality.
Irrelevant or trivial words may introduce noise and distract the model from learning the relationship between the current sentence and the auxiliary context.
We propose a novel end-to-end encoder-decoder model with a layer-wise selection mechanism to sift and refine the long document context.
arXiv Detail & Related papers (2023-01-17T12:07:13Z)
- Non-Parametric Online Learning from Human Feedback for Neural Machine Translation [54.96594148572804]
We study the problem of online learning with human feedback in human-in-the-loop machine translation.
Previous methods require online model updating or additional translation memory networks to achieve high-quality performance.
We propose a novel non-parametric online learning method without changing the model structure.
arXiv Detail & Related papers (2021-09-23T04:26:15Z)
- ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and Cherokee, an endangered language.
It supports both statistical and neural translation models and provides quality estimation to inform users of reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z)
- On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z)
- Detecting over/under-translation errors for determining adequacy in human translations [0.0]
We present a novel approach to detecting over- and under-translations (OT/UT) as part of adequacy error checks in translation evaluation.
We do not restrict ourselves to machine translation (MT) outputs and specifically target applications with a human-generated translation pipeline.
The goal of our system is to identify OT/UT errors from human translated video subtitles with high error recall.
arXiv Detail & Related papers (2021-04-01T06:06:36Z)