Related papers: Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization

URL: http://arxiv.org/abs/2409.17673v1
Date: Thu, 26 Sep 2024 09:32:12 GMT
Title: Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization
Authors: Kaden Uhlig, Joern Wuebker, Raphael Reinauer, John DeNero
Abstract summary: We show that applying task-alignment to neural machine translation (NMT) addresses an existing task-data mismatch in NMT. We introduce Direct Quality Optimization (DQO), a variant of DPO leveraging a pre-trained translation quality estimation model as a proxy for human preferences.
Score: 4.993565079216378
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Reinforcement Learning from Human Feedback (RLHF) and derivative techniques like Direct Preference Optimization (DPO) are task-alignment algorithms used to repurpose general, foundational models for specific tasks. We show that applying task-alignment to neural machine translation (NMT) addresses an existing task-data mismatch in NMT, leading to improvements across all languages of a multilingual model, even when task-alignment is only applied to a subset of those languages. We do so by introducing Direct Quality Optimization (DQO), a variant of DPO leveraging a pre-trained translation quality estimation model as a proxy for human preferences, and verify the improvements with both automatic metrics and human evaluation.
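
As a rough illustration of the idea in the abstract, the sketch below picks a preference pair from candidate translations using quality estimation (QE) scores and evaluates the standard DPO loss on it. This is not the authors' implementation: the function names, the best-versus-worst pairing rule, the `beta` value, and all numbers are assumptions for illustration, and the actual DQO procedure may differ.

```python
import math

def pick_preference_pair(candidates, qe_scores):
    """Hypothetical pairing rule: highest-scoring candidate is 'chosen',
    lowest-scoring is 'rejected'. QE scores stand in for human preferences."""
    best = max(range(len(candidates)), key=lambda i: qe_scores[i])
    worst = min(range(len(candidates)), key=lambda i: qe_scores[i])
    return candidates[best], candidates[worst]

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(margin)), where the margin
    compares policy-vs-reference log-probability gains on chosen and rejected."""
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # Numerically stable form of log(1 + exp(-margin)).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Made-up candidates and QE scores (assumed values, not from the paper).
candidates = ["translation A", "translation B", "translation C"]
qe_scores = [0.62, 0.81, 0.40]
chosen, rejected = pick_preference_pair(candidates, qe_scores)

# Made-up sequence log-probabilities under the policy and the frozen reference.
loss = dpo_loss(policy_logp_chosen=-12.0, policy_logp_rejected=-15.0,
                ref_logp_chosen=-13.0, ref_logp_rejected=-14.0)
print(chosen, rejected, round(loss, 4))
```

The loss shrinks as the policy assigns relatively more probability to the QE-preferred translation than the reference model does, which is the sense in which the QE model acts as a proxy for human preference.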

Related papers

Aligning Generative Speech Enhancement with Human Preferences via Direct Preference Optimization [46.94426003410216]
This work investigates speech enhancement from the perspective of language models (LMs). Using UTMOS, a neural MOS prediction model, as a proxy for human ratings, our approach guides optimization toward perceptually preferred outputs. Experiments on the 2020 Deep Noise Suppression Challenge test sets demonstrate that applying DPO to a pretrained LM-based SE model yields consistent improvements.
arXiv Detail & Related papers (2025-07-14T05:15:39Z)
Adaptive Batch-Wise Sample Scheduling for Direct Preference Optimization [37.54165341391688]
We introduce a novel problem: Sample Scheduling for DPO. We propose SamS, an efficient and effective algorithm that adaptively selects samples in each training batch. This work points to a promising new direction for improving LLM alignment through batch-wise sample selection.
arXiv Detail & Related papers (2025-06-08T10:26:09Z)
Improving Retrieval-Augmented Neural Machine Translation with Monolingual Data [9.67203800171351]
In many settings, in-domain monolingual target-side corpora are available. This work explores ways to take advantage of such resources by retrieving relevant segments directly in the target language. In experiments with two RANMT architectures, we first demonstrate the benefits of such cross-lingual objectives in a controlled setting. We then showcase our method on a real-world set-up, where the target monolingual resources far exceed the amount of parallel data.
arXiv Detail & Related papers (2025-04-30T15:41:03Z)
Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives. We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
arXiv Detail & Related papers (2024-10-29T14:41:44Z)
GDPO: Learning to Directly Align Language Models with Diversity Using GFlowNets [19.485572131953937]
We propose a practical application of a diversity-seeking RL algorithm called GFlowNet-DPO (GDPO) in an offline preference alignment setting. Empirical results show GDPO can generate far more diverse responses than the baseline methods.
arXiv Detail & Related papers (2024-10-19T13:07:52Z)
Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis [20.023077870947024]
This study focuses on Contrastive Preference Optimization (CPO) and conducts experiments to evaluate the impact of preference-based alignment on translation quality. Our findings indicate that while CPO consistently outperforms Supervised Fine-Tuning (SFT) on high-quality data with regard to the alignment metric, it may lead to instability across downstream evaluation metrics.
arXiv Detail & Related papers (2024-09-30T08:01:44Z)
Preference Alignment Improves Language Model-Based TTS [76.70693823683091]
Preference alignment algorithms adjust LMs to align with the preferences of reward models, enhancing the desirability of the generated content. With a 1.15B parameter LM-based TTS model, we demonstrate that preference alignment consistently improves intelligibility, speaker similarity, and proxy subjective evaluation scores.
arXiv Detail & Related papers (2024-09-19T01:58:19Z)
Calibrating LLM-Based Evaluator [92.17397504834825]
We propose AutoCalibrate, a multi-stage, gradient-free approach to calibrate and align an LLM-based evaluator toward human preference. Instead of explicitly modeling human preferences, we first implicitly encompass them within a set of human labels. Our experiments on multiple text quality evaluation datasets illustrate a significant improvement in correlation with expert evaluation through calibration.
arXiv Detail & Related papers (2023-09-23T08:46:11Z)
Non-Parametric Online Learning from Human Feedback for Neural Machine Translation [54.96594148572804]
We study the problem of online learning with human feedback in human-in-the-loop machine translation. Previous methods require online model updating or additional translation memory networks to achieve high-quality performance. We propose a novel non-parametric online learning method without changing the model structure.
arXiv Detail & Related papers (2021-09-23T04:26:15Z)
Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level. Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
Verdi: Quality Estimation and Error Detection for Bilingual Corpora [23.485380293716272]
Verdi is a novel framework for word-level and sentence-level post-editing effort estimation for bilingual corpora. We exploit the symmetric nature of bilingual corpora and apply model-level dual learning in the NMT predictor. Our method beats the winner of the competition and outperforms other baseline methods by a great margin.
arXiv Detail & Related papers (2021-05-31T11:04:13Z)
A Simple Baseline to Semi-Supervised Domain Adaptation for Machine Translation [73.3550140511458]
State-of-the-art neural machine translation (NMT) systems are data-hungry and perform poorly on new domains with no supervised data. We propose a simple but effective approach to the semi-supervised domain adaptation scenario of NMT. This approach iteratively trains a Transformer-based NMT model via three training objectives: language modeling, back-translation, and supervised translation.
arXiv Detail & Related papers (2020-01-22T16:42:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.