Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
- URL: http://arxiv.org/abs/2401.08417v4
- Date: Mon, 3 Jun 2024 01:28:06 GMT
- Title: Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
- Authors: Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, Young Jin Kim
- Abstract summary: We apply CPO to ALMA models using only 22K parallel sentences and 12M parameters.
The resulting model, called ALMA-R, can match or exceed the performance of the WMT competition winners and GPT-4.
- Score: 50.00235162432848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Moderate-sized large language models (LLMs) -- those with 7B or 13B parameters -- exhibit promising machine translation (MT) performance. However, even the top-performing 13B LLM-based translation models, such as ALMA, do not match the performance of state-of-the-art conventional encoder-decoder translation models or larger-scale LLMs such as GPT-4. In this study, we bridge this performance gap. We first assess the shortcomings of supervised fine-tuning (SFT) for LLMs in the MT task, emphasizing the quality issues present in the reference data despite it being human-generated. Then, in contrast to SFT, which mimics reference translations, we introduce Contrastive Preference Optimization (CPO), a novel approach that trains models to avoid generating adequate but not perfect translations. Applying CPO to ALMA models with only 22K parallel sentences and 12M parameters yields significant improvements. The resulting model, called ALMA-R, can match or exceed the performance of the WMT competition winners and GPT-4 on the WMT'21, WMT'22, and WMT'23 test datasets.
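To make the training objective concrete, here is a minimal sketch of a CPO-style loss, assuming a reference-free preference term over a preferred/dis-preferred translation pair combined with a negative log-likelihood term on the preferred translation; the function name, the beta value, and the dummy inputs are illustrative, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def cpo_style_loss(logp_chosen: torch.Tensor,
                   logp_rejected: torch.Tensor,
                   beta: float = 0.1) -> torch.Tensor:
    """Sketch of a CPO-style objective (assumed formulation, not the official code).

    logp_chosen / logp_rejected: sequence-level log-probabilities of the preferred
    and dis-preferred translations under the current policy, shape (batch,).
    """
    # Reference-free preference term: widen the log-probability margin between
    # the preferred and dis-preferred translation.
    prefer = -F.logsigmoid(beta * (logp_chosen - logp_rejected))
    # NLL term that keeps the policy anchored to the preferred translations.
    nll = -logp_chosen
    return (prefer + nll).mean()

# Toy usage with dummy sequence-level log-probabilities.
logp_w = torch.tensor([-12.3, -9.8, -15.1])   # preferred translations
logp_l = torch.tensor([-14.0, -11.2, -15.0])  # dis-preferred translations
print(cpo_style_loss(logp_w, logp_l))
```

The preference term pushes the model away from the lower-quality ("adequate but not perfect") translation, while the NLL term keeps it anchored to the preferred one.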
Related papers
- Modeling User Preferences with Automatic Metrics: Creating a High-Quality Preference Dataset for Machine Translation [18.077562738603792]
We propose an approach that leverages the best of both worlds: human quality judgements and automatic metrics.
We first collect sentence-level quality assessments from professional linguists on translations generated by multiple high-quality MT systems.
We then use this analysis to curate a new dataset, MT-Pref, which comprises 18k instances covering 18 language directions.
arXiv Detail & Related papers (2024-10-10T10:09:54Z)
- Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level [9.699022347910121]
General-purpose Large Language Models (LLMs) have achieved remarkable advancements in machine translation (MT) by leveraging extensive web content.
However, translation-specific LLMs are built by pre-training on domain-specific monolingual corpora and fine-tuning with human-annotated translation data.
We develop MT-Ladder, a novel model-agnostic and cost-effective tool to refine the performance of general LLMs for MT.
arXiv Detail & Related papers (2024-06-22T05:33:35Z)
- Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages.
Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs.
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
arXiv Detail & Related papers (2024-03-21T13:47:40Z)
- AffineQuant: Affine Transformation Quantization for Large Language Models [58.45460102764]
Post-Training Quantization (PTQ) has emerged as a subject of considerable interest due to its compression efficiency and cost-effectiveness in the context of training.
Existing PTQ methods for Large-scale Language Models (LLMs) limit the optimization scope to scaling transformations between pre- and post-quantization weights.
In this paper, we advocate for direct optimization using equivalent affine transformations in PTQ (AffineQuant); a sketch of the underlying equivalence follows this entry.
arXiv Detail & Related papers (2024-03-19T08:40:21Z)
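For intuition only, here is a small sketch of the equivalence that affine-transformation PTQ methods exploit: multiplying the weights by an invertible matrix A and the activations by its inverse leaves the layer output unchanged, so A can be chosen to make the transformed weights easier to quantize. The naive quantizer, the random stand-in for A, and the tensor shapes are assumptions for illustration; AffineQuant itself optimizes A to minimize the post-quantization error.

```python
import torch

torch.manual_seed(0)

def fake_quant(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Naive symmetric per-tensor quantize-dequantize, for illustration only."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

d_in, d_out, n = 64, 64, 128
x = torch.randn(n, d_in)
W = torch.randn(d_in, d_out)

# Invertible affine matrix; in AffineQuant this would be optimized, here it is
# just a random perturbation of the identity to demonstrate the equivalence.
A = torch.eye(d_in) + 0.05 * torch.randn(d_in, d_in)
A_inv = torch.linalg.inv(A)

y_ref = x @ W                    # original layer output
y_equiv = (x @ A_inv) @ (A @ W)  # mathematically identical output
print("equivalence error:", (y_ref - y_equiv).abs().max().item())

# The quantity an affine-transformation PTQ method would minimize over A:
err = ((x @ A_inv) @ fake_quant(A @ W) - y_ref).norm()
print("post-quantization output error:", err.item())
```

With a random A the error is usually no better than plain quantization; the method's contribution is precisely the optimization of A, beyond diagonal scaling, so that this error shrinks.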
- Improving Machine Translation with Large Language Models: A Preliminary Study with Cooperative Decoding [73.32763904267186]
Large Language Models (LLMs) present the potential for achieving superior translation quality.
We propose Cooperative Decoding (CoDec), which treats NMT systems as a pre-translation model and MT-oriented LLMs as a supplemental solution.
arXiv Detail & Related papers (2023-11-06T03:41:57Z)
- SCALE: Synergized Collaboration of Asymmetric Language Translation Engines [105.8983433641208]
We introduce a collaborative framework that connects compact Specialized Translation Models (STMs) and general-purpose Large Language Models (LLMs) as one unified translation engine.
By introducing translations from the STM into the triplet in-context demonstrations, SCALE unlocks the refinement and pivoting abilities of the LLM (a prompt sketch follows this entry).
Our experiments show that SCALE significantly outperforms both few-shot LLMs (GPT-4) and specialized models (NLLB) in challenging low-resource settings.
arXiv Detail & Related papers (2023-09-29T08:46:38Z)
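As a rough illustration of the prompting idea, the sketch below builds an in-context prompt in which each demonstration is a triplet of source sentence, STM draft translation, and reference translation, so that the LLM is asked to refine the STM output for a new input. The template, the example data, and the triplet layout are assumptions for illustration, not SCALE's actual format.

```python
from typing import List, Tuple

# Each demonstration triplet: (source, STM draft translation, reference translation).
Triplet = Tuple[str, str, str]

def build_triplet_prompt(demos: List[Triplet], source: str, stm_draft: str) -> str:
    """Assemble a triplet-style in-context prompt (illustrative layout only)."""
    parts = []
    for src, draft, ref in demos:
        parts.append(f"Source: {src}\nDraft translation: {draft}\nImproved translation: {ref}\n")
    # The new input ends with the STM draft; the LLM is expected to refine it.
    parts.append(f"Source: {source}\nDraft translation: {stm_draft}\nImproved translation:")
    return "\n".join(parts)

demos = [("Guten Morgen.", "Good morning to you.", "Good morning."),
         ("Wie geht es dir?", "How goes it to you?", "How are you?")]
print(build_triplet_prompt(demos, "Bis morgen.", "Until tomorrow then."))
```

Swapping in drafts from a different STM, or drafts in a pivot language, is how such a setup could support the refinement and pivoting behaviours mentioned above.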
- A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models [27.777372498182864]
We propose a novel fine-tuning approach for Generative Large Language Models (LLMs).
Our approach consists of two fine-tuning stages: initial fine-tuning on monolingual data followed by subsequent fine-tuning on a small set of high-quality parallel data.
Using LLaMA-2 as the underlying model, our results show that it achieves an average improvement of more than 12 BLEU and 12 COMET over its zero-shot performance.
arXiv Detail & Related papers (2023-09-20T22:53:15Z)
- Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC [51.34222224728979]
This paper introduces a series of innovative techniques to enhance the translation quality of Non-Autoregressive Translation (NAT) models.
We propose fine-tuning Pretrained Multilingual Language Models (PMLMs) with the CTC loss to train NAT models effectively (a loss sketch follows this entry).
Our model exhibits a remarkable speed improvement of 16.35 times compared to the autoregressive model.
arXiv Detail & Related papers (2023-06-10T05:24:29Z)
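The sketch below shows, under stated assumptions, what fine-tuning an encoder with the CTC loss for non-autoregressive translation looks like in PyTorch: the encoder emits one token distribution per upsampled source position, and `nn.CTCLoss` marginalizes over all monotonic alignments to the target tokens. The toy encoder, the upsampling factor, and the tensor shapes are illustrative stand-ins, not the paper's actual PMLM setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size, hidden, blank_id = 1000, 256, 0
batch, src_len, upsample = 2, 10, 3           # CTC needs input_len >= target_len

# Stand-in for a pretrained multilingual encoder: embeddings + a vocabulary head.
embed = nn.Embedding(vocab_size, hidden)
head = nn.Linear(hidden, vocab_size)
ctc = nn.CTCLoss(blank=blank_id, zero_infinity=True)

src = torch.randint(1, vocab_size, (batch, src_len))
tgt = torch.randint(1, vocab_size, (batch, 8))            # target token ids (no blanks)
tgt_lens = torch.tensor([8, 6])

# Upsample the source states so the CTC input is longer than the target,
# then predict a token distribution at every upsampled position in parallel.
states = embed(src).repeat_interleave(upsample, dim=1)    # (batch, src_len*upsample, hidden)
log_probs = head(states).log_softmax(-1).transpose(0, 1)  # (T, batch, vocab) as CTC expects
input_lens = torch.full((batch,), src_len * upsample, dtype=torch.long)

loss = ctc(log_probs, tgt, input_lens, tgt_lens)
print(loss.item())
```

At inference, a NAT model of this kind emits all positions in parallel and collapses repeats and blanks, which is where the reported speedup over autoregressive decoding comes from.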
- Pronoun-Targeted Fine-tuning for NMT with Hybrid Losses [6.596002578395152]
We introduce a class of conditional generative-discriminative hybrid losses that we use to fine-tune a trained machine translation model.
We improve the model performance of both a sentence-level and a contextual model without using any additional data.
Our sentence-level model shows a 0.5 BLEU improvement on both the WMT14 and the IWSLT13 De-En test sets.
Our contextual model achieves the best results, improving from 31.81 to 32 BLEU on the WMT14 De-En test set, and from 32.10 to 33.13 on the IWSLT13 De-En test set.
arXiv Detail & Related papers (2020-10-15T10:11:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.