Direct Preference Optimization for Neural Machine Translation with Minimum Bayes Risk Decoding
- URL: http://arxiv.org/abs/2311.08380v2
- Date: Fri, 12 Apr 2024 14:07:38 GMT
- Title: Direct Preference Optimization for Neural Machine Translation with Minimum Bayes Risk Decoding
- Authors: Guangyu Yang, Jinghong Chen, Weizhe Lin, Bill Byrne
- Abstract summary: We show how the recently developed Reinforcement Learning technique Direct Preference Optimization (DPO) can fine-tune Multilingual Large Language Models to capture the gains of MBR decoding without additional computation at inference time.
Our method uses only a small monolingual fine-tuning set and yields significantly improved performance on multiple NMT test sets compared to MLLMs without DPO.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Minimum Bayes Risk (MBR) decoding can significantly improve translation performance of Multilingual Large Language Models (MLLMs). However, MBR decoding is computationally expensive. We show how the recently developed Reinforcement Learning technique, Direct Preference Optimization (DPO), can fine-tune MLLMs to get the gains of MBR without any additional computation in inference. Our method uses only a small monolingual fine-tuning set and yields significantly improved performance on multiple NMT test sets compared to MLLMs without DPO.
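The preference-based fine-tuning the abstract describes can be sketched with the standard DPO objective on a single preference pair. This is a minimal, hedged illustration: the variable names and the `beta` value are illustrative, not taken from the paper; in the paper's setting the chosen/rejected pair would come from MBR-ranked candidate translations.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    logp_* are the summed log-probabilities of the chosen and rejected
    translations under the policy being fine-tuned; ref_logp_* are the
    same quantities under the frozen reference model. beta controls how
    far the policy may drift from the reference.
    """
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # -log sigmoid(margin): small when the policy prefers the
    # MBR-selected (chosen) translation over the rejected one.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree exactly, the loss is log 2; it shrinks as the policy learns to rank the chosen translation above the rejected one.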
Related papers
- Better Instruction-Following Through Minimum Bayes Risk [48.879360919760074]
General-purpose LLM judges capable of human-level evaluation provide a scalable and accurate way of evaluating instruction-following LLMs.
One promising way of leveraging LLM judges for supervision is through Minimum Bayes Risk (MBR) decoding.
MBR decoding uses a reference-based evaluator to select a high-quality output from amongst a set of candidate outputs.
arXiv Detail & Related papers (2024-10-03T18:48:38Z)
- Multi-Reference Preference Optimization for Large Language Models [56.84730239046117]
We introduce a novel closed-form formulation for direct preference optimization using multiple reference models.
The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse reference models.
Our experiments demonstrate that LLMs finetuned with MRPO generalize better in various preference data, regardless of data scarcity or abundance.
arXiv Detail & Related papers (2024-05-26T00:29:04Z)
- Chasing COMET: Leveraging Minimum Bayes Risk Decoding for Self-Improving Machine Translation [0.0]
This paper explores Minimum Bayes Risk (MBR) decoding for self-improvement in machine translation (MT).
We implement the self-improvement process by fine-tuning the model on its MBR-decoded forward translations.
The results demonstrate significant enhancements in translation quality for all examined language pairs.
arXiv Detail & Related papers (2024-05-20T10:25:03Z) - Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages.
Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs.
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
arXiv Detail & Related papers (2024-03-21T13:47:40Z) - adaptMLLM: Fine-Tuning Multilingual Language Models on Low-Resource
Languages with Integrated LLM Playgrounds [2.648836772989769]
adaptMLLM is an open-source tool for fine-tuning Multilingual Language Models (MLLMs) for Machine Translation (MT).
It offers a range of metrics for model evaluation and the capability to deploy models as a translation service directly within the application.
The adaptMLLM system demonstrated significant improvements compared with baselines from the LoResMT 2021 Shared Task.
arXiv Detail & Related papers (2024-03-04T14:49:18Z) - POMP: Probability-driven Meta-graph Prompter for LLMs in Low-resource
Unsupervised Neural Machine Translation [32.76853731410492]
Low-resource languages (LRLs) face challenges in supervised neural machine translation due to limited parallel data.
We propose Probability-driven Meta-graph Prompter (POMP) to enhance Large Language Models' translation capabilities for LRLs.
Our experiments show significant improvements in the translation quality of three LRLs.
arXiv Detail & Related papers (2024-01-11T00:03:36Z) - It's MBR All the Way Down: Modern Generation Techniques Through the Lens
of Minimum Bayes Risk [57.641436861482696]
Minimum Bayes Risk (MBR) decoding is a method for choosing the output of a machine learning system based not on the output with the highest probability, but on the output with the lowest risk (expected error) among multiple candidates.
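The selection rule just described can be sketched in a few lines. This is a toy sketch, not any paper's implementation: `utility` stands in for a reference-based metric such as BLEU or COMET, and the candidate set doubles as the pseudo-reference set, which is the usual sampling-based approximation of the expectation.

```python
def mbr_decode(candidates, utility):
    """Return the candidate with the lowest risk, i.e. the highest
    expected utility, scoring each hypothesis against all candidates
    used as pseudo-references."""
    best, best_score = None, float("-inf")
    for hyp in candidates:
        # Monte Carlo estimate of the expected utility of `hyp`.
        score = sum(utility(hyp, ref) for ref in candidates) / len(candidates)
        if score > best_score:
            best, best_score = hyp, score
    return best
```

In practice the candidates are sampled from the model itself, so the average approximates an expectation over the model's own output distribution.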
arXiv Detail & Related papers (2023-10-02T17:47:10Z)
- Condensing Multilingual Knowledge with Lightweight Language-Specific Modules [52.973832863842546]
We introduce the Language-Specific Matrix Synthesis (LMS) method.
This approach constructs language-specific (LS) modules by generating low-rank matrices from two significantly smaller matrices.
We condense multilingual knowledge from multiple LS modules into a single shared module with the Fuse Distillation (FD) technique.
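The low-rank construction behind such modules can be illustrated briefly. This is a sketch of the general low-rank-factorization idea only; the function and dimension names are hypothetical and not the paper's API.

```python
import numpy as np

def synthesize_ls_module(d_in, d_out, rank, seed=0):
    """Build a language-specific weight matrix as the product of two
    much smaller factors, storing (d_in + d_out) * rank parameters
    instead of d_in * d_out."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((d_in, rank))   # d_in x rank factor
    b = rng.standard_normal((rank, d_out))  # rank x d_out factor
    return a @ b                            # rank-limited full-size matrix
```

The resulting matrix has full dimensions but rank at most `rank`, which is what makes per-language modules cheap to store.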
arXiv Detail & Related papers (2023-05-23T12:21:38Z)
- On Language Model Integration for RNN Transducer based Speech Recognition [49.84285563767935]
We study various ILM correction-based LM integration methods formulated in a common RNN-T framework.
We provide a decoding interpretation on two major reasons for performance improvement with ILM correction.
We also propose an exact-ILM training framework by extending the proof given in the hybrid autoregressive transducer.
arXiv Detail & Related papers (2021-10-13T16:30:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.