MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods
- URL: http://arxiv.org/abs/2309.10966v6
- Date: Mon, 25 Mar 2024 21:30:19 GMT
- Title: MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods
- Authors: Mara Finkelstein, Subhajit Naskar, Mehdi Mirzazadeh, Apurva Shah, Markus Freitag
- Abstract summary: We propose MBR finetuning and QE finetuning to mitigate the model-perplexity-vs-quality mismatch.
We show that even with self-training, these finetuning methods significantly outperform the base model.
These findings suggest new ways to leverage monolingual data to achieve improvements in model quality that are on par with, or even exceed, improvements from human-curated data.
- Score: 13.56549575939123
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent research in decoding methods for Natural Language Generation (NLG) tasks has shown that MAP decoding is not optimal, because model probabilities do not always align with human preferences. Stronger decoding methods, including Quality Estimation (QE) reranking and Minimum Bayes' Risk (MBR) decoding, have since been proposed to mitigate the model-perplexity-vs-quality mismatch. While these decoding methods achieve state-of-the-art performance, they are prohibitively expensive to compute. In this work, we propose MBR finetuning and QE finetuning which distill the quality gains from these decoding methods at training time, while using an efficient decoding algorithm at inference time. Using the canonical NLG task of Neural Machine Translation (NMT), we show that even with self-training, these finetuning methods significantly outperform the base model. Moreover, when using an external LLM as a teacher model, these finetuning methods outperform finetuning on human-generated references. These findings suggest new ways to leverage monolingual data to achieve improvements in model quality that are on par with, or even exceed, improvements from human-curated data, while maintaining maximum efficiency during decoding.
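To make the recipe concrete, here is a minimal sketch (not the authors' code) of how MBR decoding and QE reranking can be distilled into finetuning data. The names `sample_translations`, `utility`, and `qe_score` are hypothetical stand-ins for a sampling decoder over the teacher (the base model in the self-training setup, or an external LLM), a reference-based metric, and a reference-free QE model.

```python
# Sketch only: distill MBR/QE decoding into finetuning targets, then train on them
# and decode the finetuned model with cheap beam search or greedy decoding.
from typing import Callable, List, Tuple


def mbr_select(candidates: List[str], utility: Callable[[str, str], float]) -> str:
    """Pick the candidate with the highest expected utility, treating the other
    sampled candidates as pseudo-references (Minimum Bayes' Risk decoding)."""
    def expected_utility(hyp: str) -> float:
        pseudo_refs = [c for c in candidates if c is not hyp]
        return sum(utility(hyp, ref) for ref in pseudo_refs) / max(len(pseudo_refs), 1)
    return max(candidates, key=expected_utility)


def qe_select(source: str, candidates: List[str],
              qe_score: Callable[[str, str], float]) -> str:
    """Pick the candidate with the highest reference-free quality-estimation score."""
    return max(candidates, key=lambda hyp: qe_score(source, hyp))


def build_finetuning_data(
    sources: List[str],                                   # monolingual source sentences
    sample_translations: Callable[[str, int], List[str]], # hypothetical teacher sampler
    utility: Callable[[str, str], float],                 # hypothetical reference-based metric
    qe_score: Callable[[str, str], float],                # hypothetical reference-free QE model
    num_samples: int = 64,
    method: str = "mbr",                                  # "mbr" or "qe"
) -> List[Tuple[str, str]]:
    """The selected output becomes the training target for MBR/QE finetuning."""
    data = []
    for src in sources:
        candidates = sample_translations(src, num_samples)
        if method == "mbr":
            target = mbr_select(candidates, utility)
        else:
            target = qe_select(src, candidates, qe_score)
        data.append((src, target))
    return data
```

MBR selection costs on the order of N^2 utility calls per sentence (QE reranking costs N QE calls), which is the expense the finetuning step amortizes: after training on the selected targets, inference falls back to ordinary beam search.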
Related papers
- DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs [56.24431208419858]
We introduce Direct Preference Learning with Only Self-Generated Tests and Code (DSTC).
DSTC uses only self-generated code snippets and tests to construct reliable preference pairs.
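As a rough illustration of the idea above (an assumed construction, not necessarily DSTC's exact recipe), self-generated tests can be executed against self-generated solutions and the pass/fail outcome used to form (chosen, rejected) pairs for preference learning:

```python
# Sketch only: build preference pairs from self-generated code and tests.
# In practice the execution step would be sandboxed; this minimal version runs
# each candidate in a subprocess with a timeout.
import os
import subprocess
import tempfile
from typing import List, Tuple


def passes(code: str, test: str, timeout: float = 5.0) -> bool:
    """Run one candidate solution against one self-generated test."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + test)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.remove(path)


def build_preference_pairs(solutions: List[str], tests: List[str]) -> List[Tuple[str, str]]:
    """Pair solutions that pass every test (chosen) with solutions that do not (rejected)."""
    chosen = [s for s in solutions if all(passes(s, t) for t in tests)]
    rejected = [s for s in solutions if s not in chosen]
    return [(c, r) for c in chosen for r in rejected]
```

The resulting pairs can then be fed to a preference-learning objective such as DPO.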
arXiv Detail & Related papers (2024-11-20T02:03:16Z)
- Adversarial Contrastive Decoding: Boosting Safety Alignment of Large Language Models via Opposite Prompt Optimization [34.29833630422768]
Adversarial Contrastive Decoding (ACD) is an optimization-based framework to generate two opposite system prompts for prompt-based contrastive decoding.
ACD achieves much better safety performance than previous training-free decoding methods, without sacrificing the model's original generation ability.
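For context, generic prompt-based contrastive decoding can be sketched as follows. This is an assumed, simplified form (greedy selection with a fixed contrast weight `alpha`); ACD's optimized opposite prompts and exact combination rule are described in the paper. `model` and `tokenize` are hypothetical handles to a decoder-only LM.

```python
# Sketch only: combine next-token logits from two opposite system prompts so that
# behaviour favoured under the "safe" prompt and disfavoured under the opposite
# prompt is amplified.
import numpy as np


def contrastive_next_token(model, tokenize, safe_prompt: str, opposite_prompt: str,
                           user_input: str, generated: str, alpha: float = 0.5) -> int:
    """Greedily pick the next token id from the contrastively combined logits."""
    logits_safe = model(tokenize(safe_prompt + user_input + generated))       # shape [vocab]
    logits_opposite = model(tokenize(opposite_prompt + user_input + generated))
    combined = (1.0 + alpha) * logits_safe - alpha * logits_opposite
    return int(np.argmax(combined))
```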
arXiv Detail & Related papers (2024-06-24T15:51:30Z)
- Quality-Aware Translation Models: Efficient Generation and Quality Estimation in a Single Model [77.19693792957614]
We propose to make neural machine translation (NMT) models quality-aware by training them to estimate the quality of their own output.
We obtain quality gains similar to, or even better than, those of quality-reranking approaches, but with the efficiency of single-pass decoding.
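One plausible way to obtain this kind of quality-awareness (a sketch under assumptions, not necessarily the paper's exact recipe) is to append a discretized quality label to each training target, so the model emits its own quality estimate in the same decoding pass:

```python
# Sketch only: augment training targets with a coarse quality tag computed by an
# external metric at training time; at inference, the tag the model generates after
# its translation serves as a free quality estimate (no separate QE pass).
from typing import Callable, List, Tuple

QUALITY_TAGS = ["<q_low>", "<q_mid>", "<q_high>"]  # hypothetical special tokens


def tag_for(score: float) -> str:
    """Map a metric score in [0, 1] to a coarse quality token."""
    return QUALITY_TAGS[min(int(score * len(QUALITY_TAGS)), len(QUALITY_TAGS) - 1)]


def make_quality_aware_targets(
    pairs: List[Tuple[str, str]],              # (source, reference) training pairs
    quality: Callable[[str, str], float],      # hypothetical quality metric in [0, 1]
) -> List[Tuple[str, str]]:
    return [(src, f"{tgt} {tag_for(quality(src, tgt))}") for src, tgt in pairs]
```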
arXiv Detail & Related papers (2023-10-10T15:33:51Z)
- CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning.
During inference, we introduce a new generation procedure with a critical sampling strategy.
For the model backbones, we extend the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)
- Quality-Aware Decoding for Neural Machine Translation [64.24934199944875]
We propose quality-aware decoding for neural machine translation (NMT).
We leverage recent breakthroughs in reference-free and reference-based MT evaluation through various inference methods.
We find that quality-aware decoding consistently outperforms MAP-based decoding according to both state-of-the-art automatic metrics and human assessments.
arXiv Detail & Related papers (2022-05-02T15:26:28Z)
- Integrate Lattice-Free MMI into End-to-End Speech Recognition [87.01137882072322]
In automatic speech recognition (ASR) research, discriminative criteria have achieved superior performance in DNN-HMM systems.
With this motivation, the adoption of discriminative criteria is promising to boost the performance of end-to-end (E2E) ASR systems.
Previous works have introduced the minimum Bayesian risk (MBR, one of the discriminative criteria) into E2E ASR systems.
In this work, novel algorithms are proposed to integrate another widely used discriminative criterion, lattice-free maximum mutual information (LF-MMI), into E2E ASR systems.
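For reference, the MBR training criterion mentioned above has a simple generic form: the expected risk of the model's own hypotheses under its renormalized posterior. The sketch below assumes an n-best approximation and says nothing about the paper's LF-MMI contribution.

```python
# Sketch only: MBR training loss over an n-best list. `risk` is e.g. the word-error
# count of a hypothesis against the reference transcript.
import math
from typing import Callable, List, Tuple


def mbr_loss(nbest: List[Tuple[str, float]],   # (hypothesis, log-probability) pairs
             reference: str,
             risk: Callable[[str, str], float]) -> float:
    """Expected risk: sum_y P(y|x) * risk(y, reference), renormalized over the n-best list."""
    max_lp = max(lp for _, lp in nbest)
    log_z = max_lp + math.log(sum(math.exp(lp - max_lp) for _, lp in nbest))  # log-sum-exp
    return sum(math.exp(lp - log_z) * risk(hyp, reference) for hyp, lp in nbest)
```

Minimizing this loss moves probability mass toward low-risk hypotheses, the same intuition that MBR finetuning in the main paper exploits at the data level.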
arXiv Detail & Related papers (2022-03-29T14:32:46Z)
- Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks.
Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients.
We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z)
- A new Sparse Auto-encoder based Framework using Grey Wolf Optimizer for Data Classification Problem [0.0]
Grey wolf optimization (GWO) is applied to train sparse auto-encoders.
The model is validated on several popular gene expression databases.
Results reveal that the model trained with GWO outperforms both conventional models and models trained with the most popular metaheuristic algorithms.
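For readers unfamiliar with GWO, the standard update (sketched generically below, independent of this paper's sparse auto-encoder setup) treats each candidate weight vector as a wolf that moves toward the three best candidates found so far; the loss function is left as a placeholder.

```python
# Sketch only: canonical Grey Wolf Optimizer as a black-box minimizer, e.g. over a
# flattened weight vector whose loss is an auto-encoder's reconstruction error.
import numpy as np


def gwo_minimize(loss, dim: int, n_wolves: int = 20, iters: int = 200, seed: int = 0):
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(-1.0, 1.0, size=(n_wolves, dim))     # candidate solutions
    for t in range(iters):
        fitness = np.array([loss(w) for w in wolves])
        alpha, beta, delta = wolves[np.argsort(fitness)[:3]]  # three best wolves lead the pack
        a = 2.0 - 2.0 * t / iters                             # coefficient decays from 2 to 0
        for i in range(n_wolves):
            steps = []
            for leader in (alpha, beta, delta):
                A = 2.0 * a * rng.random(dim) - a
                C = 2.0 * rng.random(dim)
                D = np.abs(C * leader - wolves[i])
                steps.append(leader - A * D)
            wolves[i] = np.mean(steps, axis=0)                # move toward the leaders' average
    fitness = np.array([loss(w) for w in wolves])
    return wolves[np.argmin(fitness)]
```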
arXiv Detail & Related papers (2022-01-29T04:28:30Z)
- Efficient Decoding of Surface Code Syndromes for Error Correction in Quantum Computing [0.09236074230806578]
We propose a two-level (low and high) ML-based decoding scheme, where the first level corrects errors on physical qubits and the second one corrects any existing logical errors.
Our results show that our proposed decoding method achieves roughly 10x and 2x higher values of pseudo-threshold and threshold, respectively.
We show that using more sophisticated ML models with higher training/testing time does not provide significant improvement in decoder performance.
arXiv Detail & Related papers (2021-10-21T04:54:44Z)