A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors
- URL: http://arxiv.org/abs/2406.10203v1
- Date: Fri, 14 Jun 2024 17:38:21 GMT
- Title: A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors
- Authors: Naaman Tan, Josef Valvoda, Anej Svete, Tianyu Liu, Yanxia Qin, Min-Yen Kan, Ryan Cotterell
- Abstract summary: Given a general language model and its aligned version, there exists a trade-off between the average reward and average log-likelihood of the strings under the general language model.
We provide a formal treatment of this issue and demonstrate how the choice of sampling adaptor determines how much likelihood is exchanged for reward.
- Score: 50.046717886067555
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The relationship between the quality of a string and its probability $p(\boldsymbol{y})$ under a language model has been influential in the development of techniques to build good text generation systems. For example, several decoding algorithms are motivated by the goal of manipulating $p(\boldsymbol{y})$ to produce higher-quality text. In this work, we examine the probability--quality relationship in language models explicitly aligned to human preferences, e.g., through Reinforcement Learning from Human Feedback (RLHF). We find that, given a general language model and its aligned version, for corpora sampled from the aligned language model, there exists a trade-off between the average reward and the average log-likelihood of the strings under the general language model. We provide a formal treatment of this issue and demonstrate how the choice of sampling adaptor determines how much likelihood is exchanged for reward.
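To make the trade-off concrete, the following is a minimal numerical sketch, not the paper's code or experimental setup. It assumes a toy distribution over a small finite set of strings, takes the aligned model to have the standard KL-regularized RLHF form $\pi(\boldsymbol{y}) \propto p(\boldsymbol{y})\exp(r(\boldsymbol{y})/\beta)$ (an assumption made here for illustration), applies two common sampling adaptors (temperature and top-$k$) to the aligned model, and reports the average reward and average base-model log-likelihood of the sampled corpora. Under this assumed form, the per-string identity $\log p(\boldsymbol{y}) = \log \pi(\boldsymbol{y}) - r(\boldsymbol{y})/\beta + \log Z$ makes the exchange explicit: at a given aligned-model likelihood level, any gain in average reward is paid for in average log-likelihood under the general model.

```python
# Minimal numerical sketch of the probability--reward trade-off (not the
# paper's code). Assumptions for illustration: a toy "language model" over a
# small set of strings, an aligned model of the standard KL-regularized RLHF
# form pi(y) ∝ p(y) * exp(r(y) / beta), and two simple sampling adaptors
# (temperature and top-k) applied to the aligned model.
import numpy as np

rng = np.random.default_rng(0)
n, beta = 1000, 1.0                          # toy string inventory; assumed KL strength

log_p = rng.normal(size=n)
log_p -= np.logaddexp.reduce(log_p)          # general (base) model log p(y)
reward = rng.normal(size=n)                  # hypothetical reward r(y)

unnorm = log_p + reward / beta
log_Z = np.logaddexp.reduce(unnorm)
log_pi = unnorm - log_Z                      # aligned model log pi(y)

def temperature_adaptor(log_pi, t):
    """Rescale aligned log-probabilities by 1/t and renormalize."""
    z = log_pi / t
    return z - np.logaddexp.reduce(z)

def top_k_adaptor(log_pi, k):
    """Keep only the k most probable strings under the aligned model."""
    z = np.full_like(log_pi, -np.inf)
    keep = np.argsort(log_pi)[-k:]
    z[keep] = log_pi[keep]
    return z - np.logaddexp.reduce(z)

adaptors = {
    "ancestral (T=1.0)": temperature_adaptor(log_pi, 1.0),
    "temperature T=0.7": temperature_adaptor(log_pi, 0.7),
    "top-k (k=50)": top_k_adaptor(log_pi, 50),
}

for name, log_q in adaptors.items():
    probs = np.exp(log_q)
    probs /= probs.sum()
    corpus = rng.choice(n, size=50_000, p=probs)   # corpus sampled via this adaptor
    avg_log_p = log_p[corpus].mean()               # avg log-likelihood under the base model
    avg_reward = reward[corpus].mean()             # avg reward
    avg_log_pi = log_pi[corpus].mean()             # avg log-likelihood under the aligned model
    # log p(y) = log pi(y) - r(y)/beta + log Z, so the gap below is ~0:
    gap = avg_log_p - (avg_log_pi - avg_reward / beta + log_Z)
    print(f"{name:18s} avg log p = {avg_log_p:7.3f}  avg r = {avg_reward:6.3f}  identity gap = {gap:.1e}")
```

Swapping in other adaptors (e.g., nucleus sampling) only changes which distribution the corpus is drawn from; the identity check in the last lines holds regardless, which is the sense in which the choice of adaptor selects a point on the likelihood-reward exchange.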
Related papers
- Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation [29.6763730290473]
Reinforcement learning can align language models with non-differentiable reward signals, such as human preferences.
This paper introduces a novel framework that utilizes the critique capability of Large Language Models to produce intermediate-step rewards.
arXiv Detail & Related papers (2024-01-14T22:05:11Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Quark: Controllable Text Generation with Reinforced Unlearning [68.07749519374089]
Large-scale language models often learn behaviors that are misaligned with user expectations.
We introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property.
For unlearning toxicity, negative sentiment, and repetition, our experiments show that Quark outperforms both strong baselines and state-of-the-art reinforcement learning methods.
arXiv Detail & Related papers (2022-05-26T21:11:51Z)
- Typical Decoding for Natural Language Generation [76.69397802617064]
We study why high-probability texts can be dull or repetitive.
We show that typical sampling offers competitive performance in terms of quality.
arXiv Detail & Related papers (2022-02-01T18:58:45Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model with character language models trained on varying amounts of target-language data.
Our usage scenario is interactive correction starting from nearly zero training examples, improving the models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Improving Language Generation with Sentence Coherence Objective [4.997730662279843]
Existing models are prone to producing paragraphs of text that gradually diverge from the given prompt.
The goal of our project is to improve the coherence and consistency across sentences in a language-generation model.
arXiv Detail & Related papers (2020-09-07T06:10:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.