Preference-grounded Token-level Guidance for Language Model Fine-tuning
- URL: http://arxiv.org/abs/2306.00398v2
- Date: Mon, 9 Oct 2023 23:31:44 GMT
- Title: Preference-grounded Token-level Guidance for Language Model Fine-tuning
- Authors: Shentao Yang, Shujian Zhang, Congying Xia, Yihao Feng, Caiming Xiong,
Mingyuan Zhou
- Abstract summary: Aligning language models with preferences is an important problem in natural language generation.
For LM training, based on the amount of supervised data, we present two *minimalist* learning objectives that utilize the learned guidance.
In experiments, our method performs competitively on two distinct representative LM tasks.
- Score: 105.88789610320426
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Aligning language models (LMs) with preferences is an important problem in
natural language generation. A key challenge is that preferences are typically
provided at the *sequence level* while LM training and generation both occur at
the *token level*. There is, therefore, a *granularity mismatch* between the
preference and the LM training losses, which may complicate the learning
problem. In this paper, we address this issue by developing an alternate
training process, where we iterate between grounding the sequence-level
preference into token-level training guidance, and improving the LM with the
learned guidance. For guidance learning, we design a framework that extends the
pairwise-preference learning in imitation learning to both variable-length LM
generation and the utilization of the preference among multiple generations.
For LM training, based on the amount of supervised data, we present two
*minimalist* learning objectives that utilize the learned guidance. In
experiments, our method performs competitively on two distinct representative
LM tasks -- discrete-prompt generation and text summarization.
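To make the alternation concrete, here is a minimal, self-contained sketch of the loop the abstract describes. Every component below (the toy sampler, the length-based preference, the dictionary-valued guidance and weights) is an invented stand-in, not the paper's actual objectives or models:

```python
# Toy sketch of the alternating procedure: (1) ground a sequence-level
# preference into token-level guidance, then (2) fine-tune the LM with it.

import random

def sample_sequence(max_len=8):
    """Toy LM sampler: emit a variable-length sequence of token ids.
    (For brevity, sampling ignores the model weights entirely.)"""
    return [random.randrange(10) for _ in range(random.randint(3, max_len))]

def sequence_preference(seq_a, seq_b):
    """Toy sequence-level preference: prefer the shorter generation."""
    return (seq_a, seq_b) if len(seq_a) <= len(seq_b) else (seq_b, seq_a)

def learn_guidance(guidance, winner, loser, lr=0.1):
    """Stage 1: ground the sequence-level preference into token-level
    guidance, in the spirit of pairwise-preference learning."""
    for tok in winner:
        guidance[tok] = guidance.get(tok, 0.0) + lr
    for tok in loser:
        guidance[tok] = guidance.get(tok, 0.0) - lr
    return guidance

def improve_lm(weights, guidance, sequences, lr=0.01):
    """Stage 2: a 'minimalist' LM update that reweights tokens by guidance."""
    for seq in sequences:
        for tok in seq:
            weights[tok] = weights.get(tok, 0.0) + lr * guidance.get(tok, 0.0)
    return weights

weights, guidance = {}, {}
for _ in range(100):                       # alternate between the two stages
    gen_a, gen_b = sample_sequence(), sample_sequence()
    winner, loser = sequence_preference(gen_a, gen_b)
    guidance = learn_guidance(guidance, winner, loser)
    weights = improve_lm(weights, guidance, [gen_a, gen_b])
```

In the paper, stage 1 extends pairwise-preference learning from imitation learning to variable-length generation and multiple generations; the dictionary update above only mimics its effect at toy scale.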
Related papers
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting task instructions to a position after the input sentences (a toy illustration follows this entry).
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
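A minimal illustration of the placement change proposed above; the prompt strings and function names are invented for this sketch:

```python
# Hypothetical prompt builders contrasting instruction placement.

def pre_instruction_prompt(instruction: str, source: str) -> str:
    """Conventional layout: instruction first, then the input."""
    return f"{instruction}\n{source}\n"

def post_instruction_prompt(instruction: str, source: str) -> str:
    """Proposed layout: the task instruction comes *after* the input,
    closer to where generation begins."""
    return f"{source}\n{instruction}\n"

source = "Der schnelle braune Fuchs springt über den faulen Hund."
instruction = "Translate the sentence above into English."
print(post_instruction_prompt(instruction, source))
```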
- LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback.
Our focus is the code generation task, where the model produces code based on natural language instructions.
LETI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback (a sketch of this data format follows this entry).
arXiv Detail & Related papers (2023-05-17T15:53:31Z)
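A hedged sketch of the concatenated training sequence described above; the section markers and example strings are invented, not LeTI's exact format:

```python
def leti_training_example(instruction: str, program: str, feedback: str) -> str:
    """Concatenate the instruction, the LM-generated program, and the
    textual feedback into one string for fine-tuning with the standard
    next-token LM objective."""
    return (
        f"### Instruction:\n{instruction}\n"
        f"### Program:\n{program}\n"
        f"### Feedback:\n{feedback}\n"
    )

print(leti_training_example(
    instruction="Write a function that returns the square of x.",
    program="def square(x):\n    return x * x",
    feedback="Correct: all test cases passed.",
))
```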
- Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-following LLM [31.25193238045053]
We introduce a novel method, namely GenCo, which leverages the strong generative power of large language models to assist in training a smaller language model.
In our method, an LLM assists the self-training loop of the smaller model in two important ways. It helps craft additional high-quality training pairs by rewriting input texts conditioned on the predicted labels (a toy version of this step is sketched after this entry).
arXiv Detail & Related papers (2023-04-24T07:35:38Z)
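A toy version of the pair-crafting step described above, assuming a hypothetical `llm_rewrite` call that stands in for prompting an instruction-following LLM:

```python
def llm_rewrite(text: str, label: str) -> str:
    """Stand-in for an LLM call that paraphrases `text` so that it more
    clearly expresses `label`."""
    return f"({label}) paraphrase of: {text}"

def craft_training_pairs(unlabeled_pool, small_model_predict):
    """One self-training step: pseudo-label each text with the smaller
    model, then let the LLM rewrite it conditioned on that label."""
    pairs = []
    for text in unlabeled_pool:
        label = small_model_predict(text)
        pairs.append((llm_rewrite(text, label), label))
    return pairs

print(craft_training_pairs(["the movie was great"], lambda t: "positive"))
```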
- Meet in the Middle: A New Pre-training Paradigm [41.52858444519968]
Most language models (LMs) are trained and applied in an autoregressive left-to-right fashion.
We propose a new pre-training paradigm with techniques that jointly improve training data efficiency.
We show the effectiveness of our pre-training paradigm with extensive experiments on both programming and natural language models.
arXiv Detail & Related papers (2023-03-13T17:17:11Z)
- Selective Token Generation for Few-shot Natural Language Generation [19.015739016376532]
We develop a novel additive learning algorithm based on reinforcement learning (RL).
We show that the proposed selective token generation significantly outperforms previous additive learning algorithms based on PLMs.
arXiv Detail & Related papers (2022-09-17T00:48:52Z)
- Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks.
Despite this success, we draw an empirical observation that there is a training-objective gap between the pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for cross-lingual sequence labeling (xSL), named Cross-lingual Language Informative Span Masking (CLISM), to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage consistency between the representations of parallel input sequences (a toy rendering of such a regularizer follows this entry).
arXiv Detail & Related papers (2022-04-11T15:55:20Z)
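A toy, InfoNCE-style rendering of such a contrastive-consistency regularizer; the vectors and the exact loss form are invented for illustration:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def contrastive_consistency(anchor, positive, negatives, tau=0.1):
    """Pull the representation of a sentence toward its parallel
    translation (the positive) and away from unrelated sentences."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

en = [0.9, 0.1]          # toy representation of an English sentence
de = [0.85, 0.15]        # its parallel German translation
unrelated = [[0.1, 0.9]]
print(contrastive_consistency(en, de, unrelated))
```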
- AdaPrompt: Adaptive Model Training for Prompt-based NLP [77.12071707955889]
We propose AdaPrompt, which adaptively retrieves external data for the continual pretraining of PLMs.
Experimental results on five NLP benchmarks show that AdaPrompt can improve over standard PLMs in few-shot settings.
In zero-shot settings, our method outperforms standard prompt-based methods by up to 26.35% relative error reduction.
arXiv Detail & Related papers (2022-02-10T04:04:57Z)
- Language Model Prior for Low-Resource Neural Machine Translation [85.55729693003829]
We propose a novel approach to incorporate an LM as a prior in a neural translation model (TM).
We add a regularization term, which pushes the output distributions of the TM to be probable under the LM prior (a toy version of this term is sketched below).
Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
arXiv Detail & Related papers (2020-04-30T16:29:56Z)
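A toy rendering of such a prior-regularized objective, assuming per-token output distributions over a shared vocabulary; the weight `lam` and the exact divergence used here are illustrative choices, not necessarily the paper's:

```python
import math

def kl_divergence(p, q, eps=1e-9):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def prior_regularized_loss(tm_probs, target_idx, lm_prior_probs, lam=0.5):
    """Translation loss plus a term pushing the TM's next-token
    distribution to stay probable under the LM prior."""
    nll = -math.log(tm_probs[target_idx])          # usual cross-entropy term
    reg = kl_divergence(tm_probs, lm_prior_probs)  # LM-prior regularizer
    return nll + lam * reg

tm_probs = [0.7, 0.2, 0.1]   # toy TM next-token distribution
lm_probs = [0.5, 0.3, 0.2]   # toy LM prior over the same vocabulary
print(prior_regularized_loss(tm_probs, target_idx=0, lm_prior_probs=lm_probs))
```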