Preference-grounded Token-level Guidance for Language Model Fine-tuning
- URL: http://arxiv.org/abs/2306.00398v3
- Date: Wed, 08 Jan 2025 06:35:07 GMT
- Title: Preference-grounded Token-level Guidance for Language Model Fine-tuning
- Authors: Shentao Yang, Shujian Zhang, Congying Xia, Yihao Feng, Caiming Xiong, Mingyuan Zhou,
- Abstract summary: Aligning language models with preferences is an important problem in natural language generation.
For LM training, based on the amount of supervised data, we present two minimalist learning objectives that utilize the learned guidance.
In experiments, our method performs competitively on two distinct representative LM tasks.
- Score: 99.93045967478764
- License:
- Abstract: Aligning language models (LMs) with preferences is an important problem in natural language generation. A key challenge is that preferences are typically provided at the sequence level while LM training and generation both occur at the token level. There is, therefore, a granularity mismatch between the preference and the LM training losses, which may complicate the learning problem. In this paper, we address this issue by developing an alternate training process, where we iterate between grounding the sequence-level preference into token-level training guidance, and improving the LM with the learned guidance. For guidance learning, we design a framework that extends the pairwise-preference learning in imitation learning to both variable-length LM generation and the utilization of the preference among multiple generations. For LM training, based on the amount of supervised data, we present two minimalist learning objectives that utilize the learned guidance. In experiments, our method performs competitively on two distinct representative LM tasks -- discrete-prompt generation and text summarization.
Related papers
- TinyHelen's First Curriculum: Training and Evaluating Tiny Language Models in a Simpler Language Environment [30.93798042712827]
Training language models (LMs) and their application agents is increasingly costly due to large datasets and models.
We propose a pipeline to refine text data by eliminating noise, minimizing vocabulary, and maintaining genre-specific patterns.
Our experiments show that leaner pre-training boosts LM learning efficiency.
arXiv Detail & Related papers (2024-12-31T16:08:15Z) - Large Language Models can Contrastively Refine their Generation for Better Sentence Representation Learning [57.74233319453229]
Large language models (LLMs) have emerged as a groundbreaking technology and their unparalleled text generation capabilities have sparked interest in their application to the fundamental sentence representation learning task.
We propose MultiCSR, a multi-level contrastive sentence representation learning framework that decomposes the process of prompting LLMs to generate a corpus.
Our experiments reveal that MultiCSR enables a less advanced LLM to surpass the performance of ChatGPT, while applying it to ChatGPT achieves better state-of-the-art results.
arXiv Detail & Related papers (2023-10-17T03:21:43Z) - LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback.
Our focus is the code generation task, where the model produces code based on natural language instructions.
LETI iteratively fine-tunes the model, using the objective LM, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
arXiv Detail & Related papers (2023-05-17T15:53:31Z) - Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-following LLM [31.25193238045053]
We introduce a novel method, namely GenCo, which leverages the strong generative power of large language models to assist in training a smaller language model.
In our method, an LLM plays an important role in the self-training loop of a smaller model in two important ways.
It helps crafting additional high-quality training pairs, by rewriting input texts conditioned on predicted labels.
arXiv Detail & Related papers (2023-04-24T07:35:38Z) - Meet in the Middle: A New Pre-training Paradigm [41.52858444519968]
Most language models (LMs) are trained and applied in an autoregressive left-to-right fashion.
We propose a new pre-training paradigm with techniques that jointly improve the training data efficiency.
We show the effectiveness of our pre-training paradigm with extensive experiments on both programming and natural language models.
arXiv Detail & Related papers (2023-03-13T17:17:11Z) - Selective Token Generation for Few-shot Natural Language Generation [19.015739016376532]
We develop a novel additive learning algorithm based on reinforcement learning (RL)
We show that the proposed selective token generation significantly outperforms the previous additive learning algorithms based on the PLMs.
arXiv Detail & Related papers (2022-09-17T00:48:52Z) - A Multi-level Supervised Contrastive Learning Framework for Low-Resource
Natural Language Inference [54.678516076366506]
Natural Language Inference (NLI) is a growingly essential task in natural language understanding.
Here we propose a multi-level supervised contrastive learning framework named MultiSCL for low-resource natural language inference.
arXiv Detail & Related papers (2022-05-31T05:54:18Z) - Bridging the Gap between Language Models and Cross-Lingual Sequence
Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks.
Despite the great success, we draw an empirical observation that there is a training objective gap between pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage the consistency between representations of input parallel
arXiv Detail & Related papers (2022-04-11T15:55:20Z) - SLM: Learning a Discourse Language Representation with Sentence
Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this feature of our model improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.