Text Generation by Learning from Demonstrations
- URL: http://arxiv.org/abs/2009.07839v2
- Date: Wed, 3 Mar 2021 03:43:28 GMT
- Title: Text Generation by Learning from Demonstrations
- Authors: Richard Yuanzhe Pang, He He
- Abstract summary: Current approaches to text generation largely rely on autoregressive models and maximum likelihood estimation.
We propose GOLD: an easy-to-optimize algorithm that learns from expert demonstrations by importance weighting.
According to both automatic and human evaluation, models trained by GOLD outperform those trained by MLE and policy gradient.
- Score: 17.549815256968877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current approaches to text generation largely rely on autoregressive models
and maximum likelihood estimation. This paradigm leads to (i) diverse but
low-quality samples due to mismatched learning objective and evaluation metric
(likelihood vs. quality) and (ii) exposure bias due to mismatched history
distributions (gold vs. model-generated). To alleviate these problems, we frame
text generation as an offline reinforcement learning (RL) problem with expert
demonstrations (i.e., the reference), where the goal is to maximize quality
given model-generated histories. We propose GOLD (generation by off-policy
learning from demonstrations): an easy-to-optimize algorithm that learns from
the demonstrations by importance weighting. Intuitively, GOLD upweights
confident tokens and downweights unconfident ones in the reference during
training, avoiding optimization issues faced by prior RL approaches that rely
on online data collection. According to both automatic and human evaluation,
models trained by GOLD outperform those trained by MLE and policy gradient on
summarization, question generation, and machine translation. Further, our
models are less sensitive to decoding algorithms and alleviate exposure bias.
Related papers
- SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation [55.61004653386632]
Large Language Models (LLMs) often produce hallucinations, i.e., information that is unfaithful or not grounded in the input context.
This paper introduces a novel self-supervised method for generating a training set of unfaithful samples.
We then refine the model using a training process that encourages the generation of grounded outputs over unfaithful ones.
arXiv Detail & Related papers (2025-02-19T12:31:58Z) - ReLearn: Unlearning via Learning for Large Language Models [64.2802606302194]
We propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning.
This framework introduces Knowledge Forgetting Rate (KFR) and Knowledge Retention Rate (KRR) to measure knowledge-level preservation.
Our experiments show that ReLearn successfully achieves targeted forgetting while preserving high-quality output.
arXiv Detail & Related papers (2025-02-16T16:31:00Z) - Boosting Alignment for Post-Unlearning Text-to-Image Generative Models [55.82190434534429]
Large-scale generative models have shown impressive image-generation capabilities, propelled by massive data.
This often inadvertently leads to the generation of harmful or inappropriate content and raises copyright concerns.
We propose a framework that seeks an optimal model update at each unlearning iteration, ensuring monotonic improvement on both objectives.
arXiv Detail & Related papers (2024-12-09T21:36:10Z) - Self-Improvement in Language Models: The Sharpening Mechanism [70.9248553790022]
We offer a new perspective on the capabilities of self-improvement through a lens we refer to as sharpening.
Motivated by the observation that language models are often better at verifying response quality than they are at generating correct responses, we formalize self-improvement as using the model itself as a verifier during post-training.
We analyze two natural families of self-improvement algorithms based on SFT and RLHF.
arXiv Detail & Related papers (2024-12-02T20:24:17Z) - Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review [50.78587571704713]
Learn-Focus-Review (LFR) is a dynamic training approach that adapts to the model's learning progress.
LFR tracks the model's learning performance across data blocks (sequences of tokens) and prioritizes revisiting challenging regions of the dataset.
Compared to baseline models trained on the full datasets, LFR consistently achieved lower perplexity and higher accuracy.
arXiv Detail & Related papers (2024-09-10T00:59:18Z) - GISTEmbed: Guided In-sample Selection of Training Negatives for Text
Embedding Fine-tuning [0.0]
GISTEmbed is a novel strategy that enhances in-batch negative selection during contrastive training through a guide model.
Benchmarked against the Massive Text Embedding Benchmark (MTEB), GISTEmbed showcases consistent performance improvements across various model sizes.
arXiv Detail & Related papers (2024-02-26T18:55:15Z) - Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models [39.37532848489779]
We propose Error Norm Truncation (ENT), a robust enhancement method to the standard training objective that truncates noisy data.
We show that ENT improves generation quality over standard training and previous soft and hard truncation methods.
arXiv Detail & Related papers (2023-10-02T01:30:27Z) - Semi-Supervised Learning Based on Reference Model for Low-resource TTS [32.731900584216724]
We propose a semi-supervised learning method for neural TTS in which labeled target data is limited.
Experimental results show that our proposed semi-supervised learning scheme with limited target data significantly improves the voice quality for test data to achieve naturalness and robustness in speech synthesis.
arXiv Detail & Related papers (2022-10-25T07:48:07Z) - Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.