Text Generation by Learning from Demonstrations
- URL: http://arxiv.org/abs/2009.07839v2
- Date: Wed, 3 Mar 2021 03:43:28 GMT
- Title: Text Generation by Learning from Demonstrations
- Authors: Richard Yuanzhe Pang, He He
- Abstract summary: Current approaches to text generation largely rely on autoregressive models and maximum likelihood estimation.
We propose GOLD: an easy-to-optimize algorithm that learns from expert demonstrations by importance weighting.
According to both automatic and human evaluation, models trained by GOLD outperform those trained by MLE and policy gradient.
- Score: 17.549815256968877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current approaches to text generation largely rely on autoregressive models
and maximum likelihood estimation. This paradigm leads to (i) diverse but
low-quality samples due to mismatched learning objective and evaluation metric
(likelihood vs. quality) and (ii) exposure bias due to mismatched history
distributions (gold vs. model-generated). To alleviate these problems, we frame
text generation as an offline reinforcement learning (RL) problem with expert
demonstrations (i.e., the reference), where the goal is to maximize quality
given model-generated histories. We propose GOLD (generation by off-policy
learning from demonstrations): an easy-to-optimize algorithm that learns from
the demonstrations by importance weighting. Intuitively, GOLD upweights
confident tokens and downweights unconfident ones in the reference during
training, avoiding optimization issues faced by prior RL approaches that rely
on online data collection. According to both automatic and human evaluation,
models trained by GOLD outperform those trained by MLE and policy gradient on
summarization, question generation, and machine translation. Further, our
models are less sensitive to decoding algorithms and alleviate exposure bias.
Related papers
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z) - Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs)
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z) - GISTEmbed: Guided In-sample Selection of Training Negatives for Text
Embedding Fine-tuning [0.0]
GISTEmbed is a novel strategy that enhances in-batch negative selection during contrastive training through a guide model.
Benchmarked against the Massive Text Embedding Benchmark (MTEB), GISTEmbed showcases consistent performance improvements across various model sizes.
arXiv Detail & Related papers (2024-02-26T18:55:15Z) - An Information Theoretic Approach to Machine Unlearning [45.600917449314444]
Key challenge in unlearning is forgetting the necessary data in a timely manner, while preserving model performance.
In this work, we address the zero-shot unlearning scenario, whereby an unlearning algorithm must be able to remove data given only a trained model and the data to be forgotten.
We derive a simple but principled zero-shot unlearning method based on the geometry of the model.
arXiv Detail & Related papers (2024-02-02T13:33:30Z) - QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15% points relative.
arXiv Detail & Related papers (2023-11-06T00:21:44Z) - Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models [39.37532848489779]
We propose Error Norm Truncation (ENT), a robust enhancement method to the standard training objective that truncates noisy data.
We show that ENT improves generation quality over standard training and previous soft and hard truncation methods.
arXiv Detail & Related papers (2023-10-02T01:30:27Z) - Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z) - Semi-Supervised Learning Based on Reference Model for Low-resource TTS [32.731900584216724]
We propose a semi-supervised learning method for neural TTS in which labeled target data is limited.
Experimental results show that our proposed semi-supervised learning scheme with limited target data significantly improves the voice quality for test data to achieve naturalness and robustness in speech synthesis.
arXiv Detail & Related papers (2022-10-25T07:48:07Z) - NoiER: An Approach for Training more Reliable Fine-TunedDownstream Task
Models [54.184609286094044]
We propose noise entropy regularisation (NoiER) as an efficient learning paradigm that solves the problem without auxiliary models and additional data.
The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
arXiv Detail & Related papers (2021-08-29T06:58:28Z) - Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.