Learning to Generate Better Than Your LLM
- URL: http://arxiv.org/abs/2306.11816v2
- Date: Mon, 13 Nov 2023 18:51:42 GMT
- Title: Learning to Generate Better Than Your LLM
- Authors: Jonathan D. Chang, Kiante Brantley, Rajkumar Ramamurthy, Dipendra
Misra, Wen Sun
- Abstract summary: Reinforcement learning has emerged as a powerful paradigm for fine-tuning Large Language Models.
We extend RL algorithms to allow them to interact with a dynamic black-box guide LLM.
We show that our RL algorithms achieve higher performance than supervised learning.
- Score: 16.74454360961681
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) has emerged as a powerful paradigm for
fine-tuning Large Language Models (LLMs) for text generation. In particular,
recent LLMs such as ChatGPT and GPT-4 can engage in fluent conversations with
users after fine-tuning with RL. Capitalizing on key properties of text
generation, we seek to investigate RL algorithms beyond general purpose
algorithms like Proximal Policy Optimization (PPO). In particular, we extend RL
algorithms to allow them to interact with a dynamic black-box guide LLM and
propose RL with guided feedback (RLGF), a suite of RL algorithms for LLM
fine-tuning. We provide two ways for the guide LLM to interact with the LLM
being optimized to maximize reward. First, the guide LLM can generate text that
serves as additional starting states for the RL optimization procedure. Second,
the guide LLM can complete partial sentences generated by the LLM being
optimized, treating the guide LLM as an expert to imitate and eventually
surpass. We experiment on the IMDB positive sentiment, CommonGen,
and TL;DR summarization tasks. We show that our RL algorithms achieve higher
performance than supervised learning (SL) and the RL baseline PPO,
demonstrating the benefit of interaction with the guide LLM. On both CommonGen
and TL;DR, we not only outperform our SL baselines but also improve upon PPO
across a variety of metrics beyond the one we optimized for. Our code can be
found at https://github.com/Cornell-RL/tril.
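To make the two guide-LLM interaction modes concrete, here is a rough sketch. It is not the authors' implementation (their code is in the repository linked above); `guide_complete`, `policy_generate`, and `reward` are toy stand-ins for the black-box guide LLM, the LLM being fine-tuned, and a task reward.
```python
# Minimal sketch of the two guide-LLM interaction modes from the abstract.
# Not the authors' implementation; all functions below are toy stand-ins.
import random

VOCAB = list("abcde")

def guide_complete(prefix, max_new=5):
    """Stand-in for the black-box guide LLM completing a prefix."""
    return prefix + [random.choice(VOCAB) for _ in range(max_new)]

def policy_generate(prefix, max_new=5):
    """Stand-in for the policy (the LLM being optimized) continuing a prefix."""
    return prefix + [random.choice(VOCAB) for _ in range(max_new)]

def reward(sequence):
    """Stand-in task reward over a full sequence (e.g. sentiment or ROUGE)."""
    return sequence.count("a") / max(len(sequence), 1)

def guide_as_start_states(prompt, k=3):
    """Mode 1: partial generations from the guide become extra starting
    states; the policy rolls out from them and the rewarded rollouts
    feed the RL update (e.g. PPO)."""
    rollouts = []
    for _ in range(k):
        start = guide_complete(prompt, max_new=random.randint(1, 4))
        full = policy_generate(start)
        rollouts.append((full, reward(full)))
    return rollouts

def guide_as_rollout_expert(prompt, k=3):
    """Mode 2: the policy writes a partial sentence, the guide completes it,
    and the completed sequence's reward scores the policy's prefix,
    treating the guide as an expert to imitate and eventually surpass."""
    scored = []
    for _ in range(k):
        prefix = policy_generate(prompt, max_new=random.randint(1, 4))
        completed = guide_complete(prefix)
        scored.append((prefix, reward(completed)))
    return scored

if __name__ == "__main__":
    prompt = list("ab")
    print(guide_as_start_states(prompt))
    print(guide_as_rollout_expert(prompt))
```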
Related papers
- AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models [94.82766517752418]
We propose AlphaPruning, which uses shape metrics to allocate layerwise sparsity ratios in a more theoretically principled manner.
Our results show that AlphaPruning prunes LLaMA-7B to 80% sparsity while maintaining reasonable perplexity, marking a first in the literature on LLMs.
arXiv Detail & Related papers (2024-10-14T03:35:11Z)
- Decoding with Limited Teacher Supervision Requires Understanding When to Trust the Teacher [11.136112399898481]
How can small-scale language models efficiently utilize the supervision of a large LLM to improve their generative quality?
We develop an algorithm to effectively aggregate the small model's and the large LLM's predictions on initial tokens.
We demonstrate that our method provides a consistent improvement over conventional decoding strategies.
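As a hedged illustration of the idea (not the paper's exact algorithm), one way to aggregate predictions on initial tokens only is to mix the two next-token distributions for the first few decoding steps and then let the small model continue alone; `small_probs` and `teacher_probs` are hypothetical stand-ins.
```python
# Sketch (an assumption, not the paper's exact method) of mixing a small
# model's and a teacher LLM's next-token distributions on the first few
# decoding steps only; both probability functions are toy stand-ins.
import numpy as np

VOCAB_SIZE = 8
rng = np.random.default_rng(0)

def small_probs(prefix):
    p = rng.random(VOCAB_SIZE)
    return p / p.sum()

def teacher_probs(prefix):
    p = rng.random(VOCAB_SIZE)
    return p / p.sum()

def decode(prompt, max_new=10, k_supervised=3, alpha=0.5):
    """Mix teacher and small-model distributions for the first k tokens
    (limited teacher supervision), then fall back to the small model."""
    tokens = list(prompt)
    for step in range(max_new):
        p = small_probs(tokens)
        if step < k_supervised:
            p = alpha * teacher_probs(tokens) + (1 - alpha) * p
        tokens.append(int(np.argmax(p)))  # greedy decoding for simplicity
    return tokens

print(decode([1, 2]))
```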
arXiv Detail & Related papers (2024-06-26T01:16:12Z)
- One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs).
We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
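A minimal prompt-tuning-style sketch of what pluggable virtual tokens could look like mechanically (an assumption, not the paper's method): a few learnable embeddings are prepended to the input while the base LLM stays frozen.
```python
# Sketch of learnable, pluggable "virtual token" embeddings prepended to the
# input of a frozen LLM (an assumption about the mechanism, not the paper's
# exact method); shapes are toy-sized for illustration.
import torch
import torch.nn as nn

class VirtualTokens(nn.Module):
    def __init__(self, num_virtual=1, hidden=16):
        super().__init__()
        # Only these embeddings are trained; the base model stays frozen.
        self.embed = nn.Parameter(torch.randn(num_virtual, hidden) * 0.02)

    def forward(self, input_embeds):  # input_embeds: (batch, seq, hidden)
        batch = input_embeds.size(0)
        prefix = self.embed.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prefix, input_embeds], dim=1)

vt = VirtualTokens(num_virtual=1, hidden=16)
x = torch.randn(2, 5, 16)          # stand-in token embeddings
print(vt(x).shape)                 # torch.Size([2, 6, 16])
```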
arXiv Detail & Related papers (2024-05-30T03:44:54Z)
- Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs [45.44796295841526]
Large Language Models (LLMs) exhibit impressive zero/few-shot inference and generation quality for high-resource languages (HRLs).
A few of them have also been trained on low-resource languages (LRLs) and give decent performance.
We show that LRLs are at a pricing disadvantage, because the well-known LLMs produce more tokens for LRLs than HRLs.
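A toy calculation (the numbers and price are made up, not from the paper) of how per-token pricing penalizes languages that the tokenizer over-segments:
```python
# Toy illustration of the pricing disadvantage: the same sentence costs more
# in a low-resource language if the tokenizer produces more tokens for it.
PRICE_PER_1K_TOKENS = 0.002  # hypothetical per-token price

def api_cost(num_tokens, price_per_1k=PRICE_PER_1K_TOKENS):
    return num_tokens / 1000 * price_per_1k

hrl_tokens = 20   # e.g. an English sentence
lrl_tokens = 55   # the same content in a low-resource language

print(f"HRL: {api_cost(hrl_tokens):.6f}  LRL: {api_cost(lrl_tokens):.6f}")
```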
arXiv Detail & Related papers (2024-03-08T16:37:36Z)
- Teaching Large Language Models to Reason with Reinforcement Learning [38.17625148525193]
Reinforcement Learning from Human Feedback (RLHF) has emerged as a dominant approach for aligning LLM outputs with human preferences.
Inspired by the success of RLHF, we study the performance of multiple algorithms that learn from feedback.
arXiv Detail & Related papers (2024-03-07T16:36:29Z)
- ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
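The summary does not state LINVIT's exact objective; a common way (an assumption here, not necessarily the paper's formulation) to fold LLM guidance into value-based RL as a regularization factor is a KL term that keeps the learned policy close to the LLM-suggested policy:
\[
J(\pi) \;=\; \mathbb{E}_{\pi}\Big[\sum_{t} r(s_t, a_t)\Big] \;-\; \lambda\, \mathbb{E}_{\pi}\Big[\mathrm{KL}\big(\pi(\cdot \mid s_t)\,\|\,\pi_{\mathrm{LLM}}(\cdot \mid s_t)\big)\Big]
\]
where \(\lambda\) trades off reward maximization against following the guide; leaning on the guide is what allows learning from less environment data, as the summary claims.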
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback [65.84061725174269]
Recent large language models (LLMs) are leveraging human feedback to improve their generation quality.
We propose LLMRefine, an inference time optimization method to refine LLM's output.
We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization.
LLMRefine consistently outperforms all baseline approaches, achieving improvements of up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on ASQA, and 2.2 ROUGE-L on topical summarization.
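A minimal sketch of a pinpoint-and-refine inference loop in this spirit (stand-in functions, not LLMRefine's actual feedback model or search procedure):
```python
# Sketch of inference-time refinement: a feedback function flags a problem
# span, the LLM rewrites it, and the candidate is kept only if a quality
# score improves. All functions are hypothetical stand-ins.
def quality(text):
    """Stand-in scorer (e.g. a learned metric); fewer 'XX' markers is better."""
    return -text.count("XX")

def feedback(text):
    """Stand-in fine-grained feedback: locate the first flagged span, if any."""
    i = text.find("XX")
    return None if i < 0 else (i, i + 2)

def rewrite(text, span):
    """Stand-in LLM edit: rewrite only the flagged span."""
    i, j = span
    return text[:i] + "ok" + text[j:]

def refine(text, max_iters=5):
    for _ in range(max_iters):
        span = feedback(text)
        if span is None:
            break
        candidate = rewrite(text, span)
        if quality(candidate) > quality(text):  # accept only improvements
            text = candidate
    return text

print(refine("the XX cat sat on the XX mat"))
```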
arXiv Detail & Related papers (2023-11-15T19:52:11Z)
- LLMRec: Benchmarking Large Language Models on Recommendation Task [54.48899723591296]
The application of Large Language Models (LLMs) in the recommendation domain has not been thoroughly investigated.
We benchmark several popular off-the-shelf LLMs on five recommendation tasks, including rating prediction, sequential recommendation, direct recommendation, explanation generation, and review summarization.
The benchmark results indicate that LLMs display only moderate proficiency in accuracy-based tasks such as sequential and direct recommendation.
arXiv Detail & Related papers (2023-08-23T16:32:54Z)