RL4F: Generating Natural Language Feedback with Reinforcement Learning
for Repairing Model Outputs
- URL: http://arxiv.org/abs/2305.08844v2
- Date: Tue, 11 Jul 2023 18:29:12 GMT
- Title: RL4F: Generating Natural Language Feedback with Reinforcement Learning
for Repairing Model Outputs
- Authors: Afra Feyza Akyürek, Ekin Akyürek, Aman Madaan, Ashwin Kalyan,
Peter Clark, Derry Wijaya, Niket Tandon
- Abstract summary: Previous work proposed providing language models with natural language feedback to guide them in repairing their outputs.
We introduce RL4F, a multi-agent collaborative framework where the critique generator is trained to maximize end-task performance of GPT-3.
We show relative improvements up to 10% in multiple text similarity metrics over other learned, retrieval-augmented or prompting-based critique generators.
- Score: 27.777809444120827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their unprecedented success, even the largest language models make
mistakes. Similar to how humans learn and improve using feedback, previous work
proposed providing language models with natural language feedback to guide them
in repairing their outputs. Because human-generated critiques are expensive to
obtain, researchers have devised learned critique generators in lieu of human
critics while assuming one can train downstream models to utilize generated
feedback. However, this approach does not apply to black-box or limited access
models such as ChatGPT, as they cannot be fine-tuned. Moreover, in the era of
large general-purpose language agents, fine-tuning is neither computationally
nor spatially efficient as it results in multiple copies of the network. In
this work, we introduce RL4F (Reinforcement Learning for Feedback), a
multi-agent collaborative framework where the critique generator is trained to
maximize end-task performance of GPT-3, a fixed model more than 200 times its
size. RL4F produces critiques that help GPT-3 revise its outputs. We study
three datasets for action planning, summarization and alphabetization and show
relative improvements up to 10% in multiple text similarity metrics over other
learned, retrieval-augmented or prompting-based critique generators.
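
The training signal described in the abstract can be illustrated with a minimal sketch: a critique generator writes feedback, a fixed task model revises its output using that feedback, and the critique is scored by how close the revision is to a reference. Everything named here is an assumption for illustration only: `unigram_f1` is a simple stand-in for the text similarity metrics, `toy_task_model` is a stub in place of GPT-3, and the two critic functions are hypothetical. The reinforcement learning update itself is not shown.

```python
from collections import Counter
from typing import Callable


def unigram_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1; an assumed stand-in for a text similarity metric."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return 0.0
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def critique_reward(
    task_input: str,
    initial_output: str,
    reference: str,
    critique_generator: Callable[[str, str], str],
    frozen_task_model: Callable[[str, str, str], str],
) -> float:
    """Score a critique by the quality of the revision it leads to."""
    critique = critique_generator(task_input, initial_output)
    # The task model is only queried, never updated.
    revision = frozen_task_model(task_input, initial_output, critique)
    return unigram_f1(revision, reference)


def toy_task_model(task_input: str, initial_output: str, critique: str) -> str:
    """Hypothetical stub: revises the plan only if the critique names the gap."""
    if "boil" in critique.lower():
        return "boil water then add pasta"
    return initial_output


def helpful_critic(task_input: str, initial_output: str) -> str:
    return "The plan is missing the step of boiling water."


def unhelpful_critic(task_input: str, initial_output: str) -> str:
    return "Looks fine."


task = "How do I cook pasta?"
draft = "add pasta"
gold = "boil water then add pasta"

helpful = critique_reward(task, draft, gold, helpful_critic, toy_task_model)
unhelpful = critique_reward(task, draft, gold, unhelpful_critic, toy_task_model)
print(helpful, unhelpful)
```

In this toy setup the critique that names the missing step earns the higher reward, which is the kind of signal a critique generator could be trained to maximize while the task model stays fixed.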
Related papers
- Self-Evolved Reward Learning for LLMs [45.6910747154447]
Reinforcement Learning from Human Feedback (RLHF) is a crucial technique for aligning language models with human preferences.
We propose Self-Evolved Reward Learning (SER), a novel approach where the RM generates additional training data to iteratively improve itself.
Our results demonstrate that even with limited human-annotated data, learning from self-feedback can robustly enhance RM performance.
arXiv Detail & Related papers (2024-11-01T07:29:03Z)
- LLMs are Superior Feedback Providers: Bootstrapping Reasoning for Lie Detection with Self-Generated Feedback [33.14770105185958]
Large Language Models (LLMs) excel at generating human-like dialogues and comprehending text.
We propose a bootstrapping framework that leverages self-generated feedback to enhance LLM reasoning capabilities for lie detection.
We investigate the application of the proposed framework for detecting betrayal and deception in Diplomacy games, and compare it with feedback from professional human players.
arXiv Detail & Related papers (2024-08-25T18:47:55Z)
- CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation [87.44350003888646]
Eval-Instruct can acquire pointwise grading critiques with pseudo references and revise these critiques via multi-path prompting.
CritiqueLLM is empirically shown to outperform ChatGPT and all the open-source baselines.
arXiv Detail & Related papers (2023-11-30T16:52:42Z)
- Constructive Large Language Models Alignment with Diverse Feedback [76.9578950893839]
We introduce Constructive and Diverse Feedback (CDF) as a novel method to enhance large language models alignment.
We exploit critique feedback for easy problems, refinement feedback for medium problems, and preference feedback for hard problems.
By training our model with this diversified feedback, we achieve enhanced alignment performance while using less training data.
arXiv Detail & Related papers (2023-10-10T09:20:14Z)
- UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- Training Language Models with Language Feedback at Scale [50.70091340506957]
We introduce Imitation learning from Language Feedback (ILF), a new approach that utilizes more informative language feedback.
ILF consists of three steps that are applied iteratively: first, conditioning the language model on the input, an initial LM output, and feedback to generate refinements.
We show theoretically that ILF can be viewed as Bayesian Inference, similar to Reinforcement Learning from human feedback.
arXiv Detail & Related papers (2023-03-28T17:04:15Z)
- Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity.
We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model.
By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
arXiv Detail & Related papers (2023-02-06T10:28:16Z)
- Self-critiquing models for assisting human evaluators [11.1006983438712]
We fine-tune large language models to write natural language critiques (natural language critical comments) using behavioral cloning.
On a topic-based summarization task, critiques written by our models help humans find flaws in summaries that they would have otherwise missed.
Larger models write more helpful critiques, and on most tasks, are better at self-critiquing, despite having harder-to-critique outputs.
arXiv Detail & Related papers (2022-06-12T17:40:53Z)
- Training Language Models with Natural Language Feedback [51.36137482891037]
We learn from language feedback on model outputs using a three-step learning algorithm.
In synthetic experiments, we first evaluate whether language models accurately incorporate feedback to produce refinements.
Using only 100 samples of human-written feedback, our learning algorithm finetunes a GPT-3 model to roughly human-level summarization.
arXiv Detail & Related papers (2022-04-29T15:06:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.