RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
- URL: http://arxiv.org/abs/2304.06767v4
- Date: Fri, 1 Dec 2023 14:28:06 GMT
- Title: RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
- Authors: Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui
Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang
- Abstract summary: Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data.
We introduce a new framework, Reward rAnked FineTuning, designed to align generative models effectively.
- Score: 32.752633250862694
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative foundation models are susceptible to implicit biases that can
arise from extensive unsupervised training data. Such biases can produce
suboptimal samples, skewed outcomes, and unfairness, with potentially serious
consequences. Consequently, aligning these models with human ethics and
preferences is an essential step toward ensuring their responsible and
effective deployment in real-world applications. Prior research has primarily
employed Reinforcement Learning from Human Feedback (RLHF) to address this
problem, where generative models are fine-tuned with RL algorithms guided by a
human-feedback-informed reward model. However, the inefficiencies and
instabilities associated with RL algorithms frequently present substantial
obstacles to successful alignment, necessitating the development of a more
robust and streamlined approach. To this end, we introduce a new framework,
Reward rAnked FineTuning (RAFT), designed to align generative models
effectively. Utilizing a reward model and a sufficient number of samples, our
approach selects the high-quality samples, discards those that exhibit
undesired behavior, and subsequently enhances the model by fine-tuning on
these filtered samples. Our studies show that RAFT effectively improves model
performance on both reward and other automated metrics, for both large
language models and diffusion models.
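Read literally, the abstract describes a best-of-n sample-rank-filter loop followed by supervised fine-tuning. Below is a minimal sketch of one such iteration, assuming hypothetical `generate`, `score`, and `fine_tune` helpers; it is an illustration under those assumptions, not the authors' implementation.

```python
# Minimal sketch of the RAFT loop described in the abstract. `generate`,
# `score`, and `fine_tune` are hypothetical placeholders for the model's
# sampler, the reward model, and a standard supervised fine-tuning step.

def raft_round(model, reward_model, prompts, k=8, keep_ratio=0.125):
    """One RAFT iteration: sample, rank by reward, fine-tune on the best."""
    selected = []
    for prompt in prompts:
        # 1. Draw k candidate responses per prompt from the current model.
        candidates = [model.generate(prompt) for _ in range(k)]
        # 2. Rank the candidates by the reward model's score.
        ranked = sorted(candidates,
                        key=lambda resp: reward_model.score(prompt, resp),
                        reverse=True)
        # 3. Keep only the top fraction; discard low-reward samples.
        n_keep = max(1, int(k * keep_ratio))
        selected.extend((prompt, resp) for resp in ranked[:n_keep])
    # 4. Supervised fine-tuning on the filtered (prompt, response) pairs.
    model.fine_tune(selected)
    return model
```

In practice this round would be repeated, with each iteration sampling from the freshly fine-tuned model.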
Related papers
- Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models [54.132297393662654]
We introduce a hybrid method that fine-tunes cutting-edge diffusion models with RL guided by reward models.
We demonstrate that our approach can outperform the best designs in the offline data by leveraging the extrapolation capabilities of reward models.
arXiv Detail & Related papers (2024-05-30T03:57:29Z)
- Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration [15.463313629574111]
This paper investigates how to achieve sample-efficient exploration in continuous control tasks.
We introduce an RL algorithm that incorporates a predictive model and off-policy learning elements.
We derive an intrinsic reward without incurring parameter overhead (a generic sketch of such a bonus follows this entry).
arXiv Detail & Related papers (2024-03-31T11:39:11Z)
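The summary does not give the form of the intrinsic reward. One generic, parameter-free construction, shown here purely as an illustration and not necessarily the paper's derivation, reuses the prediction error of the dynamics model the agent is already learning as an exploration bonus:

```python
import numpy as np

# Illustration only: a curiosity-style intrinsic reward that reuses the
# dynamics model already being learned, so no extra parameters are added.
# `dynamics_model.predict` is a hypothetical API, and this generic bonus
# is not necessarily the paper's derivation.

def intrinsic_reward(dynamics_model, state, action, next_state, scale=0.1):
    """Reward visiting states where the learned model is still inaccurate."""
    predicted = dynamics_model.predict(state, action)
    error = float(np.mean((np.asarray(predicted) - np.asarray(next_state)) ** 2))
    return scale * error
```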
- Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control [54.132297393662654]
Diffusion models excel at capturing complex data distributions, such as those of natural images and proteins.
While diffusion models are trained to represent the distribution in the training dataset, we often are more concerned with other properties, such as the aesthetic quality of the generated images.
We present theoretical and empirical evidence that demonstrates our framework is capable of efficiently generating diverse samples with high genuine rewards.
arXiv Detail & Related papers (2024-02-23T08:54:42Z)
- Secrets of RLHF in Large Language Models Part II: Reward Modeling [134.97964938009588]
We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset.
We also introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses (a generic pairwise-loss sketch follows this entry).
arXiv Detail & Related papers (2024-01-11T17:56:59Z)
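Reward models are typically trained to separate chosen from rejected responses with a pairwise ranking (Bradley-Terry) loss. The sketch below shows that generic baseline objective with a placeholder `reward_model` callable; the paper's contrastive additions sit on top of such a baseline and are not reproduced here.

```python
import torch
import torch.nn.functional as F

# Generic pairwise ranking loss for reward modeling. `reward_model` is a
# placeholder callable mapping a batch of responses to scalar rewards;
# this is the standard baseline objective, not the paper's full method.

def pairwise_reward_loss(reward_model, chosen_batch, rejected_batch):
    """Minimize -log sigmoid(r(chosen) - r(rejected)) over the batch."""
    r_chosen = reward_model(chosen_batch)      # shape: (batch_size,)
    r_rejected = reward_model(rejected_batch)  # shape: (batch_size,)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```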
- Matching Pairs: Attributing Fine-Tuned Models to their Pre-Trained Large Language Models [11.57282859281814]
We consider different knowledge levels and attribution strategies, and find that we can correctly trace back 8 out of the 10 fine-tuned models with our best method.
arXiv Detail & Related papers (2023-06-15T17:42:48Z)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective [142.36200080384145]
We propose a single objective that jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z)
- Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One [83.5162421521224]
We propose a unique method termed E-ARM for training autoregressive generative models.
E-ARM takes advantage of a well-designed energy-based learning objective.
We show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem.
arXiv Detail & Related papers (2022-06-26T10:58:41Z)
- Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow [14.422129911404472]
Bellman aims to fill this gap and introduces the first thoroughly designed and tested model-based RL toolbox.
Our modular approach makes it possible to combine a wide range of environment models with generic model-based agent classes that recover state-of-the-art algorithms.
arXiv Detail & Related papers (2021-03-26T11:32:27Z)
- On the model-based stochastic value gradient for continuous reinforcement learning [50.085645237597056]
We show that simple model-based agents can outperform state-of-the-art model-free agents in terms of both sample-efficiency and final reward.
Our findings suggest that model-based policy evaluation deserves closer attention.
arXiv Detail & Related papers (2020-08-28T17:58:29Z)