Understanding the Effects of RLHF on LLM Generalisation and Diversity
- URL: http://arxiv.org/abs/2310.06452v3
- Date: Mon, 19 Feb 2024 14:39:07 GMT
- Title: Understanding the Effects of RLHF on LLM Generalisation and Diversity
- Authors: Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena
Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu
- Abstract summary: Large language models (LLMs) fine-tuned with reinforcement learning from human feedback (RLHF) have been used in some of the most widely deployed AI models to date.
We present an analysis of how each stage of the process affects two key properties: out-of-distribution (OOD) generalisation and output diversity.
- Score: 26.56388427640671
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models (LLMs) fine-tuned with reinforcement learning from
human feedback (RLHF) have been used in some of the most widely deployed AI
models to date, such as OpenAI's ChatGPT or Anthropic's Claude. While there has
been significant work developing these methods, our understanding of the
benefits and downsides of each stage in RLHF is still limited. To fill this
gap, we present an extensive analysis of how each stage of the process (i.e.
supervised fine-tuning (SFT), reward modelling, and RLHF) affects two key
properties: out-of-distribution (OOD) generalisation and output diversity. OOD
generalisation is crucial given the wide range of real-world scenarios in which
these models are being used, while output diversity refers to the model's
ability to generate varied outputs and is important for a variety of use cases.
We perform our analysis across two base models on both summarisation and
instruction following tasks, the latter being highly relevant for current LLM
use cases. We find that RLHF generalises better than SFT to new inputs,
particularly as the distribution shift between train and test becomes larger.
However, RLHF significantly reduces output diversity compared to SFT across a
variety of measures, implying a tradeoff in current LLM fine-tuning methods
between generalisation and diversity. Our results provide guidance on which
fine-tuning method should be used depending on the application, and show that
more research is needed to improve the tradeoff between generalisation and
diversity.
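The abstract reports that RLHF reduces output diversity "across a variety of measures" without listing those measures here. As an illustrative sketch only (an assumed example, not necessarily one of the paper's metrics), a simple diversity score such as the distinct n-gram ratio over completions sampled for the same prompt could be computed as follows:

```python
def distinct_n(outputs, n=2):
    """Illustrative diversity score (assumed example, not taken from the paper):
    ratio of unique n-grams to total n-grams across completions sampled for the
    same prompt. Higher values indicate more varied outputs."""
    ngrams = []
    for text in outputs:
        tokens = text.split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / max(len(ngrams), 1)

# Toy comparison: samples from a hypothetical SFT model vs. a hypothetical RLHF model.
sft_samples = ["the film starts slowly but ends strongly",
               "a slow opening gives way to a moving finale"]
rlhf_samples = ["the film starts slowly but ends strongly",
                "the film starts slowly but ends well"]
print(distinct_n(sft_samples))   # higher: phrasing varies across samples
print(distinct_n(rlhf_samples))  # lower: samples collapse onto similar phrasing
```

A lower score for the RLHF samples than for the SFT samples would mirror the diversity reduction the abstract describes.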
Related papers
- Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning [104.27224674122313]
Fine-tuning MLLMs has become a common practice to improve performance on specific downstream tasks.
To balance the trade-off between generalization and specialization, we propose measuring the parameter importance for both pre-trained and fine-tuning distributions.
arXiv Detail & Related papers (2024-11-17T01:16:37Z)
- Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture [9.244633039170186]
We propose a plug-and-play Entropy-based Scaling Factor (ESF) and a Distribution-Specific (DS) decoder.
ESF adjusts the model's attention-weight pattern towards familiar patterns discovered during training when solving VRPs of varying sizes.
The DS decoder explicitly models VRPs drawn from multiple training distribution patterns through multiple auxiliary light decoders, expanding the model's representation space.
arXiv Detail & Related papers (2024-06-10T09:03:17Z)
- RLSF: Reinforcement Learning via Symbolic Feedback [11.407319705797242]
We propose a new fine-tuning paradigm we refer to as Reinforcement Learning via Symbolic Feedback (RLSF).
In RLSF, the LLM being fine-tuned is considered an RL agent, while the environment is allowed access to reasoning or domain knowledge tools.
We show that our RLSF-based fine-tuning of LLMs outperforms traditional approaches on five different applications.
arXiv Detail & Related papers (2024-05-26T18:49:59Z)
- MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate how multimodal DFER performance can be advanced by adapting SSL-pre-trained, disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z)
- Unveiling the Generalization Power of Fine-Tuned Large Language Models [81.70754292058258]
We investigate whether fine-tuning affects the generalization ability intrinsic to Large Language Models (LLMs).
Our main findings reveal that models fine-tuned on generation and classification tasks exhibit dissimilar behaviors in generalizing to different domains and tasks.
We observe that integrating the in-context learning strategy during fine-tuning on generation tasks can enhance the model's generalization ability.
arXiv Detail & Related papers (2024-03-14T08:18:59Z)
- Teaching Large Language Models to Reason with Reinforcement Learning [38.17625148525193]
Reinforcement Learning from Human Feedback (RLHF) has emerged as a dominant approach for aligning LLM outputs with human preferences.
Inspired by the success of RLHF, we study the performance of multiple algorithms that learn from feedback.
arXiv Detail & Related papers (2024-03-07T16:36:29Z)
- Generalizing Reward Modeling for Out-of-Distribution Preference Learning [3.9160947065896803]
Preference learning with large language models (LLMs) aims to align the LLMs' generations with human preferences.
Due to the difficulty of obtaining human feedback, discretely training reward models for every encountered distribution is challenging.
This work addresses OOD preference learning by optimizing a general reward model through a meta-learning approach.
arXiv Detail & Related papers (2024-02-22T18:20:33Z)
- Dive into the Chasm: Probing the Gap between In- and Cross-Topic Generalization [66.4659448305396]
This study analyzes various LMs with three probing-based experiments to shed light on the reasons behind the In- vs. Cross-Topic generalization gap.
We demonstrate, for the first time, that generalization gaps and the robustness of the embedding space vary significantly across LMs.
arXiv Detail & Related papers (2024-02-02T12:59:27Z)
- Mitigating the Alignment Tax of RLHF [76.4300447532456]
Aligning LLMs under Reinforcement Learning with Human Feedback can lead to forgetting pretrained abilities, also known as the alignment tax.
We propose model averaging to maximize alignment performance while incurring minimal alignment tax.
We validate the performance of the resulting model-averaging method (HMA) across a range of RLHF algorithms over OpenLLaMA-3B and further extend our findings to Mistral-7B (an illustrative parameter-averaging sketch appears after this list).
arXiv Detail & Related papers (2023-09-12T14:16:54Z)
- David helps Goliath: Inference-Time Collaboration Between Small Specialized and Large General Diffusion LMs [49.822063966687175]
Diffusion-based language models are emerging as a promising alternative to autoregressive LMs.
We propose methods to scale a recently proposed diffusion model SSD-LM from 0.4B to 13B parameters.
We show that SSD-2 facilitates novel ensembles with 100x smaller models that can be customized and deployed by individual users.
arXiv Detail & Related papers (2023-05-24T06:22:14Z)
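To make the model-averaging idea in "Mitigating the Alignment Tax of RLHF" concrete, here is a minimal parameter-interpolation sketch. The single mixing ratio `alpha` and the toy `torch.nn.Linear` checkpoints are assumptions for illustration and do not reproduce the paper's HMA procedure:

```python
import torch

def average_state_dicts(sft_state, rlhf_state, alpha=0.5):
    """Interpolate the parameters of an SFT checkpoint and an RLHF checkpoint.
    alpha=1.0 keeps the RLHF weights; alpha=0.0 keeps the SFT weights."""
    return {name: alpha * rlhf_state[name] + (1.0 - alpha) * sft_state[name]
            for name in sft_state}

# Toy stand-ins for two fine-tuned checkpoints with identical architectures.
sft_model = torch.nn.Linear(16, 4)
rlhf_model = torch.nn.Linear(16, 4)

merged = torch.nn.Linear(16, 4)
merged.load_state_dict(
    average_state_dicts(sft_model.state_dict(), rlhf_model.state_dict(), alpha=0.7)
)
```

Sweeping `alpha` (or using different ratios for different layers) is the kind of control such averaging exposes for trading alignment gains against forgetting of pretrained abilities.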