Understanding the Effects of RLHF on LLM Generalisation and Diversity
- URL: http://arxiv.org/abs/2310.06452v3
- Date: Mon, 19 Feb 2024 14:39:07 GMT
- Title: Understanding the Effects of RLHF on LLM Generalisation and Diversity
- Authors: Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena
Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu
- Abstract summary: Large language models (LLMs) fine-tuned with reinforcement learning from human feedback (RLHF) have been used in some of the most widely deployed AI models to date.
We present an analysis of how each stage of the process affects two key properties: out-of-distribution (OOD) generalisation and output diversity.
- Score: 26.56388427640671
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models (LLMs) fine-tuned with reinforcement learning from
human feedback (RLHF) have been used in some of the most widely deployed AI
models to date, such as OpenAI's ChatGPT or Anthropic's Claude. While there has
been significant work developing these methods, our understanding of the
benefits and downsides of each stage in RLHF is still limited. To fill this
gap, we present an extensive analysis of how each stage of the process (i.e.
supervised fine-tuning (SFT), reward modelling, and RLHF) affects two key
properties: out-of-distribution (OOD) generalisation and output diversity. OOD
generalisation is crucial given the wide range of real-world scenarios in which
these models are being used, while output diversity refers to the model's
ability to generate varied outputs and is important for a variety of use cases.
We perform our analysis across two base models on both summarisation and
instruction following tasks, the latter being highly relevant for current LLM
use cases. We find that RLHF generalises better than SFT to new inputs,
particularly as the distribution shift between train and test becomes larger.
However, RLHF significantly reduces output diversity compared to SFT across a
variety of measures, implying a tradeoff in current LLM fine-tuning methods
between generalisation and diversity. Our results provide guidance on which
fine-tuning method should be used depending on the application, and show that
more research is needed to improve the tradeoff between generalisation and
diversity.
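Illustrative note: the abstract reports that RLHF reduces output diversity relative to SFT "across a variety of measures". As a concrete illustration only (not the authors' evaluation code), one common per-prompt diversity measure is distinct-n, the fraction of unique n-grams across several samples drawn for the same prompt. The sketch below assumes Python; the generate_k_samples helper in the usage comment is hypothetical.

```python
# Minimal sketch of a distinct-n diversity score (an assumption about one of
# the "variety of measures"; not the paper's exact implementation).
from collections import Counter
from typing import List


def distinct_n(samples: List[str], n: int = 2) -> float:
    """Fraction of unique n-grams over all n-grams in a set of samples.

    Values near 1.0 mean the samples share few n-grams (high diversity);
    values near 0.0 mean heavy repetition across samples.
    """
    ngrams = Counter()
    for text in samples:
        tokens = text.split()  # whitespace tokenisation as a simple stand-in
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0


# Hypothetical usage: compare per-prompt diversity of SFT vs RLHF outputs.
# generate_k_samples is an assumed helper, not a real API.
# sft_div = distinct_n(generate_k_samples(sft_model, prompt, k=16))
# rlhf_div = distinct_n(generate_k_samples(rlhf_model, prompt, k=16))
```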
Related papers
- Provably Efficient RLHF Pipeline: A Unified View from Contextual Bandits [59.30310692855397]
We propose a unified framework for the RLHF pipeline from the view of contextual bandits.
We decompose the RLHF process into two distinct stages: (post-)training and deployment.
We then develop novel algorithms for each stage, demonstrating significant improvements in both statistical and computational efficiency.
arXiv Detail & Related papers (2025-02-11T02:36:01Z)
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training [127.47044960572659]
Supervised fine-tuning (SFT) and reinforcement learning (RL) are widely used post-training techniques for foundation models.
This paper studies the difference between SFT and RL on generalization and memorization.
We show that RL, especially when trained with an outcome-based reward, generalizes across both rule-based textual and visual variants.
arXiv Detail & Related papers (2025-01-28T18:59:44Z)
- Bridging the Gap for Test-Time Multimodal Sentiment Analysis [7.871669754963032]
Multimodal sentiment analysis (MSA) is an emerging research topic that aims to understand and recognize human sentiment or emotions through multiple modalities.
In this paper, we propose two strategies: Contrastive Adaptation and Stable Pseudo-label generation (CASP) for test-time adaptation for MSA.
arXiv Detail & Related papers (2024-12-10T02:26:33Z)
- Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning [104.27224674122313]
Fine-tuning MLLM has become a common practice to improve performance on specific downstream tasks.
To balance the trade-off between generalization and specialization, we propose measuring the parameter importance for both pre-trained and fine-tuning distributions.
arXiv Detail & Related papers (2024-11-17T01:16:37Z)
- Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture [9.244633039170186]
We propose a plug-and-play Entropy-based Scaling Factor (ESF) and a Distribution-Specific (DS) decoder.
ESF adjusts the attention weight pattern of the model towards familiar ones discovered during training when solving VRPs of varying sizes.
DS decoder explicitly models VRPs of multiple training distribution patterns through multiple auxiliary light decoders, expanding the model representation space.
arXiv Detail & Related papers (2024-06-10T09:03:17Z)
- MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z)
- Unveiling the Generalization Power of Fine-Tuned Large Language Models [81.70754292058258]
We investigate whether fine-tuning affects the generalization ability intrinsic to Large Language Models (LLMs).
Our main findings reveal that models fine-tuned on generation and classification tasks exhibit dissimilar behaviors in generalizing to different domains and tasks.
We observe that integrating the in-context learning strategy during fine-tuning on generation tasks can enhance the model's generalization ability.
arXiv Detail & Related papers (2024-03-14T08:18:59Z)
- Teaching Large Language Models to Reason with Reinforcement Learning [38.17625148525193]
Reinforcement Learning from Human Feedback (RLHF) has emerged as a dominant approach for aligning LLM outputs with human preferences.
Inspired by the success of RLHF, we study the performance of multiple algorithms that learn from feedback.
arXiv Detail & Related papers (2024-03-07T16:36:29Z)
- Dive into the Chasm: Probing the Gap between In- and Cross-Topic Generalization [66.4659448305396]
This study analyzes various LMs with three probing-based experiments to shed light on the reasons behind the In- vs. Cross-Topic generalization gap.
We demonstrate, for the first time, that generalization gaps and the robustness of the embedding space vary significantly across LMs.
arXiv Detail & Related papers (2024-02-02T12:59:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.