On Diversified Preferences of Large Language Model Alignment
- URL: http://arxiv.org/abs/2312.07401v5
- Date: Sat, 05 Oct 2024 07:45:45 GMT
- Title: On Diversified Preferences of Large Language Model Alignment
- Authors: Dun Zeng, Yong Dai, Pengyu Cheng, Longyue Wang, Tianhao Hu, Wanshun Chen, Nan Du, Zenglin Xu,
- Abstract summary: This paper presents the first quantitative analysis of the experimental scaling law for reward models with varying sizes.
Our analysis reveals that the impact of diversified human preferences depends on both model size and data size.
Larger models with sufficient capacity mitigate the negative effects of diverse preferences, while smaller models struggle to accommodate them.
- Score: 51.26149027399505
- License:
- Abstract: Aligning large language models (LLMs) with human preferences has been recognized as the key to improving LLMs' interaction quality. However, in this pluralistic world, human preferences can be diversified due to annotators' different tastes, which hinders the effectiveness of LLM alignment methods. This paper presents the first quantitative analysis of the experimental scaling law for reward models with varying sizes, from 1.3 billion to 7 billion parameters, trained with human feedback exhibiting diverse preferences. Our analysis reveals that the impact of diversified human preferences depends on both model size and data size. Larger models with sufficient capacity mitigate the negative effects of diverse preferences, while smaller models struggle to accommodate them. To mitigate the impact of diverse preferences, we introduce a new metric, Expected Calibration Error (ECE), to evaluate RMs and show their obvious positive correlation with the alignment performance of LLMs. Furthermore, we propose a Multi-Objective Reward learning method (MORE) to enhance the calibration performance of RMs on shared preferences. Through experiments on four models and five human preference datasets, we find the calibration error can be adopted as a key metric for evaluating RMs and MORE can obtain superior alignment performance.
Related papers
- Optimizing LLMs with Direct Preferences: A Data Efficiency Perspective [4.548047308860141]
This study investigates the impact of different type of preference data on model performance.
It aims to reduce their dependency on extensive amounts of preference data, which is expensive to collect.
arXiv Detail & Related papers (2024-10-22T00:11:41Z) - Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness [27.43137305486112]
We propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss.
The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods to achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-09-26T12:37:26Z) - BAPO: Base-Anchored Preference Optimization for Overcoming Forgetting in Large Language Models Personalization [26.526171463511332]
This paper examines the impact of personalized preference optimization on Large Language Models (LLMs)
BAPO effectively adapts to diverse user preferences while minimally affecting global knowledge or general alignment.
arXiv Detail & Related papers (2024-06-30T13:30:04Z) - A Survey on Human Preference Learning for Large Language Models [81.41868485811625]
The recent surge of versatile large language models (LLMs) largely depends on aligning increasingly capable foundation models with human intentions by preference learning.
This survey covers the sources and formats of preference feedback, the modeling and usage of preference signals, as well as the evaluation of the aligned LLMs.
arXiv Detail & Related papers (2024-06-17T03:52:51Z) - Multi-Reference Preference Optimization for Large Language Models [56.84730239046117]
We introduce a novel closed-form formulation for direct preference optimization using multiple reference models.
The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse reference models.
Our experiments demonstrate that LLMs finetuned with MRPO generalize better in various preference data, regardless of data scarcity or abundance.
arXiv Detail & Related papers (2024-05-26T00:29:04Z) - Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency.
Results show that consistency-based calibration methods outperform existing post-hoc approaches.
We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs.
arXiv Detail & Related papers (2024-02-21T16:15:20Z) - MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with
Diverse Human Preferences [101.57443597426374]
Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data.
We learn a mixture of preference distributions via an expectation-maximization algorithm to better represent diverse human preferences.
Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms.
arXiv Detail & Related papers (2024-02-14T03:56:27Z) - Interactive Hyperparameter Optimization in Multi-Objective Problems via
Preference Learning [65.51668094117802]
We propose a human-centered interactive HPO approach tailored towards multi-objective machine learning (ML)
Instead of relying on the user guessing the most suitable indicator for their needs, our approach automatically learns an appropriate indicator.
arXiv Detail & Related papers (2023-09-07T09:22:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.