Related papers: A Survey on Human Preference Learning for Large Language Models

A Survey on Human Preference Learning for Large Language Models

URL: http://arxiv.org/abs/2406.11191v2
Date: Tue, 18 Jun 2024 08:18:33 GMT
Title: A Survey on Human Preference Learning for Large Language Models
Authors: Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang,
Abstract summary: The recent surge of versatile large language models (LLMs) largely depends on aligning increasingly capable foundation models with human intentions by preference learning. This survey covers the sources and formats of preference feedback, the modeling and usage of preference signals, as well as the evaluation of the aligned LLMs.
Score: 81.41868485811625
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The recent surge of versatile large language models (LLMs) largely depends on aligning increasingly capable foundation models with human intentions by preference learning, enhancing LLMs with excellent applicability and effectiveness in a wide range of contexts. Despite the numerous related studies conducted, a perspective on how human preferences are introduced into LLMs remains limited, which may prevent a deeper comprehension of the relationships between human preferences and LLMs as well as the realization of their limitations. In this survey, we review the progress in exploring human preference learning for LLMs from a preference-centered perspective, covering the sources and formats of preference feedback, the modeling and usage of preference signals, as well as the evaluation of the aligned LLMs. We first categorize the human feedback according to data sources and formats. We then summarize techniques for human preferences modeling and compare the advantages and disadvantages of different schools of models. Moreover, we present various preference usage methods sorted by the objectives to utilize human preference signals. Finally, we summarize some prevailing approaches to evaluate LLMs in terms of alignment with human intentions and discuss our outlooks on the human intention alignment for LLMs.

Related papers

A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models [52.91062958231693]
We present an analysis of works on personalized alignment and modeling for large language models. We introduce a taxonomy of preference alignment techniques, including training time, inference time, and additionally, user-modeling based methods.
arXiv Detail & Related papers (2025-04-09T17:39:58Z)
IPO: Your Language Model is Secretly a Preference Classifier [1.8921784053120494]
Reinforcement learning from human feedback (RLHF) has emerged as the primary method for aligning large language models with human preferences. We propose Implicit Preference Optimization (IPO), an alternative approach that leverages generative language models as preference classifiers. Our findings demonstrate that models trained through IPO achieve performance comparable to those utilizing state-of-the-art reward models for obtaining preferences.
arXiv Detail & Related papers (2025-02-22T10:59:11Z)
Personalization of Large Language Models: A Survey [131.00650432814268]
Personalization of Large Language Models (LLMs) has recently become increasingly important with a wide range of applications. Most existing works on personalized LLMs have focused either entirely on (a) personalized text generation or (b) leveraging LLMs for personalization-related downstream applications, such as recommendation systems. We introduce a taxonomy for personalized LLM usage and summarizing the key differences and challenges.
arXiv Detail & Related papers (2024-10-29T04:01:11Z)
MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time [50.41806216615488]
Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from extensive text corpora. To make LLMs more usable, aligning them with human preferences is essential. We propose an effective method, textbf MetaAlign, which aims to help LLMs dynamically align with various explicit or implicit preferences specified at inference time.
arXiv Detail & Related papers (2024-10-18T05:31:13Z)
Uncovering Factor Level Preferences to Improve Human-Model Alignment [58.50191593880829]
We introduce PROFILE, a framework that uncovers and quantifies the influence of specific factors driving preferences. ProFILE's factor level analysis explains the 'why' behind human-model alignment and misalignment. We demonstrate how leveraging factor level insights, including addressing misaligned factors, can improve alignment with human preferences.
arXiv Detail & Related papers (2024-10-09T15:02:34Z)
Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments [41.25558612970942]
We show that large language models (LLMs) exhibit preference biases and worrying sensitivity to prompt designs. Motivated by this phenomenon, we propose an automatic Zero-shot Evaluation-oriented Prompt Optimization framework, ZEPO.
arXiv Detail & Related papers (2024-06-17T09:48:53Z)
Bayesian Statistical Modeling with Predictors from LLMs [5.5711773076846365]
State of the art large language models (LLMs) have shown impressive performance on a variety of benchmark tasks. This raises questions about the human-likeness of LLM-derived information.
arXiv Detail & Related papers (2024-06-13T11:33:30Z)
Do Large Language Models Learn Human-Like Strategic Preferences? [0.0]
LLMs learn to make human-like preference judgements in strategic scenarios. Solar and Mistral are shown to exhibit stable value-based preference consistent with humans.
arXiv Detail & Related papers (2024-04-11T19:13:24Z)
On Diversified Preferences of Large Language Model Alignment [51.26149027399505]
This paper presents the first quantitative analysis of the experimental scaling law for reward models with varying sizes. Our analysis reveals that the impact of diversified human preferences depends on both model size and data size. Larger models with sufficient capacity mitigate the negative effects of diverse preferences, while smaller models struggle to accommodate them.
arXiv Detail & Related papers (2023-12-12T16:17:15Z)
Aligning Large Language Models with Human: A Survey [53.6014921995006]
Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks. Despite their notable performance, these models are prone to certain limitations such as misunderstanding human instructions, generating potentially biased content, or factually incorrect information. This survey presents a comprehensive overview of these alignment technologies, including the following aspects.
arXiv Detail & Related papers (2023-07-24T17:44:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.