Related papers: Towards a Unified View of Preference Learning for Large Language Models: A Survey

Towards a Unified View of Preference Learning for Large Language Models: A Survey

URL: http://arxiv.org/abs/2409.02795v5
Date: Thu, 31 Oct 2024 05:39:06 GMT
Title: Towards a Unified View of Preference Learning for Large Language Models: A Survey
Authors: Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhe Yang, Liang Chen, Helan Hu, Runxin Xu, Qingxiu Dong, Ce Zheng, Shanghaoran Quan, Wen Xiao, Ge Zhang, Daoguang Zan, Keming Lu, Bowen Yu, Dayiheng Liu, Zeyu Cui, Jian Yang, Lei Sha, Houfeng Wang, Zhifang Sui, Peiyi Wang, Tianyu Liu, Baobao Chang,
Abstract summary: Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors to achieve success is aligning the LLM's output with human preferences. We decompose all the strategies in preference learning into four components: model, data, feedback, and algorithm.
Score: 88.66719962576005
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors to achieve success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to efficiently enhance the LLM's performance. While effective, research in this area spans multiple domains, and the methods involved are relatively complex to understand. The relationships between different methods have been under-explored, limiting the development of the preference alignment. In light of this, we break down the existing popular alignment strategies into different components and provide a unified framework to study the current alignment strategies, thereby establishing connections among them. In this survey, we decompose all the strategies in preference learning into four components: model, data, feedback, and algorithm. This unified view offers an in-depth understanding of existing alignment algorithms and also opens up possibilities to synergize the strengths of different strategies. Furthermore, we present detailed working examples of prevalent existing algorithms to facilitate a comprehensive understanding for the readers. Finally, based on our unified perspective, we explore the challenges and future research directions for aligning large language models with human preferences.

Related papers

Interaction-Grounded Learning for Contextual Markov Decision Processes with Personalized Feedback [59.287761696290865]
We propose a computationally efficient algorithm that achieves a sublinear regret guarantee for contextual episodic Markov Decision Processes (MDPs) with personalized feedback.<n>We demonstrate the effectiveness of our method in learning personalized objectives from multi-turn interactions through experiments on both a synthetic episodic MDP and a real-world user booking dataset.
arXiv Detail & Related papers (2026-02-09T06:29:54Z)
Chunking Strategies for Multimodal AI Systems [0.0]
This survey provides a comprehensive taxonomy and technical analysis of chunking strategies tailored for each modality.<n>We examine classical and modern approaches such as fixed-size token windowing, object-centric visual chunking, silence-based audio segmentation, and scene detection in videos.<n>We explore emerging cross-modal chunking strategies that aim to preserve alignment and semantic consistency across disparate data types.
arXiv Detail & Related papers (2025-11-28T19:48:14Z)
Aligning Multimodal LLM with Human Preference: A Survey [62.89722942008262]
Large language models (LLMs) can handle a wide variety of general tasks with simple prompts, without the need for task-specific training. Multimodal Large Language Models (MLLMs) have demonstrated impressive potential in tackling complex tasks involving visual, auditory, and textual data. However, critical issues related to truthfulness, safety, o1-like reasoning, and alignment with human preference remain insufficiently addressed.
arXiv Detail & Related papers (2025-03-18T17:59:56Z)
Many of Your DPOs are Secretly One: Attempting Unification Through Mutual Information [5.655057078073446]
Post-alignment of large language models (LLMs) is critical in improving their utility, safety, and alignment with human intentions. Direct preference optimisation (DPO) has become one of the most widely used algorithms for achieving this alignment. This paper introduces a unifying framework inspired by mutual information, which proposes a new loss function with flexible priors.
arXiv Detail & Related papers (2025-01-02T21:31:38Z)
EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty. We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications. Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z)
Fine-tuning Multimodal Large Language Models for Product Bundling [53.01642741096356]
We introduce Bundle-MLLM, a novel framework that fine-tunes large language models (LLMs) through a hybrid item tokenization approach. Specifically, we integrate textual, media, and relational data into a unified tokenization, introducing a soft separation token to distinguish between textual and non-textual tokens. We propose a progressive optimization strategy that fine-tunes LLMs for disentangled objectives: 1) learning bundle patterns and 2) enhancing multimodal semantic understanding specific to product bundling.
arXiv Detail & Related papers (2024-07-16T13:30:14Z)
A Survey on Efficient Inference for Large Language Models [25.572035747669275]
Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks. The substantial computational and memory requirements of LLM inference pose challenges for deployment in resource-constrained scenarios. This paper presents a comprehensive survey of the existing literature on efficient LLM inference.
arXiv Detail & Related papers (2024-04-22T15:53:08Z)
Supervised Fine-Tuning as Inverse Reinforcement Learning [8.044033685073003]
The prevailing approach to aligning Large Language Models (LLMs) typically relies on human or AI feedback. In our work, we question the efficacy of such datasets and explore various scenarios where alignment with expert demonstrations proves more realistic.
arXiv Detail & Related papers (2024-03-18T17:52:57Z)
LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges. Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on roofline model. This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z)
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [54.19942426544731]
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains. This paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs.
arXiv Detail & Related papers (2023-12-01T16:00:25Z)
Sample Efficient Preference Alignment in LLMs via Active Exploration [63.84454768573154]
We take advantage of the fact that one can often choose contexts at which to obtain human feedback to most efficiently identify a good policy. We propose an active exploration algorithm to efficiently select the data and provide theoretical proof that it has a worst-case regret bound. Our method outperforms the baselines with limited samples of human preferences on several language models and four real-world datasets.
arXiv Detail & Related papers (2023-12-01T00:54:02Z)
Aligning Large Language Models with Human: A Survey [53.6014921995006]
Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks. Despite their notable performance, these models are prone to certain limitations such as misunderstanding human instructions, generating potentially biased content, or factually incorrect information. This survey presents a comprehensive overview of these alignment technologies, including the following aspects.
arXiv Detail & Related papers (2023-07-24T17:44:58Z)
Enhancing Few-shot Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies [20.15851744895469]
In-context learning (ICL) has emerged as a new approach to various natural language processing tasks. In this paper, we aim to extend this method to question answering tasks that utilize structured knowledge sources.
arXiv Detail & Related papers (2023-05-21T22:44:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.