Don't Forget Your Reward Values: Language Model Alignment via
Value-based Calibration
- URL: http://arxiv.org/abs/2402.16030v1
- Date: Sun, 25 Feb 2024 08:45:10 GMT
- Title: Don't Forget Your Reward Values: Language Model Alignment via
Value-based Calibration
- Authors: Xin Mao, Feng-Lin Li, Huimin Xu, Wei Zhang, Anh Tuan Luu
- Abstract summary: We propose a novel textbfValue-based textbfCalitextbfBration (VCB) method to better align Large Language Models with human preferences.
Experimental results demonstrate that VCB surpasses existing alignment methods on AI assistant and summarization datasets.
- Score: 26.467379188463028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Reinforcement Learning from Human Feedback (RLHF) significantly
enhances the generation quality of Large Language Models (LLMs), recent studies
have raised concerns regarding the complexity and instability associated with
the Proximal Policy Optimization (PPO) algorithm, proposing a series of
order-based calibration methods as viable alternatives. This paper delves
further into current order-based methods, examining their inefficiencies in
utilizing reward values and addressing misalignment issues. Building upon these
findings, we propose a novel \textbf{V}alue-based \textbf{C}ali\textbf{B}ration
(VCB) method to better align LLMs with human preferences. Experimental results
demonstrate that VCB surpasses existing alignment methods on AI assistant and
summarization datasets, providing impressive generalizability, robustness, and
stability in diverse settings.
Related papers
- Direct Preference Optimization Using Sparse Feature-Level Constraints [47.15096507230884]
Feature-level constrained Preference Optimization is a novel method designed to simplify the alignment process while ensuring stability.
Our approach enjoys efficiency by using sparse features activated in a well-trained sparse autoencoder and the quality of sequential KL divergence.
arXiv Detail & Related papers (2024-11-12T07:54:13Z) - Progressively Label Enhancement for Large Language Model Alignment [42.01694160556464]
Large Language Models (LLM) alignment aims to prevent models from producing content that misaligns with human expectations.
We propose PLE, a framework that dynamically adjusts the model's training process based on the evolving quality of the generated data.
arXiv Detail & Related papers (2024-08-05T16:21:17Z) - SAIL: Self-Improving Efficient Online Alignment of Large Language Models [56.59644677997827]
Reinforcement Learning from Human Feedback is a key method for aligning large language models with human preferences.
Recent literature has focused on designing online RLHF methods but still lacks a unified conceptual formulation.
Our approach significantly improves alignment performance on open-sourced datasets with minimal computational overhead.
arXiv Detail & Related papers (2024-06-21T18:05:35Z) - Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback [58.049113055986375]
We develop a single stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z) - LIRE: listwise reward enhancement for preference alignment [27.50204023448716]
We propose a gradient-based reward optimization approach that incorporates the offline rewards of multiple responses into a streamlined listwise framework.
LIRE is straightforward to implement, requiring minimal parameter tuning, and seamlessly aligns with the pairwise paradigm.
Our experiments demonstrate that LIRE consistently outperforms existing methods across several benchmarks on dialogue and summarization tasks.
arXiv Detail & Related papers (2024-05-22T10:21:50Z) - CURATRON: Complete and Robust Preference Data for Rigorous Alignment of Large Language Models [1.6339731044538859]
This paper addresses the challenges of aligning large language models with human values via preference learning.
We propose a novel method for robustly and maliciously manipulated AI pipeline datasets to enhance LLMs' resilience.
arXiv Detail & Related papers (2024-03-05T07:58:12Z) - Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback [70.32795295142648]
Linear alignment is a novel algorithm that aligns language models with human preferences in one single inference step.
Experiments on both general and personalized preference datasets demonstrate that linear alignment significantly enhances the performance and efficiency of LLM alignment.
arXiv Detail & Related papers (2024-01-21T10:46:23Z) - Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment [105.34140537748546]
We propose an improved alignment approach named FIGA. Different from prior methods, we incorporate fine-grained quality signals that are derived by contrasting good and bad responses.
Our approach has made two major contributions. Firstly, we curate a refined alignment dataset that pairs initial responses and the corresponding revised ones.
Secondly, we devise a new loss function can leverage fine-grained quality signals to instruct the learning of LLMs for alignment.
arXiv Detail & Related papers (2023-11-07T15:36:40Z) - PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback [106.63518036538163]
We present a novel unified bilevel optimization-based framework, textsfPARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning.
Our framework addressed these concerns by explicitly parameterizing the distribution of the upper alignment objective (reward design) by the lower optimal variable.
Our empirical results substantiate that the proposed textsfPARL can address the alignment concerns in RL by showing significant improvements.
arXiv Detail & Related papers (2023-08-03T18:03:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.