Contextual Online Uncertainty-Aware Preference Learning for Human Feedback
- URL: http://arxiv.org/abs/2504.19342v2
- Date: Tue, 29 Apr 2025 19:33:10 GMT
- Title: Contextual Online Uncertainty-Aware Preference Learning for Human Feedback
- Authors: Nan Lu, Ethan X. Fang, Junwei Lu
- Abstract summary: Reinforcement Learning from Human Feedback (RLHF) has become a pivotal paradigm in artificial intelligence. We propose a novel statistical framework to simultaneously conduct online decision-making and statistical inference on the optimal model. We apply the proposed framework to analyze human preference data for ranking large language models on the Massive Multitask Language Understanding dataset.
- Score: 13.478503755314344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning from Human Feedback (RLHF) has become a pivotal paradigm in artificial intelligence for aligning large models with human preferences. In this paper, we propose a novel statistical framework to simultaneously conduct online decision-making and statistical inference on the optimal model using human preference data based on dynamic contextual information. Our approach introduces an efficient decision strategy that achieves both the optimal regret bound and the asymptotic distribution of the estimators. A key challenge in RLHF is handling dependent online human preference outcomes with dynamic contexts. To address this, on the methodological side, we propose a two-stage algorithm that starts with $\epsilon$-greedy exploration and is followed by exploitation; on the theoretical side, we tailor anti-concentration inequalities and matrix martingale concentration techniques to derive the uniform estimation rate and asymptotic normality of the estimators using dependent samples from both stages. Extensive simulation results demonstrate that our method outperforms state-of-the-art strategies. We apply the proposed framework to analyze human preference data for ranking large language models on the Massive Multitask Language Understanding dataset, yielding insightful results on the performance of different large language models for medical anatomy knowledge.
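The abstract's two-stage strategy can be made concrete with a toy simulation. Below is a minimal sketch, assuming a linear Bradley-Terry preference model over a small pool of candidate models: stage one explores with $\epsilon$-greedy queries while periodically refitting the preference parameters, and stage two purely exploits the fitted estimates. All dimensions, rates, and the batch fitting routine are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_models = 5, 4                                  # context dim, pool size
T_explore, T_exploit, eps = 200, 300, 0.5
theta_star = rng.normal(size=(n_models, d))         # ground-truth parameters

def bt_prob(theta, x, i, j):
    """Bradley-Terry probability that model i beats model j in context x."""
    return 1.0 / (1.0 + np.exp(-(theta[i] - theta[j]) @ x))

def fit_bt(contexts, pairs, outcomes, lr=0.5, steps=200):
    """Fit the linear BT model by batch gradient ascent on the log-likelihood."""
    theta = np.zeros((n_models, d))
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for x, (i, j), y in zip(contexts, pairs, outcomes):
            g = (y - bt_prob(theta, x, i, j)) * x   # logistic score
            grad[i] += g
            grad[j] -= g
        theta += lr * grad / len(outcomes)
    return theta

contexts, pairs, outcomes = [], [], []
theta_hat = np.zeros((n_models, d))
for t in range(T_explore + T_exploit):
    x = rng.normal(size=d)                          # fresh dynamic context
    if t < T_explore and rng.random() < eps:        # stage 1: epsilon-greedy
        i, j = rng.choice(n_models, size=2, replace=False)
    else:                                           # stage 2: pure exploitation
        i, j = np.argsort(theta_hat @ x)[-2:][::-1] # current top-2 models
    y = float(rng.random() < bt_prob(theta_star, x, i, j))   # human feedback
    contexts.append(x); pairs.append((int(i), int(j))); outcomes.append(y)
    if t < T_explore and (t + 1) % 50 == 0:         # periodic refit in stage 1
        theta_hat = fit_bt(contexts, pairs, outcomes)

# Compare centered estimates: BT parameters are identified only up to a shift.
err = np.linalg.norm((theta_hat - theta_hat.mean(0)) -
                     (theta_star - theta_star.mean(0)))
print(f"centered estimation error: {err:.3f}")
```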
Related papers
- Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning [3.30671592417223]
Reinforcement learning from human feedback (RLHF) has emerged as a key technique for aligning the output of large language models with human preferences. Most existing RLHF algorithms use the Bradley-Terry model, which relies on assumptions about human preferences that may not reflect the complexity and variability of real-world judgments. We propose a robust algorithm to enhance the performance of existing approaches under such reward model misspecifications.
arXiv Detail & Related papers (2025-04-03T16:16:35Z)
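For context, the Bradley-Terry model criticized in the entry above posits a single reward function $r$ and models the probability that response $y_1$ is preferred over $y_2$ given prompt $x$ as $P(y_1 \succ y_2 \mid x) = \sigma\big(r(x, y_1) - r(x, y_2)\big)$, where $\sigma$ is the logistic function; reward model misspecification refers to real human judgments deviating from this single-reward logistic form.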
- Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining [55.262510814326035]
Existing reweighting strategies primarily focus on group-level data importance. We introduce novel algorithms for dynamic, instance-level data reweighting. Our framework allows us to devise reweighting strategies deprioritizing redundant or uninformative data.
arXiv Detail & Related papers (2025-02-10T17:57:15Z)
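A minimal sketch of what instance-level, loss-based reweighting can look like; the softmax-over-losses rule below is an assumed stand-in for the paper's algorithms, chosen only to show how near-zero-loss (redundant) samples get deprioritized.

```python
# Hypothetical instance-level reweighting rule (not the paper's exact
# algorithm): map per-sample losses to normalized weights so that easy,
# near-zero-loss samples contribute little to the next update.
import numpy as np

def reweight(losses, temperature=1.0):
    """Softmax over per-sample losses, yielding weights that sum to one."""
    z = losses / temperature
    w = np.exp(z - z.max())                  # stabilized softmax
    return w / w.sum()

losses = np.array([0.01, 0.02, 1.5, 0.9])    # first two look redundant
weights = reweight(losses)
weighted_loss = float(weights @ losses)      # objective for the next step
print(weights, weighted_loss)
```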
- A Foundational Brain Dynamics Model via Stochastic Optimal Control [15.8358479596609]
We introduce a foundational model for brain dynamics that utilizes stochastic optimal control (SOC) and amortized inference. Our method features a continuous-discrete state space model (SSM) that can robustly handle the intricate and noisy nature of fMRI signals. Our model attains state-of-the-art results across a variety of downstream tasks, including demographic prediction, trait analysis, disease diagnosis, and prognosis.
arXiv Detail & Related papers (2025-02-07T12:57:26Z)
- Reviving The Classics: Active Reward Modeling in Large Language Model Alignment [7.041595238178957]
Building neural reward models from human preferences is a pivotal component in reinforcement learning. Given the scarcity and high cost of human annotation, how to select the most informative pairs to annotate is an essential yet challenging open problem. We propose Fisher information-based selection strategies, adapting theories from the classical experimental design literature and applying them to the final linear layer of deep neural network-based reward models.
arXiv Detail & Related papers (2025-02-04T18:47:11Z)
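A hedged sketch of Fisher information-based pair selection for a logistic (Bradley-Terry) reward head, in the spirit of classical D-optimal experimental design; the greedy rule, names, and dimensions are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_candidates = 8, 100
theta = rng.normal(size=d)                       # current last-layer weights
# Each candidate pair is summarized by its feature difference phi(y1)-phi(y2).
phis = rng.normal(size=(n_candidates, d))

def d_optimal_pick(phis, theta, fisher):
    """Greedily pick the pair whose label adds the most Fisher information.

    By the matrix determinant lemma, maximizing logdet(F + w * phi phi^T)
    is the same as maximizing w * phi^T F^{-1} phi, where w = p(1-p) is
    the logistic variance at the current parameter estimate.
    """
    p = 1.0 / (1.0 + np.exp(-phis @ theta))
    w = p * (1.0 - p)
    Finv = np.linalg.inv(fisher)
    gains = w * np.einsum("nd,dk,nk->n", phis, Finv, phis)
    return int(np.argmax(gains))

fisher = np.eye(d)                               # ridge-style initialization
for _ in range(10):                              # budget: annotate 10 pairs
    k = d_optimal_pick(phis, theta, fisher)
    phi = phis[k]
    p = 1.0 / (1.0 + np.exp(-phi @ theta))
    fisher += p * (1 - p) * np.outer(phi, phi)   # update observed information
    phis = np.delete(phis, k, axis=0)            # query without replacement
```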
- Feasible Learning [78.6167929413604]
We introduce Feasible Learning (FL), a sample-centric learning paradigm where models are trained by solving a feasibility problem that bounds the loss for each training sample. Our empirical analysis, spanning image classification, age regression, and preference optimization in large language models, demonstrates that models trained via FL can learn from data while displaying improved tail behavior compared to ERM, with only a marginal impact on average performance.
arXiv Detail & Related papers (2025-01-24T20:39:38Z)
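A minimal sketch of the feasibility viewpoint: rather than minimizing average loss, update only on samples whose loss still exceeds a per-sample bound $\epsilon$. The subgradient-penalty scheme below is an assumed simplification, not the authors' solver.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, eps, lr = 200, 5, 0.3, 0.1
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    losses = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    active = losses > eps                  # samples violating their bound
    if not active.any():                   # feasible point found: stop
        break
    # Subgradient of sum_i max(0, loss_i - eps) for logistic loss.
    grad = (p - y)[active] @ X[active] / active.sum()
    w -= lr * grad
print("max per-sample loss:", losses.max())
```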
- Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF [82.7679132059169]
Reinforcement learning from human feedback has emerged as a central tool for language model alignment.
We propose a new algorithm for online exploration in RLHF, Exploratory Preference Optimization (XPO).
XPO enjoys the strongest known provable guarantees and promising empirical performance.
arXiv Detail & Related papers (2024-05-31T17:39:06Z)
- A Probabilistic Approach for Model Alignment with Human Comparisons [7.6656660956453635]
We develop a theoretical framework for analyzing the conditions under which human comparisons can enhance the traditional supervised learning process. We propose a two-stage "Supervised Learning+Learning from Human Feedback" (SL+LHF) framework that connects machine learning with human feedback.
arXiv Detail & Related papers (2024-03-16T02:19:21Z)
- MaxMin-RLHF: Alignment with Diverse Human Preferences [101.57443597426374]
Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data. We learn a mixture of preference distributions via an expectation-maximization algorithm to better represent diverse human preferences. Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms.
arXiv Detail & Related papers (2024-02-14T03:56:27Z)
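A sketch of the expectation-maximization step for a mixture of linear Bradley-Terry preference models, one reward vector per latent annotator group; dimensions, learning rate, and the simulated data are illustrative assumptions. The max-min stage would then align the policy against the worst-off group's learned reward.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, K = 400, 4, 2
phi = rng.normal(size=(n, d))            # feature diff of a response pair
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Simulate two annotator groups with opposing preferences.
true_theta = np.stack([np.ones(d), -np.ones(d)])
group = rng.integers(0, K, size=n)
y = (rng.random(n) < sigmoid((phi * true_theta[group]).sum(1))).astype(float)

theta = rng.normal(size=(K, d))          # per-group reward parameters
pi = np.full(K, 1.0 / K)                 # mixture weights
for _ in range(100):
    # E-step: responsibility of each group for each labeled comparison.
    p = sigmoid(phi @ theta.T)                          # (n, K)
    lik = p ** y[:, None] * (1.0 - p) ** (1.0 - y[:, None])
    resp = pi * lik
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted logistic updates, then weight refresh.
    for k in range(K):
        pk = sigmoid(phi @ theta[k])
        theta[k] += 2.0 * ((resp[:, k] * (y - pk)) @ phi) / n
    pi = resp.mean(axis=0)
```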
- Secrets of RLHF in Large Language Models Part II: Reward Modeling [134.97964938009588]
We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset.
We also introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses.
arXiv Detail & Related papers (2024-01-11T17:56:59Z)
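As a generic, hedged illustration of a training signal that pushes chosen and rejected responses apart, here is a margin-augmented pairwise ranking loss; the paper's actual contrastive objective may differ.

```python
# Assumed stand-in for a contrastive-style reward objective: penalize
# reward gaps between chosen and rejected responses smaller than a margin.
import numpy as np

def pairwise_reward_loss(r_chosen, r_rejected, margin=1.0):
    """-log sigmoid(r_chosen - r_rejected - margin), averaged over pairs."""
    gap = r_chosen - r_rejected - margin
    return float(np.mean(np.log1p(np.exp(-gap))))

print(pairwise_reward_loss(np.array([2.0, 0.5]), np.array([0.0, 0.4])))
```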
- Learnability of Competitive Threshold Models [11.005966612053262]
We study the learnability of the competitive threshold model from a theoretical perspective.
We demonstrate how competitive threshold models can be seamlessly simulated by artificial neural networks.
arXiv Detail & Related papers (2022-05-08T01:11:51Z)
- Double Robust Representation Learning for Counterfactual Prediction [68.78210173955001]
We propose a novel scalable method to learn double-robust representations for counterfactual predictions.
We make robust and efficient counterfactual predictions for both individual and average treatment effects.
The algorithm shows competitive performance with the state-of-the-art on real-world and synthetic data.
arXiv Detail & Related papers (2020-10-15T16:39:26Z)