Related papers: Learning Correlated Reward Models: Statistical Barriers and Opportunities

Learning Correlated Reward Models: Statistical Barriers and Opportunities

URL: http://arxiv.org/abs/2510.15839v1
Date: Fri, 17 Oct 2025 17:31:17 GMT
Title: Learning Correlated Reward Models: Statistical Barriers and Opportunities
Authors: Yeshwanth Cherapanamjeri, Constantinos Daskalakis, Gabriele Farina, Sobhan Mohammadpour,
Abstract summary: This paper investigates the statistical and computational challenges of learning a RUM that avoids the IIA assumption.<n>We devise a statistically and computationally efficient estimator with near-optimal performance.<n>Results highlight the benefits of higher-order preference data in learning correlated utilities, allowing for more fine-grained modeling of human preferences.
Score: 39.27536879408937
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Random Utility Models (RUMs) are a classical framework for modeling user preferences and play a key role in reward modeling for Reinforcement Learning from Human Feedback (RLHF). However, a crucial shortcoming of many of these techniques is the Independence of Irrelevant Alternatives (IIA) assumption, which collapses \emph{all} human preferences to a universal underlying utility function, yielding a coarse approximation of the range of human preferences. On the other hand, statistical and computational guarantees for models avoiding this assumption are scarce. In this paper, we investigate the statistical and computational challenges of learning a \emph{correlated} probit model, a fundamental RUM that avoids the IIA assumption. First, we establish that the classical data collection paradigm of pairwise preference data is \emph{fundamentally insufficient} to learn correlational information, explaining the lack of statistical and computational guarantees in this setting. Next, we demonstrate that \emph{best-of-three} preference data provably overcomes these shortcomings, and devise a statistically and computationally efficient estimator with near-optimal performance. These results highlight the benefits of higher-order preference data in learning correlated utilities, allowing for more fine-grained modeling of human preferences. Finally, we validate these theoretical guarantees on several real-world datasets, demonstrating improved personalization of human preferences.

Related papers

Uncertainty Quantification for Large Language Model Reward Learning under Heterogeneous Human Feedback [8.538830579425147]
We study estimation and statistical reward models used in aligning large language (LLMs)<n>A key component of LLM alignment is reinforcement learning from human feedback.
arXiv Detail & Related papers (2025-12-02T20:22:25Z)
SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models [51.74498855100541]
Large language models (LLMs) have shown strong reasoning capabilities when fine-tuned with reinforcement learning (RL)<n>We propose textbfSPaRFT, a self-paced learning framework that enables efficient learning based on the capability of the model being trained.
arXiv Detail & Related papers (2025-08-07T03:50:48Z)
Preference Learning for AI Alignment: a Causal Perspective [55.2480439325792]
We frame this problem in a causal paradigm, providing the rich toolbox of causality to identify persistent challenges.<n>Inheriting from the literature of causal inference, we identify key assumptions necessary for reliable generalisation.<n>We illustrate failure modes of naive reward models and demonstrate how causally-inspired approaches can improve model robustness.
arXiv Detail & Related papers (2025-06-06T10:45:42Z)
Detecting Prefix Bias in LLM-based Reward Models [4.596249232904721]
We introduce novel methods to detect and evaluate prefix bias in reward models trained on preference datasets.<n>We leverage these metrics to reveal significant biases in preference models across racial and gender dimensions.<n>Our findings highlight the critical need for bias-aware dataset design and evaluation in developing fair and reliable reward models.
arXiv Detail & Related papers (2025-05-13T21:50:03Z)
Ranking and Combining Latent Structured Predictive Scores without Labeled Data [2.5064967708371553]
This paper introduces a novel structured unsupervised ensemble learning model (SUEL) It exploits the dependency between a set of predictors with continuous predictive scores, rank the predictors without labeled data and combine them to an ensembled score with weights. The efficacy of the proposed methods is rigorously assessed through both simulation studies and real-world application of risk genes discovery.
arXiv Detail & Related papers (2024-08-14T20:14:42Z)
Secrets of RLHF in Large Language Models Part II: Reward Modeling [134.97964938009588]
We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset. We also introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses.
arXiv Detail & Related papers (2024-01-11T17:56:59Z)
Prediction-Oriented Bayesian Active Learning [51.426960808684655]
Expected predictive information gain (EPIG) is an acquisition function that measures information gain in the space of predictions rather than parameters. EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models.
arXiv Detail & Related papers (2023-04-17T10:59:57Z)
Empirical Analysis of Model Selection for Heterogeneous Causal Effect Estimation [24.65301562548798]
We study the problem of model selection in causal inference, specifically for conditional average treatment effect (CATE) estimation. We conduct an empirical analysis to benchmark the surrogate model selection metrics introduced in the literature, as well as the novel ones introduced in this work.
arXiv Detail & Related papers (2022-11-03T16:26:06Z)
Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions [59.284907093349425]
Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models. We provide a language for describing how training data influences predictions, through a causal framework. Our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone.
arXiv Detail & Related papers (2022-07-28T17:36:24Z)
A bandit-learning approach to multifidelity approximation [7.960229223744695]
Multifidelity approximation is an important technique in scientific computation and simulation. We introduce a bandit-learning approach for leveraging data of varying fidelities to achieve precise estimates.
arXiv Detail & Related papers (2021-03-29T05:29:35Z)
On Statistical Efficiency in Learning [37.08000833961712]
We address the challenge of model selection to strike a balance between model fitting and model complexity. We propose an online algorithm that sequentially expands the model complexity to enhance selection stability and reduce cost. Experimental studies show that the proposed method has desirable predictive power and significantly less computational cost than some popular methods.
arXiv Detail & Related papers (2020-12-24T16:08:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.