Preference Learning with Response Time
- URL: http://arxiv.org/abs/2505.22820v1
- Date: Wed, 28 May 2025 19:55:54 GMT
- Title: Preference Learning with Response Time
- Authors: Ayush Sawarni, Sahasrajit Sarmasarkar, Vasilis Syrgkanis
- Abstract summary: We propose novel methodologies to incorporate response time information alongside binary choice data. We develop Neyman-orthogonal loss functions that achieve oracle convergence rates for reward model learning. Our experiments validate our theoretical findings in the context of preference learning over images.
- Score: 18.659347526840822
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates the integration of response time data into human preference learning frameworks for more effective reward model elicitation. While binary preference data has become fundamental in fine-tuning foundation models, generative AI systems, and other large-scale models, the valuable temporal information inherent in user decision-making remains largely unexploited. We propose novel methodologies to incorporate response time information alongside binary choice data, leveraging the Evidence Accumulation Drift Diffusion (EZ) model, under which response time is informative of the preference strength. We develop Neyman-orthogonal loss functions that achieve oracle convergence rates for reward model learning, matching the theoretical optimal rates that would be attained if the expected response times for each query were known a priori. Our theoretical analysis demonstrates that for linear reward functions, conventional preference learning suffers from error rates that scale exponentially with reward magnitude. In contrast, our response time-augmented approach reduces this to polynomial scaling, representing a significant improvement in sample efficiency. We extend these guarantees to non-parametric reward function spaces, establishing convergence properties for more complex, realistic reward models. Our extensive experiments validate our theoretical findings in the context of preference learning over images.
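The abstract's central modeling assumption is that response time is informative of preference strength. The following is a minimal sketch of why, assuming the textbook drift-diffusion process: unit noise, an unbiased start at 0, symmetric barriers at +a and -a, and a drift mu equal to the reward difference between the two options. The paper's exact EZ parameterization and its Neyman-orthogonal loss are not reproduced here. Under these assumptions, P(C = +1) = 1 / (1 + exp(-2 a mu)), E[C] = tanh(a mu), and E[T] = (a / mu) tanh(a mu), so E[C] / E[T] = mu / a. The names `ddm_moments` and `simulate_ddm` are hypothetical.

```python
import numpy as np


def ddm_moments(mu, a):
    """Closed-form moments of dX = mu dt + dW, X_0 = 0, absorbed at +a or -a.

    Assumed model (illustrative): unit noise, unbiased start, symmetric barriers.
    Returns (P[C = +1], E[C], E[T]), where C = +1 if the upper barrier is hit first.
    """
    if mu == 0.0:
        return 0.5, 0.0, a * a                # limits of the formulas below as mu -> 0
    p_upper = 1.0 / (1.0 + np.exp(-2.0 * a * mu))
    mean_choice = np.tanh(a * mu)             # equals 2 * p_upper - 1
    mean_time = (a / mu) * np.tanh(a * mu)
    return p_upper, mean_choice, mean_time


def simulate_ddm(mu, a, n_paths=20_000, dt=1e-3, seed=0):
    """Euler-Maruyama simulation; returns choices in {-1, +1} and decision times."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_paths)
    t = np.zeros(n_paths)
    choice = np.zeros(n_paths)
    alive = np.ones(n_paths, dtype=bool)
    while alive.any():
        idx = np.flatnonzero(alive)           # paths that have not hit a barrier yet
        x[idx] += mu * dt + np.sqrt(dt) * rng.standard_normal(idx.size)
        t[idx] += dt
        up = x[idx] >= a
        down = x[idx] <= -a
        choice[idx[up]] = 1.0
        choice[idx[down]] = -1.0
        alive[idx[up | down]] = False
    return choice, t


mu, a = 0.5, 1.0
p, ec, et = ddm_moments(mu, a)
c, t = simulate_ddm(mu, a)
print(ec / et, mu / a)        # E[C] / E[T] equals mu / a (up to rounding)
print(c.mean() / t.mean())    # Monte Carlo estimate, close to mu / a (small Euler bias)

for m in (0.5, 2.0, 4.0):     # mean choice saturates toward 1 while mean time keeps shrinking
    print(m, ddm_moments(m, a))
```

As mu grows, E[C] approaches 1 and says little about how large the reward gap is, while E[T] keeps decreasing roughly like 1/mu. This offers some intuition, though not the paper's proof, for why adding response times can avoid the exponential error scaling of choice-only learning.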
Related papers
- ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition [21.747318210534896]
Amortized Active Learning and Inference Engine (ALINE) is a unified framework for amortized Bayesian inference and active data acquisition.
ALINE delivers both instant and accurate inference along with efficient selection of informative points.
(2025-06-08T19:15:34Z)
- Disentangling Length Bias In Preference Learning Via Response-Conditioned Modeling [87.17041933863041]
Reinforcement Learning from Human Feedback (RLHF) has achieved considerable success in aligning large language models (LLMs).
We introduce a Response-conditioned Bradley-Terry (Rc-BT) model that improves the model's ability to mitigate length bias and follow length instructions.
We also propose the Rc-RM and Rc-DPO algorithms to leverage the Rc-BT model for reward modeling and direct policy optimization.
(2025-02-02T14:50:25Z)
- TimeRAF: Retrieval-Augmented Foundation model for Zero-shot Time Series Forecasting [59.702504386429126]
TimeRAF is a Retrieval-Augmented Forecasting model that enhances zero-shot time series forecasting through retrieval-augmented techniques.
TimeRAF employs an end-to-end learnable retriever to extract valuable information from the knowledge base.
(2024-12-30T09:06:47Z)
- Enhancing Preference-based Linear Bandits via Human Response Time [25.92686846689662]
Interactive preference learning systems infer human preferences by presenting queries as pairs of options and collecting binary choices.
We propose a method that combines choices and response times to estimate human utility functions.
We incorporate this estimator into preference-based linear bandits for fixed-budget best-arm identification.
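The entry above describes combining choices and response times to estimate a linear utility. Here is one simple way to do this, sketched under our own assumptions; it is not necessarily the estimator used in that paper. If each query's drift is mu = theta^T z for a feature difference z, and choices follow the drift-diffusion model with barriers at +a and -a, then E[C] = E[T] * theta^T z / a. Regressing per-query mean choices on (mean response time) * z by least squares therefore recovers beta = theta / a. The helper names, theta, and the synthetic queries are hypothetical.

```python
import numpy as np


def simulate_ddm(drifts, a, dt=1e-3, seed=0):
    """Euler-Maruyama simulation of dX = mu dt + dW from 0 until |X| >= a.

    Assumed model (illustrative). `drifts` holds one drift per trial;
    returns choices in {-1, +1} and decision times.
    """
    rng = np.random.default_rng(seed)
    drifts = np.asarray(drifts, dtype=float)
    x = np.zeros_like(drifts)
    t = np.zeros_like(drifts)
    choice = np.zeros_like(drifts)
    alive = np.ones(drifts.shape, dtype=bool)
    while alive.any():
        idx = np.flatnonzero(alive)
        x[idx] += drifts[idx] * dt + np.sqrt(dt) * rng.standard_normal(idx.size)
        t[idx] += dt
        up = x[idx] >= a
        down = x[idx] <= -a
        choice[idx[up]] = 1.0
        choice[idx[down]] = -1.0
        alive[idx[up | down]] = False
    return choice, t


def choice_time_estimator(z, c_bar, t_bar):
    """Hypothetical moment estimator: least-squares fit of c_bar_i ~ t_bar_i * z_i @ beta.

    z: (n, d) feature differences; c_bar, t_bar: per-query mean choice and mean time.
    Returns beta, an estimate of theta / a.
    """
    design = t_bar[:, None] * z                  # row i is t_bar_i * z_i
    beta, *_ = np.linalg.lstsq(design, c_bar, rcond=None)
    return beta


theta, a = np.array([1.0, -0.5, 0.25]), 1.0      # hypothetical true utility weights
rng = np.random.default_rng(1)
n_queries, n_rep = 50, 200
z = rng.standard_normal((n_queries, 3))          # hypothetical query feature differences
drifts = np.repeat(z @ theta, n_rep)             # drift = utility difference theta^T z
c, t = simulate_ddm(drifts, a)
c_bar = c.reshape(n_queries, n_rep).mean(axis=1)
t_bar = t.reshape(n_queries, n_rep).mean(axis=1)
beta_hat = choice_time_estimator(z, c_bar, t_bar)
print(beta_hat, theta / a)
```

Only the ratio theta / a is identified. Since a > 0, that is still enough to rank options by estimated utility.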
(2024-09-09T17:02:47Z)
- Asymptotically Optimal Regret for Black-Box Predict-then-Optimize [7.412445894287709]
We study new black-box predict-then-optimize problems that lack special structure and where we only observe the reward from the action taken.
We present a novel loss function, which we call Empirical Soft Regret (ESR), designed to significantly improve reward when used in training.
We also show our approach significantly outperforms state-of-the-art algorithms on real-world decision-making problems in news recommendation and personalized healthcare.
(2024-06-12T04:46:23Z)
- Online Iterative Reinforcement Learning from Human Feedback with General Preference Model [20.81421550138371]
We investigate Reinforcement Learning from Human Feedback (RLHF) in the context of a general preference oracle.
We consider a standard mathematical formulation: the reverse-KL regularized minimax game between two LLMs for RLHF under a general preference oracle.
We show that this framework is strictly more general than the reward-based one, and propose sample-efficient algorithms for both the offline learning from a pre-collected preference dataset and online learning.
(2024-02-11T21:44:21Z)
- Secrets of RLHF in Large Language Models Part II: Reward Modeling [134.97964938009588]
We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset.
We also introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses.
(2024-01-11T17:56:59Z)
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
(2023-11-13T01:48:08Z)
- Towards Flexible Time-to-event Modeling: Optimizing Neural Networks via Rank Regression [17.684526928033065]
We introduce the Deep AFT Rank-regression model for Time-to-event prediction (DART).
This model uses an objective function based on Gehan's rank statistic, which is efficient and reliable for representation learning.
The proposed method is a semiparametric approach to AFT modeling that does not impose any distributional assumptions on the survival time distribution.
(2023-07-16T13:58:28Z)
- OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning [67.07363529640784]
We propose OpenSTL to categorize prevalent approaches into recurrent-based and recurrent-free models.
We conduct standard evaluations on datasets across various domains, including synthetic moving object trajectories, human motion, driving scenes, traffic flow, and weather forecasting.
We find that recurrent-free models achieve a better balance between efficiency and performance than recurrent models.
(2023-06-20T03:02:14Z)
- FAStEN: An Efficient Adaptive Method for Feature Selection and Estimation in High-Dimensional Functional Regressions [7.674715791336311]
We propose a new, flexible and ultra-efficient approach to perform feature selection in a sparse function-on-function regression problem.
We show how to extend it to the scalar-on-function framework.
We present an application to brain fMRI data from the AOMIC PIOP1 study.
(2023-03-26T19:41:17Z)
- Leveraging the structure of dynamical systems for data-driven modeling [111.45324708884813]
We consider the impact of the training set and its structure on the quality of the long-term prediction.
We show how an informed design of the training set, based on invariants of the system and the structure of the underlying attractor, significantly improves the resulting models.
(2021-12-15T20:09:20Z)
- Spatio-Temporal Functional Neural Networks [11.73856529960872]
We propose two novel extensions of the Functional Neural Network (FNN), a temporal regression model whose effectiveness has been proven by many researchers.
The proposed models are then deployed to solve a practical and challenging precipitation prediction problem in the meteorology field.
(2020-09-11T21:32:35Z)