Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?
- URL: http://arxiv.org/abs/2410.05584v2
- Date: Tue, 15 Oct 2024 04:50:47 GMT
- Title: Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?
- Authors: Xueru Wen, Jie Lou, Yaojie Lu, Hongyu Lin, Xing Yu, Xinyu Lu, Ben He, Xianpei Han, Debing Zhang, Le Sun,
- Abstract summary: We investigate how differences in RM accuracy translate into gaps in optimized policy performance.
We find that the way of measuring accuracy significantly impacts its ability to predict the final policy performance.
- Score: 46.396681032860414
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reward Models (RMs) are crucial for aligning language models with human preferences. Currently, the evaluation of RMs depends on measuring accuracy against a validation set of manually annotated preference data. Although this method is straightforward and widely adopted, the relationship between RM accuracy and downstream policy performance remains under-explored. In this work, we conduct experiments in a synthetic setting to investigate how differences in RM measured by accuracy translate into gaps in optimized policy performance. Our findings reveal that while there is a weak positive correlation between accuracy and downstream performance, policies optimized towards RMs with similar accuracy can exhibit quite different performance. Moreover, we discover that the way of measuring accuracy significantly impacts its ability to predict the final policy performance. Through the lens of Regressional Goodhart's effect, we identify the existence of exogenous variables impacting the relationship between RM quality measured by accuracy and policy model capability. This underscores the inadequacy of relying solely on accuracy to reflect their impact on policy optimization.
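To make the abstract's distinction concrete, here is a toy NumPy sketch (an illustrative construction of my own, not the paper's synthetic setup): it computes a reward model's pairwise preference accuracy and the gold reward of the best-of-n policy that the RM induces. The two quantities are measured on different objects, and they need not track each other when a proxy's errors concentrate on high-reward responses, which is exactly where best-of-n selection looks.

```python
# Toy sketch (not the paper's code): pairwise RM accuracy vs. best-of-n policy return.
import numpy as np

rng = np.random.default_rng(0)
n_prompts, n_responses = 2000, 16

# Gold reward for each candidate response to each prompt.
gold = rng.normal(size=(n_prompts, n_responses))

# Two hypothetical proxy RMs: A has uniform noise, B's noise grows with the gold reward,
# so B's errors concentrate on the responses best-of-n selection cares about.
proxy_a = gold + rng.normal(scale=1.0, size=gold.shape)
proxy_b = gold + rng.normal(scale=1.0, size=gold.shape) * (1.0 + np.abs(gold))

def pairwise_accuracy(proxy):
    """Accuracy on random preference pairs drawn from the same prompt."""
    i = rng.integers(0, n_responses, size=n_prompts)
    j = (i + rng.integers(1, n_responses, size=n_prompts)) % n_responses  # j != i
    rows = np.arange(n_prompts)
    agree = (proxy[rows, i] - proxy[rows, j]) * (gold[rows, i] - gold[rows, j]) > 0
    return agree.mean()

def best_of_n_gold_reward(proxy):
    """Gold reward of the response each proxy RM would pick (best-of-n policy)."""
    picks = proxy.argmax(axis=1)
    return gold[np.arange(n_prompts), picks].mean()

for name, proxy in [("RM A", proxy_a), ("RM B", proxy_b)]:
    print(name,
          f"accuracy={pairwise_accuracy(proxy):.3f}",
          f"best-of-{n_responses} gold reward={best_of_n_gold_reward(proxy):.3f}")
```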
Related papers
- RMB: Comprehensively Benchmarking Reward Models in LLM Alignment [44.84304822376291]
Reward models (RMs) guide the alignment of large language models (LLMs), steering them toward behaviors preferred by humans.
We propose RMB, a comprehensive RM benchmark that covers over 49 real-world scenarios.
Based on our benchmark, we conduct extensive analysis on the state-of-the-art RMs.
arXiv Detail & Related papers (2024-10-13T16:06:54Z)
- Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown [20.753374166695494]
We introduce the Uncertainty-aware Reward Model (URM) and its ensemble variant, URME.
URM employs a probabilistic value head to capture aleatoric uncertainty by modeling the distribution of disentangled human preference attributes.
URME further quantifies uncertainty by examining discrepancies among individual URMs within the ensemble, enabling identification of unreliable evaluations.
arXiv Detail & Related papers (2024-10-01T16:29:59Z)
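A hedged PyTorch sketch of the mechanism described in the URM/URME entry above: a value head that predicts a per-attribute mean and variance (aleatoric uncertainty), plus ensemble disagreement as a second uncertainty signal. The attribute count, hidden size, and head structure are assumptions for illustration; this is not the authors' implementation.

```python
# Sketch of a probabilistic value head and ensemble-disagreement scoring (illustrative only).
import torch
import torch.nn as nn

class ProbabilisticValueHead(nn.Module):
    def __init__(self, hidden_size: int, n_attributes: int = 5):
        super().__init__()
        self.mean = nn.Linear(hidden_size, n_attributes)
        self.log_var = nn.Linear(hidden_size, n_attributes)

    def forward(self, h: torch.Tensor):
        # h: last-token hidden state from the reward-model backbone, shape [batch, hidden].
        mu, log_var = self.mean(h), self.log_var(h)
        return mu, log_var.exp()  # per-attribute mean and aleatoric variance

def ensemble_score(heads, h):
    """Average attribute means across heads; flag inputs where the heads disagree."""
    mus = torch.stack([head(h)[0] for head in heads])   # [n_heads, batch, n_attributes]
    reward = mus.mean(dim=(0, 2))                        # scalar reward per example
    disagreement = mus.mean(dim=2).var(dim=0)            # variance of scalar rewards across heads
    return reward, disagreement

# Usage on dummy hidden states.
heads = [ProbabilisticValueHead(hidden_size=768) for _ in range(3)]
h = torch.randn(4, 768)
reward, disagreement = ensemble_score(heads, h)
```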
- SEAL: Systematic Error Analysis for Value ALignment [4.2185937778110825]
Reinforcement Learning from Human Feedback (RLHF) aims to align language models with human values.
Despite its importance, the internal mechanisms of RLHF remain poorly understood.
This paper introduces new metrics to evaluate the effectiveness of modeling and aligning human values.
arXiv Detail & Related papers (2024-08-16T18:48:30Z)
- Are We Really Achieving Better Beyond-Accuracy Performance in Next Basket Recommendation? [57.91114305844153]
Next basket recommendation (NBR) is a special type of sequential recommendation that is increasingly receiving attention.
Recent studies into NBR have found a substantial performance difference between recommending repeat items and explore items.
We propose a plug-and-play two-step repetition-exploration framework that treats repeat items and explore items separately.
arXiv Detail & Related papers (2024-05-02T09:59:35Z)
- Machine Learning Simulates Agent-Based Model Towards Policy [0.0]
We use a random forest machine learning algorithm to emulate an agent-based model (ABM) and evaluate competing policies across 46 Metropolitan Regions (MRs) in Brazil.
As a result, we obtain the optimal (and non-optimal) performance of each region over the policies.
Results suggest that MRs already have embedded structures that favor optimal or non-optimal results, but they also illustrate which policy is more beneficial to each place.
arXiv Detail & Related papers (2022-03-04T21:19:11Z)
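A minimal scikit-learn sketch of the surrogate idea from the entry above, with hypothetical region features, an arbitrary policy set, and a placeholder outcome in place of real ABM runs: fit a random forest on (region, policy) → outcome pairs, then query it to rank policies per region instead of re-running the simulation.

```python
# Illustrative surrogate sketch (made-up features and outcomes, not the paper's data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_regions, n_policies, n_features = 46, 4, 8

# One row per (region, policy) ABM run: region descriptors + one-hot policy -> outcome.
region_feats = rng.normal(size=(n_regions, n_features))
X, y = [], []
for r in range(n_regions):
    for p in range(n_policies):
        X.append(np.concatenate([region_feats[r], np.eye(n_policies)[p]]))
        y.append(rng.normal())  # placeholder for the ABM's simulated outcome
X, y = np.array(X), np.array(y)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

def predicted_outcome(r: int, p: int) -> float:
    """Surrogate prediction for running policy p in region r."""
    row = np.concatenate([region_feats[r], np.eye(n_policies)[p]])[None]
    return float(forest.predict(row)[0])

# For each region, the surrogate's predicted best policy.
best_policy = [int(np.argmax([predicted_outcome(r, p) for p in range(n_policies)]))
               for r in range(n_regions)]
```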
- Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance.
The proposed SCORE (self-consistent robust error) facilitates the reconciliation between robustness and accuracy by definition, while still handling the worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
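A minimal sketch of the ATC recipe as summarized above: calibrate a confidence threshold on labeled source data so that the fraction of confident source examples matches source accuracy, then report the confident fraction on unlabeled target data. The confidences below are invented for illustration; the paper also considers scores other than raw softmax confidence.

```python
# Minimal ATC-style estimate of out-of-distribution accuracy (illustrative data).
import numpy as np

def learn_threshold(source_conf: np.ndarray, source_correct: np.ndarray) -> float:
    """Pick t so the fraction of source examples with confidence >= t matches source accuracy."""
    source_accuracy = source_correct.mean()
    return float(np.quantile(source_conf, 1.0 - source_accuracy))

def atc_estimate(target_conf: np.ndarray, threshold: float) -> float:
    """Predicted target accuracy: fraction of unlabeled target examples clearing the threshold."""
    return float((target_conf >= threshold).mean())

# Dummy usage with made-up confidences where correctness correlates with confidence.
rng = np.random.default_rng(0)
source_conf = rng.uniform(0.5, 1.0, size=1000)
source_correct = rng.uniform(size=1000) < source_conf
t = learn_threshold(source_conf, source_correct)

target_conf = rng.uniform(0.4, 1.0, size=1000)  # stand-in for the shifted target domain
print("estimated target accuracy:", atc_estimate(target_conf, t))
```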
- Understanding the Effects of Adversarial Personalized Ranking Optimization Method on Recommendation Quality [6.197934754799158]
We model the learning characteristics of the Bayesian Personalized Ranking (BPR) and APR optimization frameworks.
We show that APR amplifies the popularity bias more than BPR because popular (short-head) items receive a disproportionate number of positive updates.
arXiv Detail & Related papers (2021-07-29T10:22:20Z)
- Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence [66.83161885378192]
Areas under the ROC curve (AUROC) and the precision-recall curve (AUPRC) are common metrics for evaluating classification performance on imbalanced problems.
We propose a stochastic optimization method with provable convergence for maximizing AUPRC in deep learning.
arXiv Detail & Related papers (2021-04-18T06:22:21Z)