Evaluating Reward Model Generalization via Pairwise Maximum Discrepancy Competitions
- URL: http://arxiv.org/abs/2601.16987v1
- Date: Mon, 05 Jan 2026 15:14:21 GMT
- Title: Evaluating Reward Model Generalization via Pairwise Maximum Discrepancy Competitions
- Authors: Shunyang Luo, Peibei Cao, Zhihui Zhu, Kehua Feng, Zhihua Wang, Keyan Ding,
- Abstract summary: Pairwise Maximum Discrepancy Competition (PMDC) is a dynamic and annotation-efficient framework for evaluating RM generalization. PMDC actively selects prompt--response pairs that maximize disagreement between two RMs. We apply PMDC to re-evaluate 10 representative RMs and observe substantial rank reshuffling compared with conventional benchmarks.
- Score: 24.01200309422524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reward models (RMs) are central to aligning large language models, yet their practical effectiveness hinges on generalization to unseen prompts and shifting distributions. Most existing RM evaluations rely on static, pre-annotated preference datasets, which provide limited coverage and often fail to faithfully assess generalization in open-world settings. We introduce Pairwise Maximum Discrepancy Competition (PMDC), a dynamic and annotation-efficient framework for evaluating RM generalization using a large, unlabeled, open-domain prompt pool. PMDC actively selects prompt--response pairs that maximize disagreement between two RMs, yielding a compact set of highly contentious test cases. These cases are adjudicated by an oracle, and the resulting outcomes are aggregated via a Bradley--Terry model to produce a global ranking and pairwise win-rate landscape of RMs. We apply PMDC to re-evaluate 10 representative RMs and observe substantial rank reshuffling compared with conventional benchmarks. Qualitative analyses further uncover systematic generalization failures, providing valuable insights for improving reward modeling.
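The selection-and-aggregation pipeline described in the abstract lends itself to a short illustration. Below is a minimal Python sketch, not the authors' implementation: the reward-model callables `rm_a` and `rm_b`, the disagreement score, and the pool format are assumptions made for illustration.

```python
import numpy as np

def max_discrepancy_pairs(pool, rm_a, rm_b, k=100):
    """Select the k prompt-response pairs on which two reward models
    disagree most (a minimal reading of PMDC's selection step).
    `pool` holds (prompt, response_1, response_2) triples; `rm_a` and
    `rm_b` are hypothetical callables returning scalar reward scores."""
    scored = []
    for prompt, r1, r2 in pool:
        margin_a = rm_a(prompt, r1) - rm_a(prompt, r2)  # RM A's preference margin
        margin_b = rm_b(prompt, r1) - rm_b(prompt, r2)  # RM B's preference margin
        # Disagreement is largest when the margins are large and opposite in sign.
        scored.append((-margin_a * margin_b, prompt, r1, r2))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]

def bradley_terry(wins, iters=200):
    """Fit Bradley--Terry strengths from a pairwise win-count matrix
    (wins[i, j] = number of oracle adjudications where RM i beat RM j)
    using the standard minorization-maximization updates."""
    n = wins.shape[0]
    w = np.ones(n)
    for _ in range(iters):
        for i in range(n):
            total_wins = wins[i].sum()
            denom = sum((wins[i, j] + wins[j, i]) / (w[i] + w[j])
                        for j in range(n) if j != i)
            if denom > 0:
                w[i] = total_wins / denom
        w /= w.sum()  # fix the arbitrary scale
    return w  # implied win rate of i over j: w[i] / (w[i] + w[j])
```

The adjudication step between the two functions (the oracle labeling each contentious pair) is left abstract here, since the abstract does not specify how the oracle is realized.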
Related papers
- SoliReward: Mitigating Susceptibility to Reward Hacking and Annotation Noise in Video Generation Reward Models [53.19726629537694]
Post-training alignment of video generation models with human preferences is a critical goal. Current data collection paradigms, reliant on in-prompt pairwise annotations, suffer from labeling noise. We propose SoliReward, a systematic framework for video RM training.
arXiv Detail & Related papers (2025-12-17T14:28:23Z) - Dynamic and Generalizable Process Reward Modeling [74.36829922727026]
We propose Dynamic and Generalizable Process Reward Modeling (DG-PRM), which features a reward tree to capture and store fine-grained, multi-dimensional reward criteria. Experimental results show that DG-PRM achieves strong performance on prevailing benchmarks, significantly boosting model performance across tasks with dense rewards.
arXiv Detail & Related papers (2025-07-23T18:17:22Z) - RAG-Zeval: Towards Robust and Interpretable Evaluation on RAG Responses through End-to-End Rule-Guided Reasoning [64.46921169261852]
RAG-Zeval is a novel end-to-end framework that formulates faithfulness and correctness evaluation as a rule-guided reasoning task. Our approach trains evaluators with reinforcement learning, enabling compact models to generate comprehensive and sound assessments. Experiments demonstrate RAG-Zeval's superior performance, achieving the strongest correlation with human judgments.
arXiv Detail & Related papers (2025-05-28T14:55:33Z) - Energy-Based Reward Models for Robust Language Model Alignment [9.843359827321194]
We introduce the Energy-Based Reward Model (EBRM), a lightweight post-hoc refinement framework for Reward Models (RMs). EBRM models the reward distribution explicitly, capturing uncertainty in human preferences and mitigating the impact of noisy or misaligned annotations. Empirical evaluations demonstrate significant improvements in robustness and generalization, achieving up to a 5.97% improvement in safety-critical alignment tasks.
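As a rough illustration only: a post-hoc refinement in the spirit described here could replace each raw RM score with the minimizer of an energy function over candidate reward values. The energy model and grid search below are hypothetical stand-ins, not EBRM's actual formulation.

```python
import numpy as np

def refine_score(raw_score, energy, grid=None):
    """Post-hoc refinement sketch: return the candidate reward value
    that minimizes a learned energy `energy(r, raw_score)`. Both the
    grid search and the energy signature are illustrative assumptions."""
    if grid is None:
        grid = np.linspace(-5.0, 5.0, 501)
    energies = np.array([energy(r, raw_score) for r in grid])
    return float(grid[int(np.argmin(energies))])
```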
arXiv Detail & Related papers (2025-04-17T17:47:15Z) - Reward Models Identify Consistency, Not Causality [54.987590763737145]
State-of-the-art reward models prioritize structural consistency over causal correctness. Removing the problem statement has minimal impact on reward scores, while altering numerical values or disrupting the reasoning flow significantly affects RM outputs.
arXiv Detail & Related papers (2025-02-20T14:57:14Z) - The Lessons of Developing Process Reward Models in Mathematical Reasoning [62.165534879284735]
Process Reward Models (PRMs) aim to identify and mitigate intermediate errors in the reasoning processes. We develop a consensus filtering mechanism that effectively integrates Monte Carlo (MC) estimation with Large Language Models (LLMs). We release a new state-of-the-art PRM that outperforms existing open-source alternatives.
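The consensus-filtering mechanism mentioned here can be sketched in a few lines; the 0/1 step labels and the two labeler callables are assumptions for illustration, not the paper's interface.

```python
def consensus_filter(steps, mc_label, llm_label):
    """Keep only reasoning steps whose correctness label from Monte Carlo
    estimation agrees with the label from an LLM judge (both assumed to
    return 0 for incorrect, 1 for correct)."""
    kept = []
    for step in steps:
        label = mc_label(step)
        if label == llm_label(step):  # agreement -> trust the label
            kept.append((step, label))
    return kept
```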
arXiv Detail & Related papers (2025-01-13T13:10:16Z) - RMB: Comprehensively Benchmarking Reward Models in LLM Alignment [44.84304822376291]
Reward models (RMs) guide the alignment of large language models (LLMs). We propose RMB, a comprehensive RM benchmark that covers over 49 real-world scenarios. Based on our benchmark, we conduct extensive analysis on the state-of-the-art RMs.
arXiv Detail & Related papers (2024-10-13T16:06:54Z) - Confronting Reward Model Overoptimization with Constrained RLHF [114.71591361764547]
We show that the correlation between component RMs has a significant effect on the locations of these overoptimization points.
Our method addresses the problem of weighting component RMs by learning dynamic weights, naturally expressed by Lagrange multipliers.
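A minimal sketch of the dynamic-weighting idea described here, under assumed per-component thresholds: Lagrange multipliers weight each component RM and are updated by dual ascent. The function names, thresholds, and learning rate are illustrative.

```python
import numpy as np

def combined_reward(component_scores, lam, thresholds):
    """Lagrangian-style objective: base reward plus multiplier-weighted
    slack terms (score_i - threshold_i) for each component RM."""
    slack = component_scores - thresholds
    return component_scores.mean() + float(lam @ slack)

def update_multipliers(lam, component_scores, thresholds, lr=0.01):
    """Dual ascent: lambda_i grows when component i falls below its
    threshold (constraint violated) and shrinks otherwise; multipliers
    are projected back to the nonnegative orthant."""
    slack = component_scores - thresholds
    return np.clip(lam - lr * slack, 0.0, None)
```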
arXiv Detail & Related papers (2023-10-06T16:59:17Z) - Dual Reweighting Domain Generalization for Face Presentation Attack Detection [40.63170532438904]
Face anti-spoofing approaches based on domain generalization (DG) have drawn growing attention due to their robustness for unseen scenarios.
Previous methods treat each sample from multiple domains indiscriminately during the training process.
We propose a novel Dual Reweighting Domain Generalization framework that iteratively reweights the relative importance between samples to further improve generalization.
arXiv Detail & Related papers (2021-06-30T15:24:34Z)