AdaJudge: Adaptive Multi-Perspective Judging for Reward Modeling
- URL: http://arxiv.org/abs/2601.08097v1
- Date: Tue, 13 Jan 2026 00:37:38 GMT
- Title: AdaJudge: Adaptive Multi-Perspective Judging for Reward Modeling
- Authors: Yongliang Miao, Yangyang Liang, Mengnan Du
- Abstract summary: We propose AdaJudge, a unified framework that jointly adapts representation and aggregation. AdaJudge first refines backbone representations into a discrimination-oriented space via refinement blocks. It then replaces the static readout with an adaptive multi-view pooling module that dynamically routes and combines evidence.
- Score: 23.81351558826977
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reward modeling is essential for aligning large language models with human preferences, yet predominant architectures rely on a static pooling strategy to condense sequences into scalar scores. This paradigm, however, suffers from two key limitations: a static inductive bias that misaligns with task-dependent preference signals, and a representational mismatch, as the backbone is optimized for generation rather than fine-grained discrimination. To address this, we propose AdaJudge, a unified framework that jointly adapts representation and aggregation. AdaJudge first refines backbone representations into a discrimination-oriented space via gated refinement blocks. It then replaces the static readout with an adaptive multi-view pooling module that dynamically routes and combines evidence. Extensive experiments on RM-Bench and JudgeBench show that AdaJudge outperforms strong off-the-shelf reward models and traditional pooling baselines.
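The adaptive readout idea from the abstract can be illustrated with a minimal sketch: instead of one static pooling strategy, several pooling "views" are computed and a learned router mixes them per input. This is an illustrative assumption, not the paper's implementation; `adaptive_multi_view_pool`, the choice of three views (mean, max, last token), and the router parameterization are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_multi_view_pool(H, W_route):
    """Hypothetical sketch of an adaptive multi-view readout.

    H:       (seq_len, d) token representations from the backbone
    W_route: (d, 3) router weights scoring three pooling views

    Returns one (d,) vector: a routed mix of mean, max, and
    last-token pooling, rather than a single static readout.
    """
    views = np.stack([H.mean(axis=0), H.max(axis=0), H[-1]])  # (3, d)
    # Route on a summary of the sequence (here: its mean vector).
    logits = H.mean(axis=0) @ W_route                          # (3,)
    alpha = softmax(logits)                                    # per-view weights, sum to 1
    return alpha @ views                                       # (d,) pooled representation
```

A scalar reward would then be a linear head on this pooled vector; the routing weights let the readout depend on the input rather than a fixed inductive bias.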
Related papers
- SoliReward: Mitigating Susceptibility to Reward Hacking and Annotation Noise in Video Generation Reward Models [53.19726629537694]
Post-training alignment of video generation models with human preferences is a critical goal. Current data collection paradigms, reliant on in-prompt pairwise annotations, suffer from labeling noise. We propose SoliReward, a systematic framework for video RM training.
arXiv Detail & Related papers (2025-12-17T14:28:23Z)
- Multimodal Large Language Models with Adaptive Preference Optimization for Sequential Recommendation [60.33386541343322]
We propose a Multimodal Large Language Models framework that integrates Hardness-aware and Noise-regularized preference optimization for Recommendation (HaNoRec). Specifically, HaNoRec dynamically adjusts optimization weights based on both the estimated hardness of each training sample and the policy model's real-time responsiveness.
arXiv Detail & Related papers (2025-11-24T04:10:46Z)
- An Integrated Fusion Framework for Ensemble Learning Leveraging Gradient Boosting and Fuzzy Rule-Based Models [59.13182819190547]
Fuzzy rule-based models excel in interpretability and have seen widespread application across diverse fields. They face challenges such as complex design specifications and scalability issues with large datasets. This paper proposes an Integrated Fusion Framework that merges the strengths of both paradigms to enhance model performance and interpretability.
arXiv Detail & Related papers (2025-11-11T10:28:23Z)
- MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning [28.478879569025583]
We introduce MiCRo, a two-stage framework that enhances personalized preference learning by leveraging large-scale binary preference datasets. In the first stage, MiCRo introduces a context-aware mixture modeling approach to capture diverse human preferences. In the second stage, MiCRo integrates an online routing strategy that dynamically adapts mixture weights based on specific context to resolve ambiguity.
arXiv Detail & Related papers (2025-05-30T17:44:28Z)
- Bone Soups: A Seek-and-Soup Model Merging Approach for Controllable Multi-Objective Generation [42.662194131372125]
Bone Soup is a novel model merging approach that first seeks a series of backbone models and then makes the soup (i.e., merges the backbone models). We show that Bone Soup exhibits strong controllability and Pareto optimality in controllable multi-objective generation.
arXiv Detail & Related papers (2025-02-15T11:00:36Z)
- CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition [10.045163723630159]
CHASE operates as a sample-adaptive normalization method to mitigate inter-entity distribution discrepancies. Our approach seamlessly adapts to single-entity backbones and boosts their performance in multi-entity scenarios.
arXiv Detail & Related papers (2024-10-09T17:55:43Z)
- Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment [46.44464839353993]
We introduce Rewards-in-Context (RiC), which conditions the response of a foundation model on multiple rewards in its prompt context.
RiC only requires supervised fine-tuning of a single foundation model and supports dynamic adjustment for user preferences during inference time.
arXiv Detail & Related papers (2024-02-15T18:58:31Z)
- Learning Cross-view Geo-localization Embeddings via Dynamic Weighted Decorrelation Regularization [52.493240055559916]
Cross-view geo-localization aims to spot images of the same location shot from two platforms, e.g., the drone platform and the satellite platform.
Existing methods usually focus on optimizing the distance between one embedding and others in the feature space.
In this paper, we argue that low redundancy is also important, as it motivates the model to mine more diverse patterns.
arXiv Detail & Related papers (2022-11-10T02:13:10Z)
- Switchable Representation Learning Framework with Self-compatibility [50.48336074436792]
We propose a Switchable representation learning Framework with Self-Compatibility (SFSC).
SFSC generates a series of compatible sub-models with different capacities through one training process.
SFSC achieves state-of-the-art performance on the evaluated datasets.
arXiv Detail & Related papers (2022-06-16T16:46:32Z)
- Slimmable Domain Adaptation [112.19652651687402]
We introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank.
Our framework surpasses other competing approaches by a very large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-06-14T06:28:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.