ContextPRM: Leveraging Contextual Coherence for multi-domain Test-Time Scaling
- URL: http://arxiv.org/abs/2509.24460v1
- Date: Mon, 29 Sep 2025 08:40:46 GMT
- Title: ContextPRM: Leveraging Contextual Coherence for multi-domain Test-Time Scaling
- Authors: Haotian Zhang, Liu Liu, Baosheng Yu, Jiayan Qiu, Likang Xiao, Yanwei Ren, Quan Chen, Xianglong Liu
- Abstract summary: Process reward models (PRMs) have demonstrated significant efficacy in enhancing the mathematical reasoning capabilities of large language models (LLMs) by leveraging test-time scaling (TTS). We shift the learning objective from verifying domain-specific knowledge to modeling domain-agnostic logical flow. Our approach is realized through a novel data annotation and training framework, which enhances the model's generalization capabilities across diverse domains.
- Score: 38.779046730647856
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Process reward models (PRMs) have demonstrated significant efficacy in enhancing the mathematical reasoning capabilities of large language models (LLMs) by leveraging test-time scaling (TTS). However, while most PRMs exhibit substantial gains in mathematical domains, the scarcity of domain-specific training data and knowledge-based learning patterns limits their generalization ability when faced with other domains. To address this limitation, we shift the learning objective from verifying domain-specific knowledge to modeling domain-agnostic logical flow. Centering on contextual coherence between chain-of-thought (CoT) steps, our approach is realized through a novel data annotation and training framework, which enhances the model's generalization capabilities across diverse domains. For instance, our resulting model, ContextPRM, achieves a notable 6.5% average accuracy improvement over the majority voting baseline via weighted majority voting across nine non-mathematical domains in MMLU-Pro, including law, history, and philosophy, significantly surpassing the 2.2% improvement from VersaPRM and 0.5% gains from other mathematics-focused PRMs, demonstrating consistent performance across both mathematical and non-mathematical domains.
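The abstract compares weighted majority voting (guided by PRM scores) against a plain majority voting baseline. The contrast can be sketched as follows; this is a minimal illustration, not the paper's implementation — the step scores, the min-score aggregation, and the toy answers are hypothetical placeholders.

```python
from collections import defaultdict

def majority_vote(answers):
    """Plain majority voting: return the most frequent final answer."""
    counts = defaultdict(int)
    for ans in answers:
        counts[ans] += 1
    return max(counts, key=counts.get)

def weighted_majority_vote(answers, prm_step_scores):
    """Weighted majority voting: each sampled chain-of-thought adds a
    weight to its final answer, derived here from the minimum PRM step
    score (one common aggregation choice; the paper may use another)."""
    weights = defaultdict(float)
    for ans, steps in zip(answers, prm_step_scores):
        weights[ans] += min(steps)
    return max(weights, key=weights.get)

# Hypothetical toy example: three sampled CoTs for one question.
# One highly coherent chain answers "B"; two low-scoring chains answer "C".
answers = ["B", "C", "C"]
step_scores = [[0.90, 0.95], [0.20, 0.30], [0.30, 0.40]]

print(majority_vote(answers))                        # prints C
print(weighted_majority_vote(answers, step_scores))  # prints B
```

The design point is that a PRM which scores each reasoning step lets low-coherence chains be down-weighted rather than counted equally, which is where the reported accuracy gap between the two voting schemes comes from.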
Related papers
- RoME: Domain-Robust Mixture-of-Experts for MILP Solution Prediction across Domains [17.62400981694534]
We introduce RoME, a domain-Robust Mixture-of-Experts framework for predicting MILP solutions across domains. A single RoME model trained on three domains achieves an average improvement of 67.7% when evaluated on five diverse domains.
arXiv Detail & Related papers (2025-11-04T07:32:27Z) - Towards Text-free Graph Foundation Models: Rethinking Multi-Domain Graph Contrastive Learning [40.56379624114316]
We propose a novel multi-domain pre-training and cross-domain transfer framework, namely MDGCL. In the pre-training stage, we design a contrastive learning strategy to recognize and capture domain differences. In the downstream stage, we introduce a domain attention mechanism to enable fine-grained domain knowledge transfer.
arXiv Detail & Related papers (2025-06-26T03:14:50Z) - Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets [6.001837672951086]
We introduce a novel Process Reward Model (PRM) trained automatically using Monte Carlo Tree Search. We then adapt Generative Flow Networks (GFlowNets) to operate at the reasoning-step level. Empirical evaluation shows strong improvements in both accuracy and solution diversity on challenging mathematical benchmarks.
arXiv Detail & Related papers (2025-04-28T16:56:41Z) - DIDS: Domain Impact-aware Data Sampling for Large Language Model Training [61.10643823069603]
We present Domain Impact-aware Data Sampling (DIDS) for large language models. DIDS groups training data based on learning effects, using a proxy language model and dimensionality reduction. It achieves 3.4% higher average performance while maintaining comparable training efficiency.
arXiv Detail & Related papers (2025-04-17T13:09:38Z) - VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data [21.460891616139534]
We introduce VersaPRM, a multi-domain PRM trained on synthetic reasoning data generated using our novel data generation and annotation method. VersaPRM achieves consistent performance gains across diverse domains. We further contribute to the community by open-sourcing all data, code, and models for VersaPRM.
arXiv Detail & Related papers (2025-02-10T18:03:36Z) - FIXED: Frustratingly Easy Domain Generalization with Mixup [53.782029033068675]
Domain generalization (DG) aims to learn a generalizable model from multiple training domains such that it can perform well on unseen target domains.
A popular strategy is to augment training data to benefit generalization through methods such as Mixup [Zhang et al., 2018].
We propose a simple yet effective enhancement for Mixup-based DG, namely domain-invariant Feature mIXup (FIX).
Our approach significantly outperforms nine state-of-the-art related methods, beating the best performing baseline by 6.5% on average in terms of test accuracy.
arXiv Detail & Related papers (2022-11-07T09:38:34Z) - TAL: Two-stream Adaptive Learning for Generalizable Person Re-identification [115.31432027711202]
We argue that both domain-specific and domain-invariant features are crucial for improving the generalization ability of re-id models.
We propose two-stream adaptive learning (TAL) to simultaneously model these two kinds of information.
Our framework can be applied to both single-source and multi-source domain generalization tasks.
arXiv Detail & Related papers (2021-11-29T01:27:42Z) - f-Domain-Adversarial Learning: Theory and Algorithms [82.97698406515667]
Unsupervised domain adaptation is used in many machine learning applications where, during training, a model has access to unlabeled data in the target domain.
We derive a novel generalization bound for domain adaptation that exploits a new measure of discrepancy between distributions based on a variational characterization of f-divergences.
arXiv Detail & Related papers (2021-06-21T18:21:09Z) - Model-Based Domain Generalization [96.84818110323518]
We propose a novel approach for the domain generalization problem called Model-Based Domain Generalization.
Our algorithms beat the current state-of-the-art methods on the very-recently-proposed WILDS benchmark by up to 20 percentage points.
arXiv Detail & Related papers (2021-02-23T00:59:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.