Related papers: Improving Value-based Process Verifier via Low-Cost Variance Reduction

Improving Value-based Process Verifier via Low-Cost Variance Reduction

URL: http://arxiv.org/abs/2508.10539v1
Date: Thu, 14 Aug 2025 11:22:29 GMT
Title: Improving Value-based Process Verifier via Low-Cost Variance Reduction
Authors: Zetian Sun, Dongfang Li, Baotian Hu, Min Zhang,
Abstract summary: Large language models (LLMs) have achieved remarkable success in a wide range of tasks.<n>However, their reasoning capabilities, particularly in complex domains like mathematics, remain a significant challenge.<n>Value-based process verifiers, which estimate the probability of a partial reasoning chain leading to a correct solution, are a promising approach for improving reasoning.
Score: 24.609940184050043
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Large language models (LLMs) have achieved remarkable success in a wide range of tasks. However, their reasoning capabilities, particularly in complex domains like mathematics, remain a significant challenge. Value-based process verifiers, which estimate the probability of a partial reasoning chain leading to a correct solution, are a promising approach for improving reasoning. Nevertheless, their effectiveness is often hindered by estimation error in their training annotations, a consequence of the limited number of Monte Carlo (MC) samples feasible due to the high cost of LLM inference. In this paper, we identify that the estimation error primarily arises from high variance rather than bias, and the MC estimator is a Minimum Variance Unbiased Estimator (MVUE). To address the problem, we propose the \textsc{Com}pound \textsc{M}onte \textsc{C}arlo \textsc{S}ampling (ComMCS) method, which constructs an unbiased estimator by linearly combining the MC estimators from the current and subsequent steps. Theoretically, we show that our method leads to a predictable reduction in variance, while maintaining an unbiased estimation without additional LLM inference cost. We also perform empirical experiments on the MATH-500 and GSM8K benchmarks to demonstrate the effectiveness of our method. Notably, ComMCS outperforms regression-based optimization method by 2.8 points, the non-variance-reduced baseline by 2.2 points on MATH-500 on Best-of-32 sampling experiment.

Related papers

Evaluating LLMs When They Do Not Know the Answer: Statistical Evaluation of Mathematical Reasoning via Comparative Signals [18.612081365101464]
We develop a framework that combines standard labeled outcomes with pairwise comparison signals obtained by having models judge auxiliary reasoning chains.<n>Across simulations, our one-step estimator substantially improves ranking accuracy with gains increasing as model output noise grows.<n>Experiments on GPQA Diamond, AIME 2025 and GSM8K further demonstrate more precise performance estimation and more reliable model rankings.
arXiv Detail & Related papers (2026-02-03T03:40:01Z)
Efficient Inference for Noisy LLM-as-a-Judge Evaluation [8.2511120576505]
Large language models (LLMs) are increasingly used as automatic evaluators of generative AI outputs.<n>In practice, LLM judges are imperfect predictions for the underlying truth and can exhibit systematic, non-random errors.
arXiv Detail & Related papers (2026-01-08T22:46:26Z)
Double Machine Learning for Conditional Moment Restrictions: IV Regression, Proximal Causal Learning and Beyond [16.842233444365764]
Conditional moment restrictions (CMRs) are a key problem considered in statistics, causal inference, and econometrics.<n>Most CMR estimators use a two-stage approach, where the first-stage estimation is directly plugged into the second stage to estimate the function of interest.<n>We propose DML-CMR, a two-stage CMR estimator that provides an unbiased estimate with fast convergence rate guarantees.
arXiv Detail & Related papers (2025-06-17T20:00:34Z)
Know What You Don't Know: Uncertainty Calibration of Process Reward Models [8.958124143194512]
Even state-of-the-art PRMs can be poorly calibrated and often overestimate success probabilities.<n>We present a calibration approach, performed via quantile regression, that PRM outputs to better align with true success probabilities.
arXiv Detail & Related papers (2025-06-11T02:39:26Z)
Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective.<n>The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning.<n>The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z)
The Lessons of Developing Process Reward Models in Mathematical Reasoning [62.165534879284735]
Process Reward Models (PRMs) aim to identify and mitigate intermediate errors in the reasoning processes.<n>We develop a consensus filtering mechanism that effectively integrates Monte Carlo (MC) estimation with Large Language Models (LLMs)<n>We release a new state-of-the-art PRM that outperforms existing open-source alternatives.
arXiv Detail & Related papers (2025-01-13T13:10:16Z)
A Tale of Sampling and Estimation in Discounted Reinforcement Learning [50.43256303670011]
We present a minimax lower bound on the discounted mean estimation problem. We show that estimating the mean by directly sampling from the discounted kernel of the Markov process brings compelling statistical properties.
arXiv Detail & Related papers (2023-04-11T09:13:17Z)
Multi-Fidelity Covariance Estimation in the Log-Euclidean Geometry [0.0]
We introduce a multi-fidelity estimator of covariance matrices that employs the log-Euclidean geometry of the symmetric positive-definite manifold. We develop an optimal sample allocation scheme that minimizes the mean-squared error of the estimator given a fixed budget. Evaluations of our approach using data from physical applications demonstrate more accurate metric learning and speedups of more than one order of magnitude compared to benchmarks.
arXiv Detail & Related papers (2023-01-31T16:33:46Z)
Learning to Estimate Without Bias [57.82628598276623]
Gauss theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimation (MVUE) in linear models. In this paper, we take a first step towards extending this result to non linear settings via deep learning with bias constraints. A second motivation to BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem. Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem. We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z)
Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation. Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle. We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
Efficient Debiased Evidence Estimation by Multilevel Monte Carlo Sampling [0.0]
We propose a new optimization algorithm for Bayesian inference based multilevel Monte Carlo (MLMC) methods. Our numerical results confirm considerable computational savings compared to the conventional estimators.
arXiv Detail & Related papers (2020-01-14T09:14:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.