Bellman Calibration for V-Learning in Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2512.23694v1
- Date: Mon, 29 Dec 2025 18:52:18 GMT
- Title: Bellman Calibration for V-Learning in Offline Reinforcement Learning
- Authors: Lars van der Laan, Nathan Kallus
- Abstract summary: We introduce Iterated Bellman Calibration, a simple, model-agnostic, post-hoc procedure for calibrating off-policy value predictions. We adapt classical histogram and isotonic calibration to the dynamic, counterfactual setting. This yields a one-dimensional fitted value iteration scheme that can be applied to any value estimator.
- Score: 40.322273308230606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Iterated Bellman Calibration, a simple, model-agnostic, post-hoc procedure for calibrating off-policy value predictions in infinite-horizon Markov decision processes. Bellman calibration requires that states with similar predicted long-term returns exhibit one-step returns consistent with the Bellman equation under the target policy. We adapt classical histogram and isotonic calibration to the dynamic, counterfactual setting by repeatedly regressing fitted Bellman targets onto a model's predictions, using a doubly robust pseudo-outcome to handle off-policy data. This yields a one-dimensional fitted value iteration scheme that can be applied to any value estimator. Our analysis provides finite-sample guarantees for both calibration and prediction under weak assumptions, and critically, without requiring Bellman completeness or realizability.
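The loop described in the abstract (repeatedly regressing fitted Bellman targets onto a model's predictions) can be illustrated with a minimal histogram-style sketch. This is not the authors' implementation. The function names `iterated_bellman_calibration` and `apply_calibration`, the quantile binning, and the default parameters are assumptions made for illustration. The sketch also replaces the paper's doubly robust pseudo-outcome with a simpler self-normalized importance-weighted target, which should be read as a stand-in only.

```python
import numpy as np

def iterated_bellman_calibration(v_s, v_next, rewards, weights,
                                 gamma=0.9, n_bins=10, n_iters=50):
    """Histogram-style sketch: learn one calibrated value per bin of the
    original predictions by iterating a binned Bellman regression.

    v_s, v_next : original value predictions at current and next states
    rewards     : observed one-step rewards
    weights     : positive importance weights (all ones if on-policy);
                  an assumed stand-in for a doubly robust pseudo-outcome
    """
    # Interior bin edges from quantiles of the current-state predictions.
    edges = np.quantile(v_s, np.linspace(0, 1, n_bins + 1)[1:-1])
    bin_s = np.digitize(v_s, edges)        # bin index in 0..n_bins-1
    bin_next = np.digitize(v_next, edges)

    # Start from the average original prediction in each bin (0.0 if empty).
    theta = np.array([v_s[bin_s == b].mean() if np.any(bin_s == b) else 0.0
                      for b in range(n_bins)])

    for _ in range(n_iters):
        # Fitted Bellman target built from the current calibrated values.
        targets = rewards + gamma * theta[bin_next]
        new_theta = theta.copy()
        for b in range(n_bins):
            mask = bin_s == b
            if mask.any():
                # One-dimensional regression step: weighted bin average.
                new_theta[b] = np.average(targets[mask], weights=weights[mask])
        theta = new_theta
    return edges, theta

def apply_calibration(v, edges, theta):
    """Map raw predictions to calibrated values through their bin."""
    return theta[np.digitize(v, edges)]

# Toy on-policy check: two absorbing states with rewards 0 and 1, gamma = 0.5,
# so the true values are 0 and 2 while the raw predictions are 1 and 3.
v = np.array([1.0, 1.0, 3.0, 3.0])
r = np.array([0.0, 0.0, 1.0, 1.0])
edges, theta = iterated_bellman_calibration(
    v, v, r, np.ones(4), gamma=0.5, n_bins=2, n_iters=60)
calibrated = apply_calibration(np.array([1.0, 3.0]), edges, theta)
```

In this toy case each bin contains a single absorbing state, so the iteration reduces to a scalar fixed-point update per bin. An isotonic variant would replace the per-bin average with a monotone regression of the targets on the original predictions.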
Related papers
- Fitted Q Evaluation Without Bellman Completeness via Stationary Weighting [40.322273308230606]
We show the need for this assumption stems from a fundamental norm mismatch. We propose a simple fix: reweight each regression step using an estimate of the stationary density ratio. This enables strong evaluation guarantees in the absence of realizability or Bellman completeness.
arXiv Detail & Related papers (2025-12-29T19:04:40Z)
- Approximating Full Conformal Prediction for Neural Network Regression with Gauss-Newton Influence [8.952347049759094]
We construct prediction intervals for neural network regressors post-hoc without held-out data. We train just once and locally perturb model parameters using Gauss-Newton influence.
arXiv Detail & Related papers (2025-07-27T13:34:32Z)
- When Can We Reuse a Calibration Set for Multiple Conformal Predictions? [0.0]
We show how e-conformal prediction, in conjunction with Hoeffding's inequality, can enable the repeated use of a single calibration set. We train a deep neural network and utilise a calibration set to estimate a Hoeffding correction. This correction allows us to apply a modified Markov's inequality, leading to the construction of prediction sets with quantifiable confidence.
arXiv Detail & Related papers (2025-06-24T14:57:25Z)
- Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving the model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z)
- HopCast: Calibration of Autoregressive Dynamics Models [0.0]
This work introduces an alternative Predictor-Corrector approach named hop that uses Modern Hopfield Networks (MHN) to learn the errors of a deterministic Predictor. The Corrector predicts a set of errors for the Predictor's output based on a context state at any timestep during autoregression. The calibration and prediction performances are evaluated across a set of dynamical systems.
arXiv Detail & Related papers (2025-01-27T23:59:23Z)
- Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
- Sharp Calibrated Gaussian Processes [58.94710279601622]
State-of-the-art approaches for designing calibrated models rely on inflating the Gaussian process posterior variance.
We present a calibration approach that generates predictive quantiles using a computation inspired by the vanilla Gaussian process posterior variance.
Our approach is shown to yield a calibrated model under reasonable assumptions.
arXiv Detail & Related papers (2023-02-23T12:17:36Z)
- Modular Conformal Calibration [80.33410096908872]
We introduce a versatile class of algorithms for recalibration in regression.
This framework allows one to transform any regression model into a calibrated probabilistic model.
We conduct an empirical study of MCC on 17 regression datasets.
arXiv Detail & Related papers (2022-06-23T03:25:23Z)
- Robust Losses for Learning Value Functions [26.515147684526124]
Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error.
We build on recent insights reformulating squared Bellman errors as a saddlepoint optimization problem.
We derive sound gradient-based approaches to minimize these losses in both the online off-policy prediction and control settings.
arXiv Detail & Related papers (2022-05-17T16:10:05Z)
- Individual Calibration with Randomized Forecasting [116.2086707626651]
We show that calibration for individual samples is possible in the regression setup if the predictions are randomized.
We design a training objective to enforce individual calibration and use it to train randomized regression functions.
arXiv Detail & Related papers (2020-06-18T05:53:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.