Related papers: Bellman Calibration for V-Learning in Offline Reinforcement Learning

URL: http://arxiv.org/abs/2512.23694v1
Date: Mon, 29 Dec 2025 18:52:18 GMT
Title: Bellman Calibration for V-Learning in Offline Reinforcement Learning
Authors: Lars van der Laan, Nathan Kallus
Abstract summary: We introduce Iterated Bellman Calibration, a simple, model-agnostic, post-hoc procedure for calibrating off-policy value predictions. We adapt classical histogram and isotonic calibration to the dynamic, counterfactual setting. This yields a one-dimensional fitted value iteration scheme that can be applied to any value estimator.
Score: 40.322273308230606
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce Iterated Bellman Calibration, a simple, model-agnostic, post-hoc procedure for calibrating off-policy value predictions in infinite-horizon Markov decision processes. Bellman calibration requires that states with similar predicted long-term returns exhibit one-step returns consistent with the Bellman equation under the target policy. We adapt classical histogram and isotonic calibration to the dynamic, counterfactual setting by repeatedly regressing fitted Bellman targets onto a model's predictions, using a doubly robust pseudo-outcome to handle off-policy data. This yields a one-dimensional fitted value iteration scheme that can be applied to any value estimator. Our analysis provides finite-sample guarantees for both calibration and prediction under weak assumptions, and critically, without requiring Bellman completeness or realizability.
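
The histogram variant of the procedure described in the abstract can be illustrated with a short sketch. This is not the authors' implementation. It assumes on-policy transitions, so the plain one-step target r + gamma * theta(v(s')) stands in for the doubly robust pseudo-outcome that the paper uses for off-policy data. The function name `iterated_histogram_bellman_calibration`, the quantile binning, and all parameter names are hypothetical choices made for illustration.

```python
import numpy as np


def iterated_histogram_bellman_calibration(
    v_s, v_next, rewards, gamma, n_bins=10, n_iters=50
):
    """Illustrative sketch only (hypothetical helper, not the paper's code).

    v_s     : predicted values at the current states
    v_next  : predicted values at the observed next states
    rewards : observed one-step rewards
    Assumes on-policy data; the paper instead uses a doubly robust
    pseudo-outcome to handle off-policy data.
    """
    v_s = np.asarray(v_s, dtype=float)
    v_next = np.asarray(v_next, dtype=float)
    rewards = np.asarray(rewards, dtype=float)

    # Interior quantile edges of the predictions define the histogram bins.
    edges = np.quantile(v_s, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    bin_s = np.searchsorted(edges, v_s, side="right")
    bin_next = np.searchsorted(edges, v_next, side="right")

    counts = np.bincount(bin_s, minlength=n_bins)
    safe_counts = np.maximum(counts, 1)

    # Start from the average raw prediction in each bin
    # (empty bins fall back to the overall mean prediction).
    sums = np.bincount(bin_s, weights=v_s, minlength=n_bins)
    theta = np.where(counts > 0, sums / safe_counts, v_s.mean())

    for _ in range(n_iters):
        # Fitted Bellman target built from the current calibrated values.
        targets = rewards + gamma * theta[bin_next]
        # Regress targets onto the predictions: a per-bin average.
        sums = np.bincount(bin_s, weights=targets, minlength=n_bins)
        # Empty bins keep their previous value.
        theta = np.where(counts > 0, sums / safe_counts, theta)

    def calibrate(v):
        """Map raw predictions to their calibrated bin values."""
        idx = np.searchsorted(edges, np.asarray(v, dtype=float), side="right")
        return theta[idx]

    return calibrate, theta
```

Because the calibrated function depends on a state only through the scalar prediction, each iteration is a one-dimensional regression, which is the sense in which the abstract calls the scheme a one-dimensional fitted value iteration. Replacing the per-bin average with a monotone regression would give an isotonic variant under the same assumptions.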

Related papers

Fitted Q Evaluation Without Bellman Completeness via Stationary Weighting [40.322273308230606]
We show that the need for Bellman completeness stems from a fundamental norm mismatch. We propose a simple fix: reweight each regression step using an estimate of the stationary density ratio. This enables strong evaluation guarantees in the absence of realizability or Bellman completeness.
arXiv Detail & Related papers (2025-12-29T19:04:40Z)
Approximating Full Conformal Prediction for Neural Network Regression with Gauss-Newton Influence [8.952347049759094]
We construct prediction intervals for neural network regressors post-hoc without held-out data. We train just once and locally perturb model parameters using Gauss-Newton influence.
arXiv Detail & Related papers (2025-07-27T13:34:32Z)
When Can We Reuse a Calibration Set for Multiple Conformal Predictions? [0.0]
We show how e-conformal prediction, in conjunction with Hoeffding's inequality, can enable the repeated use of a single calibration set. We train a deep neural network and utilise a calibration set to estimate a Hoeffding correction. This correction allows us to apply a modified Markov's inequality, leading to the construction of prediction sets with quantifiable confidence.
arXiv Detail & Related papers (2025-06-24T14:57:25Z)
Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z)
HopCast: Calibration of Autoregressive Dynamics Models [0.0]
This work introduces an alternative Predictor-Corrector approach named HopCast that uses Modern Hopfield Networks (MHN) to learn the errors of a deterministic Predictor. The Corrector predicts a set of errors for the Predictor's output based on a context state at any timestep during autoregression. The calibration and prediction performances are evaluated across a set of dynamical systems.
arXiv Detail & Related papers (2025-01-27T23:59:23Z)
Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression. These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization. We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
Sharp Calibrated Gaussian Processes [58.94710279601622]
State-of-the-art approaches for designing calibrated models rely on inflating the Gaussian process posterior variance. We present a calibration approach that generates predictive quantiles using a computation inspired by the vanilla Gaussian process posterior variance. Our approach is shown to yield a calibrated model under reasonable assumptions.
arXiv Detail & Related papers (2023-02-23T12:17:36Z)
Modular Conformal Calibration [80.33410096908872]
We introduce a versatile class of algorithms for recalibration in regression. This framework allows one to transform any regression model into a calibrated probabilistic model. We conduct an empirical study of MCC on 17 regression datasets.
arXiv Detail & Related papers (2022-06-23T03:25:23Z)
Robust Losses for Learning Value Functions [26.515147684526124]
Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error. We build on recent insights reformulating squared Bellman errors as a saddlepoint optimization problem. We derive sound gradient-based approaches to minimize these losses in both the online off-policy prediction and control settings.
arXiv Detail & Related papers (2022-05-17T16:10:05Z)
Individual Calibration with Randomized Forecasting [116.2086707626651]
We show that calibration for individual samples is possible in the regression setup if the predictions are randomized. We design a training objective to enforce individual calibration and use it to train randomized regression functions.
arXiv Detail & Related papers (2020-06-18T05:53:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.