To bootstrap or to rollout? An optimal and adaptive interpolation
- URL: http://arxiv.org/abs/2411.09731v1
- Date: Thu, 14 Nov 2024 19:00:00 GMT
- Title: To bootstrap or to rollout? An optimal and adaptive interpolation
- Authors: Wenlong Mou, Jian Qian
- Abstract summary: We introduce a class of Bellman operators that interpolate between bootstrapping and rollout methods.
Our estimator combines the strengths of the bootstrapping-based temporal difference (TD) estimator and rollout-based Monte Carlo (MC) methods.
- Score: 4.755935781862859
- Abstract: Bootstrapping and rollout are two fundamental principles for value function estimation in reinforcement learning (RL). We introduce a novel class of Bellman operators, called subgraph Bellman operators, that interpolate between bootstrapping and rollout methods. Our estimator, derived by solving the fixed point of the empirical subgraph Bellman operator, combines the strengths of the bootstrapping-based temporal difference (TD) estimator and the rollout-based Monte Carlo (MC) methods. Specifically, the error upper bound of our estimator approaches the optimal variance achieved by TD, with an additional term depending on the exit probability of a selected subset of the state space. At the same time, the estimator exhibits the finite-sample adaptivity of MC, with sample complexity depending only on the occupancy measure of this subset. We complement the upper bound with an information-theoretic lower bound, showing that the additional term is unavoidable given a reasonable sample size. Together, these results establish subgraph Bellman estimators as an optimal and adaptive framework for reconciling TD and MC methods in policy evaluation.
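To make the interpolation concrete, below is a minimal tabular sketch of a Bellman backup that bootstraps only inside a chosen subset of states and charges a fixed rollout (Monte Carlo) value to transitions that exit it. The function name and the `v_out` argument are illustrative assumptions, not the paper's notation; the actual estimator solves the fixed point of the empirical subgraph Bellman operator built from data.

```python
import numpy as np

def subgraph_fixed_point(P, r, gamma, subset, v_out, tol=1e-10, max_iter=100_000):
    """Sketch of a subgraph-style Bellman fixed point (illustrative only).

    P: (n, n) transition matrix; r: (n,) expected rewards; gamma: discount.
    Inside `subset` we bootstrap (TD-style); successors outside the subset
    contribute a fixed rollout / Monte Carlo estimate `v_out` instead.
    """
    n = len(r)
    inside = np.zeros(n, dtype=bool)
    inside[list(subset)] = True

    v = np.zeros(n)
    for _ in range(max_iter):
        # bootstrap on in-subset successors, use rollout values otherwise
        successor_values = np.where(inside, v, v_out)
        v_new = np.where(inside, r + gamma * (P @ successor_values), v_out)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v
```

Taking `subset` to be the whole state space recovers a pure TD-style backup, and an empty `subset` returns the Monte Carlo values unchanged; intermediate choices trade the exit-probability term in the error bound against the occupancy-based sample complexity.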
Related papers
- Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach [51.76826149868971]
Policy evaluation via Monte Carlo (MC) simulation is at the core of many MC reinforcement learning (RL) algorithms.
As a quality index, we propose a surrogate for the mean squared error of a return estimator that uses trajectories of different lengths.
We present an adaptive algorithm called Robust and Iterative Data collection strategy Optimization (RIDO); a sketch of the truncation trade-off appears after this entry.
arXiv Detail & Related papers (2024-10-17T11:47:56Z)
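To illustrate the trade-off RIDO optimizes, here is a hedged sketch of Monte Carlo evaluation with truncated rollouts under a fixed step budget; `env_rollout`, the uniform budget split, and all names are hypothetical stand-ins, and RIDO's adaptive allocation across horizons is not reproduced.

```python
import numpy as np

def truncated_mc_value(env_rollout, gamma, horizon, budget):
    """Estimate a discounted return from truncated rollouts (sketch).

    `env_rollout(h)` is assumed to return the reward list of one episode
    truncated after h steps. With discounting, the truncation bias is at
    most gamma**horizon * r_max / (1 - gamma), so shorter rollouts buy
    more episodes (lower variance) at a controlled, known bias.
    """
    discounts = gamma ** np.arange(horizon)
    returns = []
    for _ in range(budget // horizon):  # uniform split of the step budget
        rewards = np.asarray(env_rollout(horizon), dtype=float)
        returns.append(float(discounts[: len(rewards)] @ rewards))
    return float(np.mean(returns))
```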
- Maximum a Posteriori Estimation for Linear Structural Dynamics Models Using Bayesian Optimization with Rational Polynomial Chaos Expansions [0.01578888899297715]
We propose an extension to an existing sparse Bayesian learning approach for MAP estimation.
We introduce a Bayesian optimization approach, which allows us to adaptively enrich the experimental design.
By combining the sparsity-inducing learning procedure with the experimental design, we effectively reduce the number of model evaluations.
arXiv Detail & Related papers (2024-08-07T06:11:37Z)
- Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning [12.490614705930676]
We present a theoretical result demonstrating the strong dependency of suboptimality on the number of Monte Carlo samples taken per Bellman target calculation.
Our main contribution is a deterministic approximation to the Bellman target that uses progressive moment matching.
We show that the resulting method, MOMBO, admits tighter suboptimality guarantees than existing Monte Carlo sampling approaches; a moment-matching sketch appears after this entry.
arXiv Detail & Related papers (2024-06-06T13:58:41Z)
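As one concrete instance of deterministic moment matching, the sketch below propagates a Gaussian through a ReLU in closed form instead of sampling it. This is a standard building block rather than MOMBO's actual Bellman-target approximation, and the function name is an assumption.

```python
import numpy as np
from scipy.stats import norm

def relu_moment_match(mu, sigma):
    """Propagate N(mu, sigma^2) through ReLU deterministically (sketch).

    Returns the exact mean and variance of max(0, X) for X ~ N(mu, sigma^2),
    replacing Monte Carlo sampling of the nonlinearity with one closed-form
    pass -- the kind of moment matching that can be chained through a network.
    """
    a = mu / sigma
    mean = mu * norm.cdf(a) + sigma * norm.pdf(a)
    second_moment = (mu**2 + sigma**2) * norm.cdf(a) + mu * sigma * norm.pdf(a)
    var = np.maximum(second_moment - mean**2, 0.0)
    return mean, var
```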
- Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important for solving sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function; a quantile-regression sketch appears after this entry.
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
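For reference, here is a sketch of the pinball (quantile-regression) loss that quantile-based value-distribution methods minimize; it shows only the regression primitive, not EQR's model-based, epistemic machinery, and the names are illustrative.

```python
import numpy as np

def pinball_loss(q_pred, target, taus):
    """Quantile-regression (pinball) loss for quantile levels `taus` (sketch).

    q_pred: predicted quantiles, shape (k,); target: a scalar return sample;
    taus: the k quantile levels in (0, 1). Minimizing the expected loss
    drives q_pred[i] toward the tau_i-quantile of the target distribution.
    """
    u = target - q_pred
    return float(np.mean(np.maximum(taus * u, (taus - 1.0) * u)))
```

At tau = 0.5 the loss reduces to half the absolute error, i.e., median regression.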
- Open-Set Likelihood Maximization for Few-Shot Learning [36.97433312193586]
We tackle the Few-Shot Open-Set Recognition (FSOSR) problem, i.e., classifying instances among a set of classes for which we have only a few labeled samples.
We explore the popular transductive setting, which leverages the unlabelled query instances at inference.
Motivated by the observation that existing transductive methods perform poorly in open-set scenarios, we propose a generalization of the maximum likelihood principle.
arXiv Detail & Related papers (2023-01-20T01:56:19Z)
- Training Discrete Deep Generative Models via Gapped Straight-Through Estimator [72.71398034617607]
We propose a Gapped Straight-Through (GST) estimator to reduce the variance without incurring resampling overhead.
This estimator is inspired by the essential properties of Straight-Through Gumbel-Softmax.
Experiments demonstrate that the proposed GST estimator outperforms strong baselines on two discrete deep generative modeling tasks; a sketch of the straight-through baseline appears after this entry.
arXiv Detail & Related papers (2022-06-15T01:46:05Z)
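For context, here is a sketch of the plain Straight-Through Gumbel-Softmax baseline that GST refines (PyTorch exposes an equivalent via `torch.nn.functional.gumbel_softmax(..., hard=True)`); GST's gap-based construction itself is not reproduced here.

```python
import torch
import torch.nn.functional as F

def straight_through_gumbel_softmax(logits, tau=1.0):
    """Straight-Through Gumbel-Softmax estimator (reference sketch).

    The forward pass emits a hard one-hot sample; the backward pass routes
    gradients through the soft Gumbel-Softmax relaxation instead.
    """
    gumbels = -torch.empty_like(logits).exponential_().log()  # Gumbel(0, 1) noise
    y_soft = F.softmax((logits + gumbels) / tau, dim=-1)
    index = y_soft.argmax(dim=-1, keepdim=True)
    y_hard = torch.zeros_like(logits).scatter_(-1, index, 1.0)
    # straight-through: y_hard in the forward pass, y_soft's gradient in backward
    return y_hard + y_soft - y_soft.detach()
```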
- Tractable and Near-Optimal Adversarial Algorithms for Robust Estimation in Contaminated Gaussian Models [1.609950046042424]
Consider the problem of simultaneous estimation of location and variance matrix under Huber's contaminated Gaussian model.
First, we study minimum $f$-divergence estimation at the population level, corresponding to a generative adversarial method with a nonparametric discriminator.
We develop tractable adversarial algorithms with simple spline discriminators, which can be implemented via nested optimization.
The proposed methods are shown to achieve minimax optimal rates or near-optimal rates depending on the $f$-divergence and the penalty used.
arXiv Detail & Related papers (2021-12-24T02:46:51Z)
- Robust and Adaptive Temporal-Difference Learning Using An Ensemble of Gaussian Processes [70.80716221080118]
The paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning.
The OS-GPTD approach is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs.
To alleviate the limited expressiveness associated with a single fixed kernel, a weighted ensemble (E) of GP priors is employed to yield an alternative scheme.
arXiv Detail & Related papers (2021-12-01T23:15:09Z)
- Bayesian Bellman Operators [55.959376449737405]
We introduce a novel perspective on Bayesian reinforcement learning (RL).
Our framework is motivated by the insight that when bootstrapping is introduced, model-free approaches actually infer a posterior over Bellman operators, not value functions.
arXiv Detail & Related papers (2021-06-09T12:20:46Z)
- Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms standard distributional RL baselines; an MMD sketch appears after this entry.
arXiv Detail & Related papers (2020-07-24T05:18:17Z)
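A hedged sketch of the kernel discrepancy behind "implicitly matching all orders of moments": the squared maximum mean discrepancy (MMD) between predicted return particles and Bellman-target particles under a Gaussian kernel mixture. Names and bandwidths are illustrative, and this is the biased (V-statistic) estimator.

```python
import numpy as np

def mmd_squared(pred, target, bandwidths=(1.0, 2.0, 4.0)):
    """Squared MMD between two 1-D particle sets (sketch).

    With a characteristic kernel (e.g. a Gaussian mixture), driving this
    to zero matches the two distributions -- and hence all their moments.
    `pred` would hold predicted return particles, `target` the particles
    of the distributional Bellman target.
    """
    def k(a, b):
        d2 = (a[:, None] - b[None, :]) ** 2
        return sum(np.exp(-d2 / (2.0 * h**2)) for h in bandwidths)

    return k(pred, pred).mean() + k(target, target).mean() - 2.0 * k(pred, target).mean()
```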
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.