A Tale of Sampling and Estimation in Discounted Reinforcement Learning
- URL: http://arxiv.org/abs/2304.05073v2
- Date: Fri, 14 Apr 2023 07:43:38 GMT
- Title: A Tale of Sampling and Estimation in Discounted Reinforcement Learning
- Authors: Alberto Maria Metelli, Mirco Mutti, Marcello Restelli
- Abstract summary: We present a minimax lower bound on the discounted mean estimation problem.
We show that estimating the mean by directly sampling from the discounted kernel of the Markov process brings compelling statistical properties.
- Score: 50.43256303670011
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The most relevant problems in discounted reinforcement learning involve
estimating the mean of a function under the stationary distribution of a Markov
reward process, such as the expected return in policy evaluation, or the policy
gradient in policy optimization. In practice, these estimates are produced
through a finite-horizon episodic sampling, which neglects the mixing
properties of the Markov process. It is mostly unclear how this mismatch
between the practical and the ideal setting affects the estimation, and the
literature lacks a formal study on the pitfalls of episodic sampling, and how
to do it optimally. In this paper, we present a minimax lower bound on the
discounted mean estimation problem that explicitly connects the estimation
error with the mixing properties of the Markov process and the discount factor.
Then, we provide a statistical analysis on a set of notable estimators and the
corresponding sampling procedures, which includes the finite-horizon estimators
often used in practice. Crucially, we show that estimating the mean by directly
sampling from the discounted kernel of the Markov process brings compelling
statistical properties w.r.t. the alternative estimators, as it matches the
lower bound without requiring a careful tuning of the episode horizon.
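As an illustration of the two sampling schemes contrasted above, the following minimal Python sketch estimates the discounted mean (1 - γ) E[Σ_t γ^t f(s_t)] in two ways: by truncating episodes at a fixed horizon, or by stopping each episode with probability 1 - γ at every step, which draws the final state directly from the discounted kernel and needs no horizon to tune. The toy two-state chain, the `step` transition sampler, and `f` are hypothetical assumptions for illustration, not the paper's estimators or notation.

```python
import numpy as np

def finite_horizon_estimate(step, f, s0, gamma, horizon, n_episodes, rng):
    """Truncated finite-horizon estimator of the discounted mean
    (1 - gamma) * E[sum_t gamma^t f(s_t)]: each episode is cut at `horizon`,
    which introduces a truncation bias to be traded off against variance."""
    estimates = []
    for _ in range(n_episodes):
        s, total = s0, 0.0
        for t in range(horizon):
            total += gamma ** t * f(s)
            s = step(s, rng)
        estimates.append((1.0 - gamma) * total)
    return float(np.mean(estimates))

def discounted_kernel_estimate(step, f, s0, gamma, n_episodes, rng):
    """Estimator that samples directly from the discounted kernel: the episode is
    stopped with probability 1 - gamma at every step, so its length is
    Geometric(1 - gamma) and the terminal state is a draw from the discounted
    state distribution; f evaluated there is an unbiased sample of the mean."""
    estimates = []
    for _ in range(n_episodes):
        s = s0
        while rng.random() < gamma:  # continue w.p. gamma, stop w.p. 1 - gamma
            s = step(s, rng)
        estimates.append(f(s))
    return float(np.mean(estimates))

if __name__ == "__main__":
    # Toy two-state Markov chain used only to exercise the two estimators.
    P = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
    rng = np.random.default_rng(0)
    step = lambda s, rng: rng.choice(2, p=P[s])
    f = lambda s: float(s == 1)  # discounted visitation frequency of state 1
    print(finite_horizon_estimate(step, f, 0, 0.99, 200, 2000, rng))
    print(discounted_kernel_estimate(step, f, 0, 0.99, 2000, rng))
```

The geometric-stopping estimator is unbiased by construction, while the truncated estimator's bias depends on the chosen horizon and on how fast the chain mixes; this mirrors, in a simplified form, the trade-off the paper analyzes.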
Related papers
- Sample-efficient neural likelihood-free Bayesian inference of implicit HMMs [1.8843687952462742]
We propose a novel, sample-efficient likelihood-free method for estimating the high-dimensional hidden states of an implicit HMM.
Our approach learns the intractable posterior distribution of the hidden states directly with an autoregressive flow, exploiting the Markov property.
arXiv Detail & Related papers (2024-05-02T21:13:34Z)
- Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity [51.476337785345436]
We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
arXiv Detail & Related papers (2022-02-28T15:39:36Z)
- Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning [26.880437279977155]
Off-policy estimation for long-horizon problems is important in many real-life applications such as healthcare and robotics.
We develop a new estimator that computes importance ratios of stationary distributions without knowledge of how the off-policy data are collected.
arXiv Detail & Related papers (2020-03-24T21:44:51Z)
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history.
arXiv Detail & Related papers (2020-02-21T19:20:57Z)
- GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
We show that effective estimation can still be achieved in important applications.
Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions (a minimal sketch of this kind of correction appears after this list).
The resulting algorithm, GenDICE, is straightforward and effective.
arXiv Detail & Related papers (2020-02-21T00:27:52Z)
- On Low-rank Trace Regression under General Sampling Distribution [9.699586426043885]
We show that cross-validated estimators satisfy near-optimal error bounds under general assumptions.
We also show that the cross-validated estimator outperforms the theory-inspired approach of selecting the regularization parameter.
arXiv Detail & Related papers (2019-04-18T02:56:00Z)
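Both the black-box off-policy estimator and GenDICE above revolve around reweighting logged data by a ratio between the target's stationary distribution and the empirical (behavior) distribution of the data. The minimal, generic sketch below illustrates only that weighting step; the `ratio` function is assumed to be given (learning it is precisely what those methods do), so this is not either paper's actual algorithm.

```python
import numpy as np

def ratio_weighted_value(states, rewards, ratio):
    """Self-normalized estimate of the target policy's long-run average reward from
    off-policy data: each logged sample is reweighted by w(s) ~= d_target(s) / d_behavior(s),
    the correction ratio between the target's stationary distribution and the empirical
    distribution of the logged states."""
    w = np.array([ratio(s) for s in states], dtype=float)
    r = np.asarray(rewards, dtype=float)
    return float(np.sum(w * r) / np.sum(w))  # self-normalization reduces sensitivity to the ratio's scale

# Hypothetical usage: logged (state, reward) pairs and a learned ratio model `w_hat`.
# value_estimate = ratio_weighted_value(logged_states, logged_rewards, w_hat)
```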