Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
- URL: http://arxiv.org/abs/2003.07337v1
- Date: Mon, 16 Mar 2020 17:15:28 GMT
- Title: Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
- Authors: Koulik Khamaru, Ashwin Pananjady, Feng Ruan, Martin J. Wainwright,
Michael I. Jordan
- Abstract summary: We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$-error under a generative model.
We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
- Score: 102.29671176698373
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the problem of policy evaluation in discounted Markov decision
processes, and provide instance-dependent guarantees on the $\ell_\infty$-error
under a generative model. We establish both asymptotic and non-asymptotic
versions of local minimax lower bounds for policy evaluation, thereby providing
an instance-dependent baseline by which to compare algorithms. Theory-inspired
simulations show that the widely-used temporal difference (TD) algorithm is
strictly suboptimal when evaluated in a non-asymptotic setting, even when
combined with Polyak-Ruppert iterate averaging. We remedy this issue by
introducing and analyzing variance-reduced forms of stochastic approximation,
showing that they achieve non-asymptotic, instance-dependent optimality up to
logarithmic factors.
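The TD algorithm with Polyak-Ruppert iterate averaging discussed in the abstract can be sketched on a small, hypothetical Markov reward process. The transition matrix, rewards, discount factor, and step-size schedule below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Hypothetical 3-state Markov reward process (illustrative, not from the paper):
# transition matrix P, reward vector r, discount factor gamma.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])
r = np.array([1.0, 0.0, 2.0])
gamma = 0.9
rng = np.random.default_rng(0)

def td0_polyak(n_iters):
    """Tabular TD(0) under a generative model, with Polyak-Ruppert averaging."""
    theta = np.zeros(3)       # current value estimate
    theta_bar = np.zeros(3)   # running average of the iterates
    for k in range(n_iters):
        step = 1.0 / (1.0 + 0.1 * k)   # decaying step size (an assumption)
        for s in range(3):             # one generative-model sample per state
            s_next = rng.choice(3, p=P[s])
            td_target = r[s] + gamma * theta[s_next]
            theta[s] += step * (td_target - theta[s])
        theta_bar += (theta - theta_bar) / (k + 1)  # online iterate average
    return theta_bar

v_true = np.linalg.solve(np.eye(3) - gamma * P, r)  # exact value function
v_hat = td0_polyak(20000)
print(np.max(np.abs(v_hat - v_true)))  # the ell_infty error studied in the paper
```

Because the chain is tabular and known, the exact value function $(I - \gamma P)^{-1} r$ serves as the ground truth against which the $\ell_\infty$-error is measured; the paper's point is that, measured this way at finite sample sizes, plain averaged TD can be instance-wise suboptimal.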
Related papers
- Improved High-Probability Bounds for the Temporal Difference Learning Algorithm via Exponential Stability [17.771354881467435]
We show that a simple algorithm with a universal and instance-independent step size is sufficient to obtain near-optimal variance and bias terms.
Our proof technique is based on refined error bounds for linear approximation together with the novel stability result for the product of random matrices.
arXiv Detail & Related papers (2023-10-22T12:37:25Z)
- Instance-Optimality in Interactive Decision Making: Toward a Non-Asymptotic Theory [30.061707627742766]
We aim for instance-optimality, a strong notion of adaptivity which asserts that, on any particular problem instance, the algorithm under consideration outperforms all consistent algorithms.
In this paper, we take the first step toward developing a non-asymptotic theory of instance-optimal decision making with general function approximation.
arXiv Detail & Related papers (2023-04-24T21:51:58Z)
- Off-Policy Evaluation with Policy-Dependent Optimization Response [90.28758112893054]
We develop a new framework for off-policy evaluation with a policy-dependent linear optimization response.
We construct unbiased estimators for the policy-dependent estimand by a perturbation method.
We provide a general algorithm for optimizing causal interventions.
arXiv Detail & Related papers (2022-02-25T20:25:37Z)
- Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z)
- Accelerated and instance-optimal policy evaluation with linear function approximation [17.995515643150657]
Existing algorithms fail to match at least one of the known instance-dependent lower bounds for this problem.
We develop an accelerated, variance-reduced fast temporal difference algorithm that simultaneously matches both lower bounds and attains a strong notion of instance-optimality.
arXiv Detail & Related papers (2021-12-24T17:21:04Z)
- Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z)
- ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z)
- Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
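The variance-reduced stochastic approximation that the abstract proposes as a remedy, and that several of the related papers build on, can be sketched in SVRG style: periodically recenter the noisy updates around an accurate batch estimate of the Bellman operator. The chain, batch sizes, and step size below are illustrative assumptions, not any paper's exact algorithm:

```python
import numpy as np

# SVRG-style variance-reduced TD sketch on a hypothetical 3-state Markov
# reward process (all constants illustrative).
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])
r = np.array([1.0, 0.0, 2.0])
gamma = 0.9
rng = np.random.default_rng(1)

def bellman_estimate(theta, n):
    """Monte Carlo estimate of the Bellman operator T(theta), using n
    generative-model samples per state."""
    est = np.zeros(3)
    for s in range(3):
        s_next = rng.choice(3, p=P[s], size=n)
        est[s] = r[s] + gamma * theta[s_next].mean()
    return est

def vr_td(epochs=30, inner=200, batch=5000, step=0.1):
    theta = np.zeros(3)
    for _ in range(epochs):
        anchor = theta.copy()
        t_anchor = bellman_estimate(anchor, batch)  # accurate recentering term
        for _ in range(inner):
            for s in range(3):
                # The SAME sampled next state appears in both terms, so the
                # sampling noise largely cancels as theta nears the anchor.
                s_next = rng.choice(3, p=P[s])
                g = gamma * (theta[s_next] - anchor[s_next]) + t_anchor[s]
                theta[s] += step * (g - theta[s])  # recentered fixed-point step
    return theta

v_true = np.linalg.solve(np.eye(3) - gamma * P, r)  # exact value function
v_hat = vr_td()
print(np.max(np.abs(v_hat - v_true)))  # ell_infty error
```

The recentered update has variance proportional to the distance between the current iterate and the epoch anchor, which shrinks across epochs; this is the mechanism by which such schemes can attain instance-dependent error bounds that constant-variance TD updates cannot.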
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.