Instance-optimality in optimal value estimation: Adaptivity via
variance-reduced Q-learning
- URL: http://arxiv.org/abs/2106.14352v1
- Date: Mon, 28 Jun 2021 00:38:54 GMT
- Title: Instance-optimality in optimal value estimation: Adaptivity via
variance-reduced Q-learning
- Authors: Koulik Khamaru, Eric Xia, Martin J. Wainwright, and Michael I. Jordan
- Abstract summary: We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that this functional arises in lower bounds on the accuracy of any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
- Score: 99.34907092347733
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Various algorithms in reinforcement learning exhibit dramatic variability in
their convergence rates and ultimate accuracy as a function of the problem
structure. Such instance-specific behavior is not captured by existing global
minimax bounds, which are worst-case in nature. We analyze the problem of
estimating optimal $Q$-value functions for a discounted Markov decision process
with discrete states and actions and identify an instance-dependent functional
that controls the difficulty of estimation in the $\ell_\infty$-norm. Using a
local minimax framework, we show that this functional arises in lower bounds on
the accuracy of any estimation procedure. In the other direction, we establish
the sharpness of our lower bounds, up to factors logarithmic in the state and
action spaces, by analyzing a variance-reduced version of $Q$-learning. Our
theory provides a precise way of distinguishing "easy" problems from "hard"
ones in the context of $Q$-learning, as illustrated by an ensemble with a
continuum of difficulty.
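The variance-reduced scheme analyzed in the paper can be illustrated with a simplified tabular sketch, assuming generative-model access (i.i.d. next-state samples for every state-action pair). This is a hedged illustration of the general recipe — recenter noisy one-sample Bellman updates at a reference point whose Bellman image is estimated from a large batch — and the epoch lengths, batch size, and stepsize schedule below are illustrative choices, not the paper's exact parameters.

```python
import numpy as np

def variance_reduced_q_learning(P, R, gamma, epochs=5, inner_steps=200,
                                batch=1000, rng=None):
    """Sketch of variance-reduced Q-learning with a generative model.

    P: transition tensor of shape (S, A, S); R: reward matrix (S, A).
    Each epoch fixes a reference point Q_bar, estimates the Bellman
    operator at Q_bar from a large batch, and then recenters the noisy
    one-sample updates at that estimate, reducing their variance.
    """
    rng = np.random.default_rng(rng)
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(epochs):
        Q_bar = Q.copy()
        # Batch estimate of the Bellman operator at the reference point.
        T_bar = np.zeros((S, A))
        for _ in range(batch):
            next_states = np.array([[rng.choice(S, p=P[s, a])
                                     for a in range(A)] for s in range(S)])
            T_bar += R + gamma * Q_bar[next_states].max(axis=-1)
        T_bar /= batch
        for k in range(1, inner_steps + 1):
            lam = 1.0 / (1.0 + (1 - gamma) * k)  # rescaled linear stepsize
            next_states = np.array([[rng.choice(S, p=P[s, a])
                                     for a in range(A)] for s in range(S)])
            # One-sample Bellman estimates at Q and at the reference Q_bar,
            # using the SAME sampled next states (control-variate structure).
            T_Q = R + gamma * Q[next_states].max(axis=-1)
            T_ref = R + gamma * Q_bar[next_states].max(axis=-1)
            Q = (1 - lam) * Q + lam * (T_Q - T_ref + T_bar)
    return Q
```

Because the recentered update has variance proportional to the distance between `Q` and the reference `Q_bar`, the noise shrinks as the epochs progress, which is what lets the method track the instance-dependent functional rather than a worst-case bound.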
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - Stochastic Optimization with Constraints: A Non-asymptotic Instance-Dependent Analysis [2.1756081703276]
We analyze the behavior of a natural variance reduced proximal gradient (VRPG) algorithm for convex optimization under convex constraints.
Our main result is a non-asymptotic guarantee for VRPG algorithm.
We show that our guarantee captures the complexity of the loss function, the variability of the noise, and the geometry of the constraint set.
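A variance-reduced proximal gradient step of the kind this entry describes can be sketched as an SVRG-style loop with a Euclidean projection onto the constraint set. This is a generic sketch of the technique under assumed interfaces (`grad_i` returning component gradients, `project` implementing the projection), not the paper's exact VRPG algorithm or stepsize schedule.

```python
import numpy as np

def vrpg(grad_i, n, project, x0, step, epochs=10, inner=None, rng=None):
    """SVRG-style variance-reduced projected-gradient sketch.

    grad_i(x, i): gradient of the i-th component loss at x;
    project(x): Euclidean projection onto the convex constraint set.
    Each epoch recenters single-sample gradients at a full gradient
    computed once at a reference point, then takes projected steps.
    """
    rng = np.random.default_rng(rng)
    inner = inner or n
    x = x0.copy()
    for _ in range(epochs):
        x_ref = x.copy()
        full_grad = sum(grad_i(x_ref, i) for i in range(n)) / n
        for _ in range(inner):
            i = rng.integers(n)
            # Unbiased, variance-reduced gradient estimate.
            g = grad_i(x, i) - grad_i(x_ref, i) + full_grad
            x = project(x - step * g)
    return x
```

The geometry of the constraint set enters only through `project`, which is consistent with the entry's claim that the guarantee reflects the loss, the noise, and the constraint geometry.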
arXiv Detail & Related papers (2024-03-24T14:45:11Z) - Improved Sample Complexity Bounds for Distributionally Robust
Reinforcement Learning [3.222802562733787]
We consider the problem of learning a control policy that is robust against the parameter mismatches between the training environment and testing environment.
We propose the Robust Phased Value Learning (RPVL) algorithm to solve this problem for the uncertainty sets specified by four different divergences.
We show that our algorithm achieves $\tilde{\mathcal{O}}(|\mathcal{S}\mathcal{A}| H^{5})$ sample complexity, which is uniformly better than the existing results.
arXiv Detail & Related papers (2023-03-05T21:47:08Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation
Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z) - A framework for bilevel optimization that enables stochastic and global
variance reduction algorithms [17.12280360174073]
Bilevel optimization is a problem of minimizing a value function which involves the arg-minimum of another function.
We introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time.
We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has an $O(\frac{1}{T})$ convergence rate, and that it achieves linear convergence under the Polyak-Łojasiewicz assumption.
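For context, the SAGA estimator that SABA adapts to the bilevel setting can be sketched in a few lines. This is plain single-level SAGA as a hedged illustration of the underlying variance-reduction idea, not the paper's SABA algorithm, which additionally tracks the inner problem and a linear system.

```python
import numpy as np

def saga(grad_i, n, x0, step, iters=2000, rng=None):
    """Sketch of the SAGA estimator underlying SABA.

    Keeps a table of the last gradient seen for each component; each
    update uses the fresh gradient minus its stored copy plus the
    running average, an unbiased, variance-reduced direction.
    """
    rng = np.random.default_rng(rng)
    x = x0.copy()
    table = np.array([grad_i(x, i) for i in range(n)])
    avg = table.mean(axis=0)
    for _ in range(iters):
        i = rng.integers(n)
        g = grad_i(x, i)
        x = x - step * (g - table[i] + avg)
        # Maintain the running average incrementally, then refresh the table.
        avg += (g - table[i]) / n
        table[i] = g
    return x
```

Unlike SVRG, SAGA needs no full-gradient recomputation, at the cost of storing one gradient per component; the paper's framework lets the inner solution, the linear-system solution, and the main variable all evolve simultaneously under this style of estimator.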
arXiv Detail & Related papers (2022-01-31T18:17:25Z) - Accelerated and instance-optimal policy evaluation with linear function
approximation [17.995515643150657]
Existing algorithms fail to match at least one of the relevant lower bounds.
We develop an accelerated, variance-reduced fast temporal difference algorithm that simultaneously matches both lower bounds and attains a strong notion of instance-optimality.
arXiv Detail & Related papers (2021-12-24T17:21:04Z) - Isotonic regression with unknown permutations: Statistics, computation,
and adaptation [13.96377843988598]
We study the minimax risk of estimation (in empirical $L_2$ loss) and the fundamental limits of adaptation (quantified by the adaptivity index).
We provide a Mirsky partition estimator that is minimax optimal while also achieving the smallest adaptivity index possible for polynomial-time procedures.
In a complementary direction, we show that natural modifications of existing estimators fail to satisfy at least one of three desiderata: optimal worst-case statistical performance, computational efficiency, and fast adaptation.
arXiv Detail & Related papers (2020-09-05T22:17:51Z) - Relative Pose Estimation of Calibrated Cameras with Known
$\mathrm{SE}(3)$ Invariants [65.2314683780204]
We present a complete study of the relative pose estimation problem for a camera constrained by known $\mathrm{SE}(3)$ invariants.
These invariants reduce the minimal number of point pairs required for relative pose estimation.
Experiments on synthetic and real data show performance improvements compared to conventional relative pose estimation methods.
arXiv Detail & Related papers (2020-07-15T13:55:55Z) - Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$ error under a generative model.
We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
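The baseline algorithm against which such lower bounds are compared is temporal difference learning; a minimal sketch of synchronous TD(0) policy evaluation under a generative model is given below. This is a hedged illustration of the setting — the stepsize schedule and synchronous sampling are illustrative assumptions, not the paper's analyzed procedure.

```python
import numpy as np

def td0_evaluation(P, r, gamma, steps=5000, rng=None):
    """Plain synchronous TD(0) policy evaluation with a generative model.

    P: transition matrix (S, S) under the fixed policy; r: rewards (S,).
    Each step samples one next state per state and moves the value
    estimate toward the one-sample Bellman target.
    """
    rng = np.random.default_rng(rng)
    S = len(r)
    V = np.zeros(S)
    for k in range(1, steps + 1):
        alpha = 1.0 / (1.0 + (1 - gamma) * k)  # rescaled linear stepsize
        # One sampled next state per state (generative-model access).
        next_s = np.array([rng.choice(S, p=P[s]) for s in range(S)])
        V = V + alpha * (r + gamma * V[next_s] - V)
    return V
```

Instance-dependent lower bounds of the kind this entry describes quantify how the accuracy of such a procedure must scale with the variance of the one-sample targets `r + gamma * V[next_s]`, rather than with a worst-case constant.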
arXiv Detail & Related papers (2020-03-16T17:15:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.