Q-learning with Uniformly Bounded Variance: Large Discounting is Not a
Barrier to Fast Learning
- URL: http://arxiv.org/abs/2002.10301v2
- Date: Tue, 7 Jul 2020 21:58:23 GMT
- Title: Q-learning with Uniformly Bounded Variance: Large Discounting is Not a
Barrier to Fast Learning
- Authors: Adithya M. Devraj and Sean P. Meyn
- Abstract summary: We introduce a new class of algorithms that have sample complexity uniformly bounded for all $\gamma < 1$.
We show that the asymptotic covariance of the Q-learning algorithm with an optimized step-size sequence is a quadratic function of $1/(1-\gamma)$; an expected, and essentially known result.
- Score: 7.6146285961466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sample complexity bounds are a common performance metric in the Reinforcement
Learning literature. In the discounted cost, infinite horizon setting, all of
the known bounds have a factor that is a polynomial in $1/(1-\gamma)$, where
$\gamma < 1$ is the discount factor. For a large discount factor, these bounds
seem to imply that a very large number of samples is required to achieve an
$\varepsilon$-optimal policy. The objective of the present work is to introduce
a new class of algorithms that have sample complexity uniformly bounded for all
$\gamma < 1$. One may argue that this is impossible, due to a recent min-max
lower bound. The explanation is that this previous lower bound is for a
specific problem, which we modify, without compromising the ultimate objective
of obtaining an $\varepsilon$-optimal policy. Specifically, we show that the
asymptotic covariance of the Q-learning algorithm with an optimized step-size
sequence is a quadratic function of $1/(1-\gamma)$; an expected, and
essentially known result. The new relative Q-learning algorithm proposed here
is shown to have asymptotic covariance that is a quadratic in $1/(1- \rho^*
\gamma)$, where $1 - \rho^* > 0$ is an upper bound on the spectral gap of an
optimal transition matrix.
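As a rough illustration (not the authors' exact algorithm), the sketch below implements standard tabular Q-learning with an optional relative-value correction that subtracts a scaled reference entry from the temporal-difference target, which is the qualitative idea behind relative Q-learning: estimate the Q-function only up to an additive constant so that the variance does not blow up as $\gamma \to 1$. The 1/n step size, the scaling `kappa`, and the reference entry `ref` are illustrative assumptions; the paper analyzes an optimized step-size sequence and a specific relative Q-learning update.

```python
import numpy as np

def tabular_q_learning(P, r, gamma, n_steps, relative=False, kappa=1.0, ref=(0, 0), seed=0):
    """Tabular Q-learning on a finite MDP with a uniformly random behavior policy.

    P : (S, A, S) array of transition probabilities, r : (S, A) array of rewards.
    With relative=True, a scaled reference entry Q[ref] is subtracted from the
    temporal-difference target (a sketch of the relative-Q-learning idea).
    """
    rng = np.random.default_rng(seed)
    S, A = r.shape
    Q = np.zeros((S, A))
    s = 0
    for n in range(1, n_steps + 1):
        a = rng.integers(A)                  # exploratory (uniform) behavior policy
        s_next = rng.choice(S, p=P[s, a])    # sample the next state from the MDP
        alpha = 1.0 / n                      # illustrative step size; the paper optimizes this sequence
        target = r[s, a] + gamma * Q[s_next].max()
        if relative:
            target -= kappa * Q[ref]         # relative correction; kappa and ref are illustrative choices
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
    return Q
```

For example, calling `tabular_q_learning(P, r, gamma=0.99, n_steps=100_000)` on a small random MDP gives the standard estimate, while `relative=True` gives an estimate shifted by a reference value; in both cases the greedy policy is read off by taking the argmax over actions.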
Related papers
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization [53.044526424637866]
In this paper we consider finding an approximate second-order stationary point (SOSP) of general nonconvex conic optimization, which minimizes a twice differentiable function subject to conic constraints.
In particular, we propose a Newton-CG based barrier-augmented Lagrangian method for finding an approximate SOSP.
arXiv Detail & Related papers (2023-01-10T20:43:29Z) - Small Covers for Near-Zero Sets of Polynomials and Learning Latent
Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$.
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z) - Empirical Risk Minimization in the Non-interactive Local Model of
Differential Privacy [26.69391745812235]
We study the Empirical Risk Minimization (ERM) problem in the noninteractive Local Differential Privacy (LDP) model.
Previous research indicates that the sample complexity, to achieve error $\alpha$, needs to depend exponentially on the dimensionality $p$ for general loss functions.
arXiv Detail & Related papers (2020-11-11T17:48:00Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Gradient-Variation Bound for Online Convex Optimization with Constraints [25.002868073267464]
We study online convex optimization with constraints consisting of multiple functional constraints and a relatively simple constraint set, such as a Euclidean ball.
First-order methods achieve an $\mathcal{O}(\sqrt{T})$ regret and an $\mathcal{O}(1)$ constraint violation, but do not take into account the structural information of the problem.
In this paper, we provide an instance-dependent bound for online convex optimization with complex constraints, obtained by a novel online primal-dual mirror-prox algorithm.
arXiv Detail & Related papers (2020-06-22T17:38:14Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an algorithm with improved sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z)