Tightening the Dependence on Horizon in the Sample Complexity of
Q-Learning
- URL: http://arxiv.org/abs/2102.06548v1
- Date: Fri, 12 Feb 2021 14:22:05 GMT
- Title: Tightening the Dependence on Horizon in the Sample Complexity of
Q-Learning
- Authors: Gen Li, Changxiao Cai, Yuxin Chen, Yuantao Gu, Yuting Wei, Yuejie Chi
- Abstract summary: This work sharpens the sample complexity of synchronous Q-learning to an order of $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^4\varepsilon^2}$ for any $0<\varepsilon<1$.
Our finding unveils the effectiveness of vanilla Q-learning, which matches that of speedy Q-learning without requiring extra computation and storage.
- Score: 59.71676469100807
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Q-learning, which seeks to learn the optimal Q-function of a Markov decision
process (MDP) in a model-free fashion, lies at the heart of reinforcement
learning. When it comes to the synchronous setting (such that independent
samples for all state-action pairs are drawn from a generative model in each
iteration), substantial progress has been made recently towards understanding
the sample efficiency of Q-learning. To yield an entrywise
$\varepsilon$-accurate estimate of the optimal Q-function, state-of-the-art
theory requires at least an order of
$\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^5\varepsilon^{2}}$ samples for a
$\gamma$-discounted infinite-horizon MDP with state space $\mathcal{S}$ and
action space $\mathcal{A}$. In this work, we sharpen the sample complexity of
synchronous Q-learning to an order of
$\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^4\varepsilon^2}$ (up to some
logarithmic factor) for any $0<\varepsilon <1$, leading to an order-wise
improvement in terms of the effective horizon $\frac{1}{1-\gamma}$. Analogous
results are derived for finite-horizon MDPs as well. Our finding unveils the
effectiveness of vanilla Q-learning, which matches that of speedy Q-learning
without requiring extra computation and storage. A key ingredient of our
analysis lies in the establishment of novel error decompositions and
recursions, which might shed light on how to analyze finite-sample performance
of other Q-learning variants.
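Below is a minimal sketch of the vanilla synchronous Q-learning iteration described in the abstract, in which every state-action pair receives one fresh sample from the generative model per iteration. The interface `sample_next_state`, the tabular `reward` array, and the `learning_rates` schedule are hypothetical stand-ins for illustration, not the authors' code.
```python
import numpy as np

def synchronous_q_learning(sample_next_state, reward, gamma, num_iters, learning_rates):
    """Sketch of vanilla synchronous Q-learning with a generative model.

    sample_next_state(s, a) -> s'   one independent draw from P(. | s, a)
    reward[s, a]                    deterministic reward table of shape (S, A)
    gamma                           discount factor in (0, 1)
    learning_rates[t]               step size used at iteration t
    """
    num_states, num_actions = reward.shape
    q = np.zeros((num_states, num_actions))   # current Q-function estimate

    for t in range(num_iters):
        eta = learning_rates[t]
        q_next = np.empty_like(q)
        # Synchronous setting: every (s, a) pair is refreshed with its own fresh sample.
        for s in range(num_states):
            for a in range(num_actions):
                s_prime = sample_next_state(s, a)                    # generative model call
                td_target = reward[s, a] + gamma * np.max(q[s_prime])
                q_next[s, a] = (1 - eta) * q[s, a] + eta * td_target
        q = q_next
    return q
```
Each iteration consumes $|\mathcal{S}||\mathcal{A}|$ samples, so the abstract's bound corresponds to roughly $\frac{1}{(1-\gamma)^4\varepsilon^2}$ iterations (up to logarithmic factors) with a suitably chosen learning-rate schedule.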
Related papers
- A finite time analysis of distributed Q-learning [6.663174194579773]
Multi-agent reinforcement learning (MARL) has witnessed a remarkable surge in interest, fueled by the empirical success achieved in applications of single-agent reinforcement learning (RL).
In this study, we consider a distributed Q-learning scenario, wherein a number of agents cooperatively solve a sequential decision-making problem without access to the central reward function, which is an average of the local rewards.
arXiv Detail & Related papers (2024-05-23T00:52:38Z)
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z)
- Sample Complexity of Variance-reduced Distributionally Robust Q-learning [17.96094201655567]
This paper presents two novel model-free algorithms, namely the distributionally robust Q-learning and its variance-reduced counterpart, that can effectively learn a robust policy despite distributional shifts.
A series of numerical experiments confirm the theoretical findings and the efficiency of the algorithms in handling distributional shifts.
arXiv Detail & Related papers (2023-05-28T19:40:46Z)
- A Finite Sample Complexity Bound for Distributionally Robust Q-learning [17.96094201655567]
We consider a reinforcement learning setting in which the deployment environment is different from the training environment.
Applying a robust Markov decision processes formulation, we extend the distributionally robust $Q$-learning framework studied in Liu et al.
This is the first sample complexity result for the model-free robust RL problem.
arXiv Detail & Related papers (2023-02-26T01:15:32Z)
- Settling the Sample Complexity of Model-Based Offline Reinforcement Learning [50.5790774201146]
Offline reinforcement learning (RL) learns using pre-collected data without further exploration.
Prior algorithms or analyses either suffer from suboptimal sample complexities or incur high burn-in cost to reach sample optimality.
We demonstrate that the model-based (or "plug-in") approach achieves minimax-optimal sample complexity without burn-in cost.
arXiv Detail & Related papers (2022-04-11T17:26:19Z)
- Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum.
arXiv Detail & Related papers (2020-09-29T18:48:21Z)
- Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP) from a single Markovian trajectory induced by a behavior policy.
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2}+\frac{t_{\mathsf{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor (a minimal sketch of this single-trajectory update appears after this list).
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
- Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator).
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z)
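For contrast with the synchronous sketch above, the asynchronous Q-learning entry in this list updates only one state-action pair per step, along a single Markovian trajectory generated by a behavior policy; this is why the minimum state-action occupancy $\mu_{\min}$ and the mixing time $t_{\mathsf{mix}}$ appear in its bound. Here is a minimal sketch of that asynchronous update, with `env_step` and `behavior_policy` as hypothetical stand-ins:
```python
import numpy as np

def asynchronous_q_learning(env_step, behavior_policy, num_states, num_actions,
                            gamma, num_steps, learning_rates, init_state=0):
    """Sketch of tabular Q-learning along one sample trajectory.

    env_step(s, a) -> (r, s')   reward and next state from the environment
    behavior_policy(s) -> a     action used to generate the trajectory
    learning_rates[t]           step size used at step t
    """
    q = np.zeros((num_states, num_actions))
    s = init_state
    for t in range(num_steps):
        a = behavior_policy(s)                    # trajectory follows the behavior policy
        r, s_prime = env_step(s, a)
        eta = learning_rates[t]
        td_target = r + gamma * np.max(q[s_prime])
        q[s, a] = (1 - eta) * q[s, a] + eta * td_target   # only the visited pair is updated
        s = s_prime
    return q
```
Unlike the synchronous setting, coverage of all state-action pairs now depends on how often the behavior policy visits them, which is what the $\mu_{\min}$ and $t_{\mathsf{mix}}$ factors quantify.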