Target Network and Truncation Overcome The Deadly Triad in $Q$-Learning
- URL: http://arxiv.org/abs/2203.02628v1
- Date: Sat, 5 Mar 2022 00:54:58 GMT
- Title: Target Network and Truncation Overcome The Deadly Triad in $Q$-Learning
- Authors: Zaiwei Chen, John Paul Clarke, and Siva Theja Maguluri
- Abstract summary: We propose a stable design for $Q$-learning with linear function approximation using target network and truncation.
Our result implies an $\mathcal{O}(\epsilon^{-2})$ sample complexity up to a function approximation error.
This is the first variant of $Q$-learning with linear function approximation that is provably stable without requiring strong assumptions or modifying the problem parameters.
- Score: 7.532013242448151
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: $Q$-learning with function approximation is one of the most empirically
successful while theoretically mysterious reinforcement learning (RL)
algorithms, and was identified in Sutton (1999) as one of the most important
theoretical open problems in the RL community. Even in the basic linear
function approximation setting, there are well-known divergent examples. In
this work, we propose a stable design for $Q$-learning with linear function
approximation using target network and truncation, and establish its
finite-sample guarantees. Our result implies an $\mathcal{O}(\epsilon^{-2})$
sample complexity up to a function approximation error. This is the first
variant of $Q$-learning with linear function approximation that is provably
stable without requiring strong assumptions or modifying the problem
parameters, and achieves the optimal sample complexity.
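To make the design concrete, the sketch below illustrates the general recipe the abstract describes: semi-gradient $Q$-learning on linear features, a target network whose weights are synced only periodically, and truncation of the bootstrapped next-state value to a bounded range. This is an illustrative reconstruction, not the paper's exact algorithm; the Gym-like environment interface, feature map `phi`, exploration scheme, step size, and value bound `v_max` are all assumed placeholders.

```python
# Minimal sketch (assumptions labeled): Q-learning with linear function
# approximation, a periodically synced target network, and truncation of the
# bootstrapped target value. Not the authors' exact algorithm.
import numpy as np

def q_learning_target_truncation(env, phi, num_actions, dim,
                                 gamma=0.99, alpha=0.1,
                                 sync_every=1000, num_steps=100_000,
                                 v_max=None, seed=0):
    """phi(s, a) -> feature vector in R^dim; env follows a Gym-like API (assumed)."""
    rng = np.random.default_rng(seed)
    if v_max is None:
        v_max = 1.0 / (1.0 - gamma)          # assumes rewards in [0, 1]
    theta = np.zeros(dim)                    # online weights
    theta_target = np.zeros(dim)             # target-network weights (updated slowly)

    s, _ = env.reset(seed=seed)
    for t in range(num_steps):
        # Epsilon-greedy behavior policy (exploration scheme is illustrative only).
        if rng.random() < 0.1:
            a = int(rng.integers(num_actions))
        else:
            a = int(np.argmax([phi(s, b) @ theta for b in range(num_actions)]))

        s_next, r, terminated, truncated, _ = env.step(a)

        # Bootstrapped target uses the *frozen* target weights, and the predicted
        # next-state value is truncated to [-v_max, v_max] for stability.
        q_next = max(phi(s_next, b) @ theta_target for b in range(num_actions))
        q_next = float(np.clip(q_next, -v_max, v_max))
        target = r + (0.0 if terminated else gamma * q_next)

        # Semi-gradient TD update on the online weights.
        feat = phi(s, a)
        theta += alpha * (target - feat @ theta) * feat

        # Periodically copy the online weights into the target network.
        if (t + 1) % sync_every == 0:
            theta_target = theta.copy()

        s = env.reset()[0] if (terminated or truncated) else s_next
    return theta
```

In this sketch the two stabilizing ingredients named in the title are the frozen `theta_target`, which changes only every `sync_every` steps, and the `np.clip` truncation of the predicted next-state value; the rest is standard semi-gradient $Q$-learning.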
Related papers
- A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning
with General Function Approximation [66.26739783789387]
We propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB) for reinforcement learning.
MQL-UCB achieves minimax optimal regret of $\tilde{O}(d\sqrt{HK})$ when $K$ is sufficiently large, and near-optimal policy switching cost.
Our work sheds light on designing provably sample-efficient and deployment-efficient Q-learning with nonlinear function approximation.
arXiv Detail & Related papers (2023-11-26T08:31:57Z) - On the Model-Misspecification in Reinforcement Learning [9.864462523050843]
We present a unified theoretical framework for addressing model misspecification in reinforcement learning.
We show that value-based and model-based methods can achieve robustness under local misspecification error bounds.
We also propose an algorithmic framework that can achieve the same order of regret bound without prior knowledge of $\zeta$.
arXiv Detail & Related papers (2023-06-19T04:31:59Z) - Provably Efficient Reinforcement Learning via Surprise Bound [66.15308700413814]
We propose a provably efficient reinforcement learning algorithm (both computationally and statistically) with general value function approximations.
Our algorithm achieves reasonable regret bounds when applied to both the linear setting and the sparse high-dimensional linear setting.
arXiv Detail & Related papers (2023-02-22T20:21:25Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision
Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - On Function Approximation in Reinforcement Learning: Optimism in the
Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the learning problem.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z) - Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement
Learning? [30.065091907118827]
This work considers the sample complexity of finding an $\epsilon$-optimal policy in a Markov decision process (MDP).
We solve this problem via a plug-in approach: build an empirical model from samples and plan in it with an arbitrary plug-in solver (see the sketch after this list).
We show that a plug-in approach can be sample efficient as well, providing a flexible way to design model-based algorithms for reinforcement learning.
arXiv Detail & Related papers (2020-10-12T13:13:01Z) - Reinforcement Learning with General Value Function Approximation:
Provably Efficient Approach via Bounded Eluder Dimension [124.7752517531109]
We establish a provably efficient reinforcement learning algorithm with general value function approximation.
We show that our algorithm achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$ where $d$ is a complexity measure.
Our theory generalizes recent progress on RL with linear value function approximation and does not make explicit assumptions on the model of the environment.
arXiv Detail & Related papers (2020-05-21T17:36:09Z)
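For the plug-in solver paper above, the following is a minimal sketch of the generic model-based recipe it refers to: estimate an empirical MDP from transition samples, then plan in that model with any solver (tabular value iteration here). This is a hedged illustration of the idea only, not that paper's algorithm or its feature-based setting; all function and variable names are hypothetical.

```python
# Sketch of the generic "plug-in" model-based idea: build an empirical MDP from
# samples, then plan in it with any solver (value iteration here).
import numpy as np

def empirical_mdp(samples, num_states, num_actions):
    """samples: iterable of (s, a, r, s_next) transitions with integer states/actions."""
    counts = np.zeros((num_states, num_actions, num_states))
    reward_sum = np.zeros((num_states, num_actions))
    for s, a, r, s_next in samples:
        counts[s, a, s_next] += 1
        reward_sum[s, a] += r
    visits = counts.sum(axis=2)
    visits_safe = np.maximum(visits, 1)                  # avoid division by zero
    p_hat = counts / visits_safe[:, :, None]             # empirical transition kernel
    p_hat[visits == 0] = 1.0 / num_states                # uniform for unvisited pairs
    r_hat = reward_sum / visits_safe                     # empirical mean rewards
    return p_hat, r_hat

def value_iteration(p_hat, r_hat, gamma=0.95, tol=1e-8, max_iters=10_000):
    """Plan in the empirical model; returns a greedy policy and value estimates."""
    v = np.zeros(r_hat.shape[0])
    for _ in range(max_iters):
        q = r_hat + gamma * p_hat @ v                    # shape (S, A)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    return q.argmax(axis=1), v
```

The point of the plug-in viewpoint is the separation of concerns visible here: `empirical_mdp` handles estimation from data, while any off-the-shelf planner can replace `value_iteration` without changing the estimation step.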
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.