An Elementary Proof that Q-learning Converges Almost Surely
- URL: http://arxiv.org/abs/2108.02827v1
- Date: Thu, 5 Aug 2021 19:32:26 GMT
- Title: An Elementary Proof that Q-learning Converges Almost Surely
- Authors: Matthew T. Regehr, Alex Ayoub
- Abstract summary: Watkins' and Dayan's Q-learning is a model-free reinforcement learning algorithm.
This paper reproduces a precise and (nearly) self-contained proof that Q-learning converges.
- Score: 1.52292571922932
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Watkins' and Dayan's Q-learning is a model-free reinforcement learning
algorithm that iteratively refines an estimate for the optimal action-value
function of an MDP by stochastically "visiting" many state-action pairs [Watkins
and Dayan, 1992]. Variants of the algorithm lie at the heart of numerous recent
state-of-the-art achievements in reinforcement learning, including the
superhuman Atari-playing deep Q-network [Mnih et al., 2015]. The goal of this
paper is to reproduce a precise and (nearly) self-contained proof that
Q-learning converges. Much of the available literature leverages powerful
theory to obtain highly generalizable results in this vein. However, this
approach requires the reader to be familiar with and make many deep connections
to different research areas. A student seeking to deepen their understanding of
Q-learning risks becoming caught in a vicious cycle of "RL-learning Hell". For
this reason, we give a complete proof from start to finish using only one
external result from the field of stochastic approximation, despite the fact
that this minimal dependence on other results comes at the expense of some
"shininess".