Periodic Q-Learning
- URL: http://arxiv.org/abs/2002.09795v1
- Date: Sun, 23 Feb 2020 00:33:13 GMT
- Title: Periodic Q-Learning
- Authors: Donghwan Lee and Niao He
- Abstract summary: We study the so-called periodic Q-learning algorithm (PQ-learning for short).
PQ-learning maintains two separate Q-value estimates - the online estimate and the target estimate.
In contrast to standard Q-learning, PQ-learning enjoys a simple finite-time analysis and achieves better sample complexity for finding an epsilon-optimal policy.
- Score: 24.099046883918046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of target networks is a common practice in deep reinforcement
learning for stabilizing the training; however, theoretical understanding of
this technique is still limited. In this paper, we study the so-called periodic
Q-learning algorithm (PQ-learning for short), which resembles the technique
used in deep Q-learning for solving infinite-horizon discounted Markov decision
processes (DMDP) in the tabular setting. PQ-learning maintains two separate
Q-value estimates - the online estimate and target estimate. The online
estimate follows the standard Q-learning update, while the target estimate is
updated periodically. In contrast to the standard Q-learning, PQ-learning
enjoys a simple finite time analysis and achieves better sample complexity for
finding an epsilon-optimal policy. Our result provides a preliminary
justification of the effectiveness of utilizing target estimates or networks in
Q-learning algorithms.
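The two-estimate scheme described in the abstract can be sketched in the tabular setting roughly as follows. This is a minimal sketch based only on the abstract: the epsilon-greedy behavior policy, the constant step size alpha, the update period, and the environment interface are illustrative assumptions, not the paper's exact algorithm or analysis setting.

```python
import numpy as np

def pq_learning(env, num_states, num_actions, gamma=0.99, alpha=0.1,
                update_period=100, num_steps=10_000, epsilon=0.1, seed=0):
    """Illustrative tabular sketch of periodic Q-learning (PQ-learning).

    Two estimates are kept: `q_online` follows the usual Q-learning update,
    while `q_target` supplies the bootstrap value and is refreshed only every
    `update_period` steps by copying the online table, mirroring the target
    networks used in deep Q-learning. The environment is assumed to expose
    `reset() -> state` and `step(action) -> (next_state, reward, done)`.
    """
    rng = np.random.default_rng(seed)
    q_online = np.zeros((num_states, num_actions))
    q_target = np.zeros((num_states, num_actions))

    state = env.reset()
    for t in range(num_steps):
        # Epsilon-greedy behavior policy on the online estimate.
        if rng.random() < epsilon:
            action = int(rng.integers(num_actions))
        else:
            action = int(np.argmax(q_online[state]))

        next_state, reward, done = env.step(action)

        # Online estimate: standard Q-learning update, bootstrapping from
        # the (periodically frozen) target estimate.
        bootstrap = 0.0 if done else gamma * np.max(q_target[next_state])
        q_online[state, action] += alpha * (reward + bootstrap - q_online[state, action])

        # Target estimate: updated only periodically, by copying.
        if (t + 1) % update_period == 0:
            q_target = q_online.copy()

        state = env.reset() if done else next_state

    return q_online
```

Freezing the bootstrap table between copies is the tabular analogue of the target-network trick in deep Q-learning that the abstract refers to.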
Related papers
- Regularized Q-learning through Robust Averaging [3.4354636842203026]
We propose a new Q-learning variant, called 2RA Q-learning, that addresses some weaknesses of existing Q-learning methods in a principled manner.
One such weakness is an underlying estimation bias which cannot be controlled and often results in poor performance.
We show that 2RA Q-learning converges to the optimal policy and analyze its theoretical mean-squared error.
arXiv Detail & Related papers (2024-05-03T15:57:26Z)
- Finite-Time Error Analysis of Soft Q-Learning: Switching System Approach [4.36117236405564]
Soft Q-learning is a variant of Q-learning designed to solve entropy-regularized Markov decision problems; a minimal sketch of the entropy-regularized backup appears after this list.
This paper aims to offer a novel and unified finite-time, control-theoretic analysis of soft Q-learning algorithms.
arXiv Detail & Related papers (2024-03-11T01:36:37Z)
- A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning [54.48409201256968]
Offline-to-online Reinforcement Learning (O2O RL) aims to improve the performance of an offline pretrained policy using only a few online samples.
Most O2O methods focus on the balance between RL objective and pessimism, or the utilization of offline and online samples.
arXiv Detail & Related papers (2023-12-12T19:24:35Z)
- Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z)
- Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates [110.92598350897192]
Q-Learning has proven effective at learning a policy to perform control tasks.
Estimation noise becomes a bias after the max operator in the policy improvement step.
We present Unbiased Soft Q-Learning (UQL), which extends the work of EQL from two-action, finite-state spaces to multi-action, infinite-state Markov Decision Processes.
arXiv Detail & Related papers (2021-10-28T00:07:19Z)
- Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z)
- Expert Q-learning: Deep Reinforcement Learning with Coarse State Values from Offline Expert Examples [8.938418994111716]
Expert Q-learning is inspired by Dueling Q-learning and aims at incorporating semi-supervised learning into reinforcement learning.
An offline expert assesses the value of a state in a coarse manner using three discrete values.
Our results show that Expert Q-learning is indeed useful and more resistant to the overestimation bias.
arXiv Detail & Related papers (2021-06-28T12:41:45Z)
- Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value with a randomly selected network; a rough sketch of this ensemble idea appears after this list.
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
- Q-Learning with Differential Entropy of Q-Tables [4.221871357181261]
We conjecture that the reduction in performance during prolonged training sessions of Q-learning is caused by a loss of information.
We introduce Differential Entropy of Q-tables (DE-QT) as an external information loss detector to the Q-learning algorithm.
arXiv Detail & Related papers (2020-06-26T04:37:10Z)
- Conservative Q-Learning for Offline Reinforcement Learning [106.05582605650932]
We show that CQL substantially outperforms existing offline RL methods, often learning policies that attain 2-5 times higher final return.
We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.
arXiv Detail & Related papers (2020-06-08T17:53:42Z)
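For the soft Q-learning entry above (Finite-Time Error Analysis of Soft Q-Learning), entropy regularization replaces the hard max over next-state actions with a temperature-scaled log-sum-exp. The sketch below shows that backup in tabular form; the temperature tau, step size, and update details are illustrative assumptions rather than the paper's exact switching-system formulation.

```python
import numpy as np

def soft_value(q_next_row, tau):
    """Soft (entropy-regularized) state value: tau * logsumexp(Q(s', .) / tau).

    As tau -> 0 this approaches the hard max used by standard Q-learning.
    """
    z = q_next_row / tau
    z_max = np.max(z)  # subtract the max for numerical stability
    return tau * (z_max + np.log(np.sum(np.exp(z - z_max))))

def soft_q_update(q_table, s, a, r, s_next, done,
                  gamma=0.99, alpha=0.1, tau=0.05):
    """One tabular soft Q-learning update (an illustrative sketch only)."""
    v_next = 0.0 if done else soft_value(q_table[s_next], tau)
    q_table[s, a] += alpha * (r + gamma * v_next - q_table[s, a])
    return q_table
```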
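For the cross Q-learning entry above (Cross Learning in Deep Q-Networks), the summary describes an ensemble extension of double Q-learning in which the bootstrap value comes from a randomly selected member of a set of parallel models. A rough tabular sketch of that idea follows; the ensemble size, selection rule, and step size are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def cross_q_update(q_tables, s, a, r, s_next, done,
                   gamma=0.99, alpha=0.1, rng=None):
    """One tabular update in the spirit of the cross Q-learning summary above.

    `q_tables` is a list of at least two parallel Q tables. One table is
    chosen for the update; the greedy next action is selected with that
    table but evaluated with a different, randomly chosen table, which
    generalizes the decoupling used by double Q-learning. All details here
    are illustrative assumptions.
    """
    rng = rng or np.random.default_rng()
    k = len(q_tables)
    i = int(rng.integers(k))          # table to update
    j = int(rng.integers(k - 1))      # a different table to evaluate with
    if j >= i:
        j += 1

    if done:
        bootstrap = 0.0
    else:
        a_star = int(np.argmax(q_tables[i][s_next]))      # selected by table i
        bootstrap = gamma * q_tables[j][s_next, a_star]   # evaluated by table j

    q_tables[i][s, a] += alpha * (r + bootstrap - q_tables[i][s, a])
    return q_tables
```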
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.