Smooth Q-learning: Accelerate Convergence of Q-learning Using Similarity
- URL: http://arxiv.org/abs/2106.01134v1
- Date: Wed, 2 Jun 2021 13:05:24 GMT
- Title: Smooth Q-learning: Accelerate Convergence of Q-learning Using Similarity
- Authors: Wei Liao and Xiaohui Wei and Jizhou Lai
- Abstract summary: The similarity between different states and actions is considered in the proposed method.
During training, a new updating mechanism is used, in which the Q values of similar state-action pairs are updated synchronously.
- Score: 2.088376060651494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An improvement of Q-learning is proposed in this paper. It differs from
classic Q-learning in that the similarity between different states and actions
is taken into account. During training, a new updating mechanism is used, in
which the Q values of similar state-action pairs are updated synchronously. The
proposed method can be combined with both tabular Q-learning and deep
Q-learning, and the results of numerical examples illustrate that, compared to
classic Q-learning, the proposed method performs significantly better.
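The abstract does not spell out the update rule, but a minimal sketch of the idea in the tabular case might look like the following. The helper names, the Gaussian similarity kernel, its bandwidth, and the weighting scheme are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Hypothetical sketch of a similarity-weighted ("smooth") tabular Q-update.
# The similarity measure and weighting below are assumptions for illustration.

def gaussian_similarity(x, y, bandwidth=1.0):
    """Similarity between two state feature vectors (assumed Gaussian kernel)."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * bandwidth ** 2))

def smooth_q_update(Q, states, s_idx, a, r, s_next_idx, alpha=0.1, gamma=0.99):
    """Update Q for (s, a) and, synchronously, for all similar states.

    Q          : array of shape (n_states, n_actions)
    states     : array of shape (n_states, state_dim), one feature vector per state
    s_idx      : index of the visited state
    a          : action taken
    r          : observed reward
    s_next_idx : index of the successor state
    """
    td_target = r + gamma * np.max(Q[s_next_idx])
    for j in range(len(states)):
        w = gaussian_similarity(states[s_idx], states[j])
        # Similar states receive a proportionally scaled update; the visited
        # state (w == 1) recovers the classic Q-learning update, and a sharply
        # peaked kernel reduces the whole scheme to classic Q-learning.
        Q[j, a] += alpha * w * (td_target - Q[j, a])
    return Q
```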
Related papers
- Two-Step Q-Learning [0.0]
The paper proposes a novel off-policy two-step Q-learning algorithm that does not require importance sampling.
Numerical experiments demonstrate the superior performance of both the two-step Q-learning and its smooth variants.
arXiv Detail & Related papers (2024-07-02T15:39:00Z)
- Suppressing Overestimation in Q-Learning through Adversarial Behaviors [4.36117236405564]
This paper proposes a new Q-learning algorithm with a dummy adversarial player, called dummy adversarial Q-learning (DAQ).
The proposed DAQ unifies several Q-learning variations for controlling overestimation biases, such as maxmin Q-learning and minmax Q-learning.
The finite-time convergence of DAQ is analyzed from an integrated perspective by adapting adversarial Q-learning.
arXiv Detail & Related papers (2023-10-10T03:46:32Z)
- VA-learning as a more efficient alternative to Q-learning [49.526579981437315]
We introduce VA-learning, which directly learns the advantage function and the value function using bootstrapping.
VA-learning learns off-policy and enjoys theoretical guarantees similar to those of Q-learning.
Thanks to directly learning the advantage and value functions, VA-learning improves sample efficiency over Q-learning.
arXiv Detail & Related papers (2023-05-29T15:44:47Z)
- Convergence Results For Q-Learning With Experience Replay [51.11953997546418]
We provide a convergence rate guarantee, and discuss how it compares to the convergence of Q-learning depending on important parameters such as the frequency and number of iterations of replay.
We also provide theoretical evidence showing when we might expect this to strictly improve performance, by introducing and analyzing a simple class of MDPs.
arXiv Detail & Related papers (2021-12-08T10:22:49Z)
- Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z)
- Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value based on a randomly selected network.
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
- GenCos' Behaviors Modeling Based on Q Learning Improved by Dichotomy [3.14969586104215]
A novel Q learning algorithm is proposed in this paper.
It modifies the update process of the Q table by dichotomizing the state space and the action space step by step.
Simulation results in a repeated Cournot game show the effectiveness of the proposed algorithm.
arXiv Detail & Related papers (2020-08-04T13:48:09Z)
- Momentum Q-learning with Finite-Sample Convergence Guarantee [49.38471009162477]
This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantees.
We establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling.
We demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
arXiv Detail & Related papers (2020-07-30T12:27:03Z)
- Analysis of Q-learning with Adaptation and Momentum Restart for Gradient Descent [47.3692506462581]
We first characterize the convergence rate for Q-AMSGrad, which is the Q-learning algorithm with AMSGrad update.
To further improve the performance, we propose to incorporate a momentum restart scheme into Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm.
Our experiments on a linear quadratic regulator problem show that the two proposed Q-learning algorithms outperform vanilla Q-learning with SGD updates.
arXiv Detail & Related papers (2020-07-15T01:11:43Z)
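As a rough illustration of the Q-AMSGrad idea summarized in the last entry: Q-learning with linear function approximation in which the TD semi-gradient is passed through an AMSGrad-style optimizer. The feature map, hyperparameters, and the momentum-restart rule of Q-AMSGradR are omitted or assumed here; this is a sketch, not the authors' exact algorithm.

```python
import numpy as np

class AMSGrad:
    """Minimal AMSGrad optimizer (Reddi et al.) for a flat parameter vector."""
    def __init__(self, dim, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = np.zeros(dim)       # first-moment estimate
        self.v = np.zeros(dim)       # second-moment estimate
        self.v_hat = np.zeros(dim)   # running max of v (the AMSGrad correction)

    def step(self, theta, grad):
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        self.v_hat = np.maximum(self.v_hat, self.v)
        return theta - self.lr * self.m / (np.sqrt(self.v_hat) + self.eps)

def q_amsgrad_step(theta, opt, phi_sa, r, phi_next_best, gamma=0.99):
    """One Q-learning step on linear features: TD error, then an AMSGrad update.

    theta         : parameter vector of the linear Q-function
    phi_sa        : feature vector of the visited state-action pair
    phi_next_best : feature vector of the greedy action in the next state
    """
    td_error = r + gamma * phi_next_best @ theta - phi_sa @ theta
    grad = -td_error * phi_sa   # semi-gradient of 0.5 * td_error**2 w.r.t. theta
    return opt.step(theta, grad)
```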