GenCos' Behaviors Modeling Based on Q Learning Improved by Dichotomy
- URL: http://arxiv.org/abs/2008.01536v1
- Date: Tue, 4 Aug 2020 13:48:09 GMT
- Title: GenCos' Behaviors Modeling Based on Q Learning Improved by Dichotomy
- Authors: Qiangang Jia, Zhaoyu Hu, Yiyan Li, Zheng Yan, Sijie Chen
- Abstract summary: A novel Q learning algorithm is proposed in this paper.
It modifies the update process of the Q table by dichotomizing the state space and the action space step by step.
Simulation results in a repeated Cournot game show the effectiveness of the proposed algorithm.
- Score: 3.14969586104215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Q learning is widely used to simulate the behaviors of generation companies
(GenCos) in an electricity market. However, existing Q learning methods usually
require numerous iterations to converge, which is time-consuming and
inefficient in practice. To enhance the calculation efficiency, a novel Q
learning algorithm improved by dichotomy is proposed in this paper. This method
modifies the update process of the Q table by dichotomizing the state space and
the action space step by step. Simulation results in a repeated Cournot game
show the effectiveness of the proposed algorithm.
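The abstract only outlines the idea, but it can be sketched as tabular Q-learning in which the action (output quantity) interval is repeatedly halved around the currently best action instead of being enumerated on a fixed fine grid. The sketch below is a minimal illustration of that general idea in a repeated Cournot game; the demand and cost parameters, the refinement schedule, and all function names are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

# Illustrative two-GenCo repeated Cournot game (assumed demand/cost parameters,
# NOT taken from the paper): price = A - B * total_quantity, constant marginal cost.
A, B, COST = 40.0, 1.0, 4.0

def profit(q_self, q_other):
    price = max(A - B * (q_self + q_other), 0.0)
    return (price - COST) * q_self

def dichotomy_q_learning(episodes=2000, refine_every=400,
                         alpha=0.1, epsilon=0.1, seed=0):
    """Sketch: tabular Q-learning over a quantity interval that is halved
    (dichotomized) around the best action at fixed intervals."""
    rng = np.random.default_rng(seed)
    lo, hi = 0.0, A / B                 # current search interval for the quantity
    n_actions = 5                       # coarse grid inside the current interval
    actions = np.linspace(lo, hi, n_actions)
    q_table = np.zeros(n_actions)       # stateless repeated game: a single Q row
    rival_q = (A - COST) / (3 * B)      # rival fixed at its Cournot output (assumption)

    for ep in range(1, episodes + 1):
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(q_table))
        r = profit(actions[a], rival_q)
        # bandit-style Q update (no next state in a repeated one-shot game)
        q_table[a] += alpha * (r - q_table[a])

        if ep % refine_every == 0 and ep < episodes:
            # Dichotomy step: halve the interval around the current best action
            best = actions[int(np.argmax(q_table))]
            width = (hi - lo) / 2.0
            lo, hi = max(best - width / 2, 0.0), best + width / 2
            actions = np.linspace(lo, hi, n_actions)
            q_table = np.zeros(n_actions)   # restart Q values on the finer grid
    return actions[int(np.argmax(q_table))]

if __name__ == "__main__":
    learned = dichotomy_q_learning()
    best_response = (A - COST - B * (A - COST) / (3 * B)) / (2 * B)
    print("learned quantity:", round(learned, 2),
          "| Cournot best response:", round(best_response, 2))
```

With these assumed parameters the learned quantity approaches the Cournot best response (12) while each refinement keeps the Q table small, which is the efficiency argument the abstract makes; the paper's actual algorithm also dichotomizes the state space, which this single-agent sketch omits.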
Related papers
- SPEQ: Stabilization Phases for Efficient Q-Learning in High Update-To-Data Ratio Reinforcement Learning [51.10866035483686]
Recent off-policy algorithms improve sample efficiency by increasing the Update-To-Data ratio and performing more gradient updates per environment interaction.
While this improves sample efficiency, it significantly increases computational cost due to the higher number of gradient updates required.
We propose a sample-efficient method to improve computational efficiency by separating training into distinct learning phases.
arXiv Detail & Related papers (2025-01-15T09:04:19Z)
- Preventing Local Pitfalls in Vector Quantization via Optimal Transport [77.15924044466976]
We introduce OptVQ, a novel vector quantization method that employs the Sinkhorn algorithm to optimize the optimal transport problem.
Our experiments on image reconstruction tasks demonstrate that OptVQ achieves 100% codebook utilization and surpasses current state-of-the-art VQNs in reconstruction quality.
arXiv Detail & Related papers (2024-12-19T18:58:14Z)
- Two-Step Q-Learning [0.0]
The paper proposes a novel off-policy two-step Q-learning algorithm, without importance sampling.
Numerical experiments demonstrate the superior performance of both the two-step Q-learning and its smooth variants.
arXiv Detail & Related papers (2024-07-02T15:39:00Z)
- Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z)
- Smooth Q-learning: Accelerate Convergence of Q-learning Using Similarity [2.088376060651494]
The similarity between different states and actions is considered in the proposed method.
During training, a new updating mechanism is used, in which the Q values of similar state-action pairs are updated synchronously.
arXiv Detail & Related papers (2021-06-02T13:05:24Z)
- Self-correcting Q-Learning [14.178899938667161]
We introduce a new way to address the bias in the form of a "self-correcting algorithm".
Applying this strategy to Q-learning results in Self-correcting Q-learning.
We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate.
arXiv Detail & Related papers (2020-12-02T11:36:24Z)
- Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient [51.880464915253924]
Deep Q-learning algorithms often suffer from poor gradient estimates with excessive variance.
This paper introduces a framework for updating the gradient estimates in deep Q-learning, yielding a novel algorithm called SRG-DQN.
arXiv Detail & Related papers (2020-07-25T00:54:20Z)
- Analysis of Q-learning with Adaptation and Momentum Restart for Gradient Descent [47.3692506462581]
We first characterize the convergence rate for Q-AMSGrad, which is the Q-learning algorithm with AMSGrad update.
To further improve the performance, we propose to incorporate a momentum restart scheme into Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm.
Our experiments on a linear quadratic regulator problem show that the two proposed Q-learning algorithms outperform the vanilla Q-learning with SGD updates.
arXiv Detail & Related papers (2020-07-15T01:11:43Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
- Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Sweeping [4.5497948012757865]
We present a new model-based reinforcement learning algorithm, Cooperative Prioritized Sweeping.
The algorithm allows for sample-efficient learning on large problems by exploiting a factorization to approximate the value function.
Our method outperforms the state-of-the-art sparse cooperative Q-learning algorithm on both the well-known SysAdmin benchmark and randomized environments.
arXiv Detail & Related papers (2020-01-15T19:13:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.