A Convergent and Efficient Deep Q Network Algorithm
- URL: http://arxiv.org/abs/2106.15419v1
- Date: Tue, 29 Jun 2021 13:38:59 GMT
- Title: A Convergent and Efficient Deep Q Network Algorithm
- Authors: Zhikang T. Wang, Masahito Ueda
- Abstract summary: We show that the deep Q network (DQN) reinforcement learning algorithm can diverge and cease to operate in realistic settings.
We propose a convergent DQN algorithm (C-DQN) by carefully modifying DQN.
It learns robustly in difficult settings and can learn several difficult games in the Atari 2600 benchmark.
- Score: 3.553493344868414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the empirical success of the deep Q network (DQN) reinforcement
learning algorithm and its variants, DQN remains poorly understood and its
convergence is not guaranteed. In this work, we show that DQN can diverge and
cease to operate in realistic settings. Although there exist gradient-based
convergent methods, we show that they have inherent problems in their learning
behaviour, and we elucidate why they often fail in practice. To overcome
these problems, we propose a convergent DQN algorithm (C-DQN) by carefully
modifying DQN, and we show that the algorithm is convergent and can work with
discount factors as large as 0.9998. It learns robustly in difficult settings and
can learn several difficult games in the Atari 2600 benchmark where DQN fails,
within a moderate computational budget. Our code has been publicly released
and can be used to reproduce our results.
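For orientation, here is a minimal PyTorch sketch of the standard semi-gradient DQN update that the paper modifies. The class and function names, network sizes, and the Huber loss are illustrative assumptions, not the authors' released code; the abstract states only that C-DQN is obtained by carefully modifying DQN so that training remains convergent even with a discount factor as large as 0.9998.

```python
# Illustrative sketch (not the paper's code): the standard semi-gradient
# DQN loss that C-DQN modifies to guarantee convergence.
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Minimal Q-network mapping a state to one value per action."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def dqn_loss(q_net, target_net, batch, gamma=0.9998):
    """One-step TD loss; gradients flow only through q_net (semi-gradient).

    With gamma this close to 1, the abstract notes that this kind of
    objective can diverge; C-DQN alters the loss so that training
    provably converges.
    """
    s, a, r, s_next, done = batch          # done is a 0/1 float mask
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                  # frozen target network
        target = r + gamma * (1.0 - done) * target_net(s_next).max(1).values
    return nn.functional.smooth_l1_loss(q_sa, target)
```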
Related papers
- Weakly Coupled Deep Q-Networks [5.76924666595801]
We propose a novel deep reinforcement learning algorithm that enhances performance in weakly coupled Markov decision processes (WCMDPs).
WCDQN employs a single network to train multiple DQN "subagents", one for each subproblem, and then combines their solutions to establish an upper bound on the optimal action value, as sketched below.
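A hedged sketch of that "one network, many subagents" structure, assuming a shared trunk with one Q-head per subproblem and a simple sum as the combination step; the paper's exact architecture and bounding construction may differ.

```python
# Illustrative sketch (assumed architecture): per-subproblem Q-heads on a
# shared trunk, combined to upper-bound the coupled problem's action value.
import torch
import torch.nn as nn

class SubagentQNet(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, n_subproblems: int):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        # One DQN "subagent" head per weakly coupled subproblem.
        self.heads = nn.ModuleList(
            nn.Linear(128, n_actions) for _ in range(n_subproblems)
        )

    def forward(self, obs):
        h = self.trunk(obs)
        # Shape: (n_subproblems, batch, n_actions).
        return torch.stack([head(h) for head in self.heads])

    def upper_bound(self, obs):
        # Combining (here: summing) the subagents' solutions yields an
        # upper bound on the optimal action value of the full problem.
        return self.forward(obs).sum(dim=0)
```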
arXiv Detail & Related papers (2023-10-28T20:07:57Z) - On the Convergence and Sample Complexity Analysis of Deep Q-Networks
with $\epsilon$-Greedy Exploration [86.71396285956044]
This paper provides a theoretical understanding of Deep Q-Networks (DQN) with $\epsilon$-greedy exploration in deep reinforcement learning.
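For concreteness, a minimal rendering of the $\epsilon$-greedy rule the analysis concerns: act uniformly at random with probability $\epsilon$, otherwise greedily with respect to the current Q-values. The function name and interface are illustrative.

```python
# Minimal epsilon-greedy action selection (illustrative interface).
import random
import torch

def epsilon_greedy(q_values: torch.Tensor, epsilon: float) -> int:
    """Explore uniformly with probability epsilon, else act greedily."""
    if random.random() < epsilon:
        return random.randrange(q_values.shape[-1])
    return int(q_values.argmax(dim=-1).item())
```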
arXiv Detail & Related papers (2023-10-24T20:37:02Z) - Quantum Imitation Learning [74.15588381240795]
We propose quantum imitation learning (QIL) in the hope of leveraging quantum advantage to speed up IL.
We develop two QIL algorithms, quantum behavioural cloning (Q-BC) and quantum generative adversarial imitation learning (Q-GAIL).
Experimental results demonstrate that both Q-BC and Q-GAIL achieve performance comparable to their classical counterparts.
arXiv Detail & Related papers (2023-04-04T12:47:35Z) - Hardness of Independent Learning and Sparse Equilibrium Computation in
Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum games when executed independently by all players.
We show that our lower bounds hold even in the seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z) - Control of Continuous Quantum Systems with Many Degrees of Freedom based
on Convergent Reinforcement Learning [1.8710230264817362]
In this dissertation, we investigate the non-convergence issue of Q-learning.
We develop a new convergent Q-learning algorithm, which we call the convergent deep Q network (C-DQN) algorithm.
We prove the convergence of C-DQN and apply it to the Atari 2600 benchmark.
arXiv Detail & Related papers (2022-12-21T00:52:43Z) - Interpretable Option Discovery using Deep Q-Learning and Variational
Autoencoders [9.432068833600884]
The DVQN algorithm is a promising approach for identifying initiation and termination conditions for option-based reinforcement learning.
Experiments show that the DVQN algorithm, with automatic initiation and termination, has comparable performance to Rainbow.
arXiv Detail & Related papers (2022-10-03T21:08:39Z) - M$^2$DQN: A Robust Method for Accelerating Deep Q-learning Network [6.689964384669018]
We propose a framework that uses the Max-Mean loss in Deep Q-Networks (M$^2$DQN).
Instead of sampling one batch of experiences in the training step, we sample several batches from the experience replay and update the parameters such that the maximum TD-error across these batches is minimized.
We verify the effectiveness of this framework with one of the most widely used techniques, Double DQN (DDQN), on several OpenAI Gym games; a sketch of the loss follows.
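A hedged sketch of that Max-Mean update, assuming a hypothetical replay-buffer API and a squared-error TD loss: draw several batches, compute each batch's mean TD loss, and backpropagate through the largest one.

```python
# Illustrative sketch of the Max-Mean loss: minimize the largest per-batch
# mean TD-error among several sampled batches. replay.sample() is a
# hypothetical API returning (s, a, r, s_next, done) tensors.
import torch

def max_mean_loss(q_net, target_net, replay, k_batches=4, gamma=0.99):
    batch_losses = []
    for _ in range(k_batches):
        s, a, r, s_next, done = replay.sample()
        q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = r + gamma * (1.0 - done) * target_net(s_next).max(1).values
        batch_losses.append(torch.mean((q_sa - target) ** 2))
    # Backpropagating through the max focuses the update on the worst batch.
    return torch.stack(batch_losses).max()
```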
arXiv Detail & Related papers (2022-09-16T09:20:35Z) - Does DQN Learn? [16.035744751431114]
We show that the widely used Deep Q-Network (DQN) fails to meet even the basic criterion of reliably improving on its initial policy.
We numerically show that DQN generally has a non-trivial probability of producing a policy worse than the initial one.
arXiv Detail & Related papers (2022-05-26T20:46:01Z) - Toward Trainability of Deep Quantum Neural Networks [87.04438831673063]
Quantum Neural Networks (QNNs) with random structures have poor trainability due to the exponentially vanishing gradient as the circuit depth and the qubit number increase.
We provide the first viable solution to the vanishing gradient problem for deep QNNs with theoretical guarantees.
arXiv Detail & Related papers (2021-12-30T10:27:08Z) - MQBench: Towards Reproducible and Deployable Model Quantization
Benchmark [53.12623958951738]
MQBench is a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms.
We choose multiple platforms for real-world deployment, including CPU, GPU, ASIC, and DSP, and evaluate an extensive set of state-of-the-art quantization algorithms.
We conduct a comprehensive analysis and uncover considerable insights, both intuitive and counter-intuitive.
arXiv Detail & Related papers (2021-11-05T23:38:44Z) - Online Limited Memory Neural-Linear Bandits with Likelihood Matching [53.18698496031658]
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
We propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
arXiv Detail & Related papers (2021-02-07T14:19:07Z)