M$^2$DQN: A Robust Method for Accelerating Deep Q-learning Network
- URL: http://arxiv.org/abs/2209.07809v1
- Date: Fri, 16 Sep 2022 09:20:35 GMT
- Title: M$^2$DQN: A Robust Method for Accelerating Deep Q-learning Network
- Authors: Zhe Zhang, Yukun Zou, Junjie Lai, Qing Xu
- Abstract summary: We propose a framework which uses the Max-Mean loss in Deep Q-Network (M$^2$DQN).
Instead of sampling one batch of experiences at each training step, we sample several batches from the experience replay and update the parameters such that the maximum TD-error among these batches is minimized.
We verify the effectiveness of this framework with one of the most widely used techniques, Double DQN (DDQN), in several gym games.
- Score: 6.689964384669018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Q-learning Network (DQN) is a successful approach that
combines reinforcement learning with deep neural networks and has led to the
widespread application of reinforcement learning. One challenging problem when
applying DQN or other reinforcement learning algorithms to real-world problems
is data collection. Therefore, improving data efficiency is one of the most
important problems in reinforcement learning research. In this paper, we
propose a framework which uses the Max-Mean loss in Deep Q-Network (M$^2$DQN).
Instead of sampling one batch of experiences at each training step, we sample
several batches from the experience replay and update the parameters such that
the maximum TD-error among these batches is minimized. The proposed method can
be combined with most existing DQN techniques by replacing the loss function.
We verify the effectiveness of this framework with one of the most widely used
techniques, Double DQN (DDQN), in several gym games. The results show that our
method leads to a substantial improvement in both learning speed and
performance.
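The Max-Mean objective described above is straightforward to reproduce. The following is a minimal sketch, assuming a PyTorch-style online network q_net, a target network target_net, and a replay buffer whose sample(batch_size) method returns tensors; the names, hyperparameters, and the Double DQN target are illustrative rather than the authors' exact implementation.

```python
# Hedged sketch of the Max-Mean loss: sample several batches and minimize the
# largest per-batch mean TD loss. Assumes PyTorch-style q_net / target_net and
# a replay buffer returning (states, actions, rewards, next_states, dones).
import torch
import torch.nn.functional as F

def max_mean_loss(q_net, target_net, replay, batch_size=32, num_batches=4, gamma=0.99):
    batch_losses = []
    for _ in range(num_batches):
        states, actions, rewards, next_states, dones = replay.sample(batch_size)
        # Q(s, a) for the actions actually taken.
        q_sa = q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            # Double DQN target: online network selects the action, target network evaluates it.
            next_actions = q_net(next_states).argmax(dim=1, keepdim=True)
            next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
            targets = rewards + gamma * (1.0 - dones) * next_q
        batch_losses.append(F.mse_loss(q_sa, targets))
    # The "Max-Mean" objective: take the maximum of the per-batch mean losses.
    return torch.stack(batch_losses).max()
```

The returned scalar would be backpropagated exactly like the ordinary single-batch DQN loss, which is why the method can be combined with variants such as DDQN by swapping only the loss function.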
Related papers
- Simplifying Deep Temporal Difference Learning [3.458933902627673]
We investigate whether it is possible to accelerate and simplify TD training while maintaining its stability.
Our key theoretical result demonstrates for the first time that regularisation techniques such as LayerNorm can yield provably convergent TD algorithms.
Motivated by these findings, we propose PQN, our simplified deep online Q-Learning algorithm.
arXiv Detail & Related papers (2024-07-05T18:49:07Z)
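As a point of reference for the LayerNorm claim in the entry above, here is a generic sketch of a Q-network with layer normalisation after each hidden layer; it is only an assumption of what such a regularised architecture looks like, not the authors' exact PQN model.

```python
# Generic Q-network with LayerNorm after each hidden layer (illustrative only).
import torch.nn as nn

class LayerNormQNet(nn.Module):
    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, obs):
        return self.net(obs)
```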
- Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach [1.0080317855851213]
We consider the problem of network parameter cancellation optimization for networks.
We show that an algorithm can be deployed in the real world and learned from previously collected data, without online exploration.
arXiv Detail & Related papers (2023-10-12T18:36:36Z)
- Learning to Optimize Permutation Flow Shop Scheduling via Graph-based Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which accelerates convergence while being more stable and accurate.
Our model's network parameters are reduced to only 37% of the baseline's, and the average solution gap between our model and the expert solutions decreases from 6.8% to 1.3%.
arXiv Detail & Related papers (2022-10-31T09:46:26Z)
- Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z)
- Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z)
- Self-correcting Q-Learning [14.178899938667161]
We introduce a new way to address the overestimation bias in the form of a "self-correcting algorithm".
Applying this strategy to Q-learning results in Self-correcting Q-learning.
We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate.
arXiv Detail & Related papers (2020-12-02T11:36:24Z)
- Fast Uncertainty Quantification for Deep Object Pose Estimation [91.09217713805337]
Deep learning-based object pose estimators are often unreliable and overconfident.
In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation.
arXiv Detail & Related papers (2020-11-16T06:51:55Z)
- Solving Sparse Linear Inverse Problems in Communication Systems: A Deep Learning Approach With Adaptive Depth [51.40441097625201]
We propose an end-to-end trainable deep learning architecture for sparse signal recovery problems.
The proposed method learns how many layers to execute to emit an output, and the network depth is dynamically adjusted for each task in the inference phase.
arXiv Detail & Related papers (2020-10-29T06:32:53Z)
- Hindsight Experience Replay with Kronecker Product Approximate Curvature [5.441932327359051]
Hindsight Experience Replay (HER) is one of the efficient algorithms for solving reinforcement learning tasks.
However, due to its reduced sample efficiency and slower convergence, HER can fail to perform effectively.
Natural gradients address these challenges by making the model parameters converge better.
Our proposed method addresses the above challenges with better sample efficiency, faster convergence, and an increased success rate (a sketch of HER's relabelling step follows this entry).
arXiv Detail & Related papers (2020-10-09T20:25:14Z)
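For context on the HER entry above, here is a minimal sketch of hindsight relabelling with the "final" goal strategy, assuming goal-conditioned transitions stored as dictionaries and a user-supplied reward_fn; the K-FAC natural-gradient part of the paper is not shown, and all names are illustrative.

```python
# Hedged sketch of HER's hindsight relabelling ("final" strategy): replay the
# episode as if the goal had been the one actually achieved at the end.
def her_relabel(episode, reward_fn):
    final_goal = episode[-1]["achieved_goal"]
    relabelled = []
    for t in episode:
        relabelled.append({
            "state": t["state"],
            "action": t["action"],
            "next_state": t["next_state"],
            "goal": final_goal,  # substitute the achieved goal for the original one
            "reward": reward_fn(t["achieved_goal"], final_goal),  # recompute the reward
        })
    return relabelled
```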
- Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm, aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value based on a randomly selected network, as sketched after this entry.
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
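The parallel-model idea in the cross Q-learning entry above can be illustrated with a short sketch: keep several Q-networks and bootstrap each update from a randomly chosen member of the ensemble. This is an assumption-laden illustration of the stated idea, not the authors' exact algorithm.

```python
# Hedged sketch: bootstrap targets from a randomly selected member of an
# ensemble of Q-networks, as a way to reduce overestimation.
import random
import torch

def random_ensemble_target(q_nets, rewards, next_states, dones, gamma=0.99):
    selected = random.choice(q_nets)  # pick one network of the ensemble at random
    with torch.no_grad():
        next_q = selected(next_states).max(dim=1).values
    return rewards + gamma * (1.0 - dones) * next_q
```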
- An adaptive synchronization approach for weights of deep reinforcement learning [2.132096006921048]
Deep Q-Networks (DQN) is one of the most well-known methods of deep reinforcement learning.
Synchronizing the network weights at a fixed step size, independent of the agent's behavior, may in some cases cause the loss of some properly learned networks.
We propose an adaptive approach for the synchronization of the neural weights used in DQN.
arXiv Detail & Related papers (2020-08-16T18:49:35Z)
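The synchronization issue raised in the last entry concerns the standard fixed-interval copy of online weights into the target network. Below is a generic sketch of that scheme with a hypothetical should_sync hook standing in for an adaptive criterion; the paper's actual rule is not reproduced here.

```python
# Generic DQN target-network synchronization. The fixed interval is the standard
# scheme; `should_sync` is a purely hypothetical placeholder for an adaptive trigger.
def maybe_sync_target(step, q_net, target_net, sync_every=1000, should_sync=None):
    adaptive = should_sync(q_net, target_net) if should_sync is not None else False
    if step % sync_every == 0 or adaptive:
        target_net.load_state_dict(q_net.state_dict())
```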
This list is automatically generated from the titles and abstracts of the papers in this site.