M$^2$DQN: A Robust Method for Accelerating Deep Q-learning Network
- URL: http://arxiv.org/abs/2209.07809v1
- Date: Fri, 16 Sep 2022 09:20:35 GMT
- Title: M$^2$DQN: A Robust Method for Accelerating Deep Q-learning Network
- Authors: Zhe Zhang, Yukun Zou, Junjie Lai, Qing Xu
- Abstract summary: We propose a framework which uses the Max-Mean loss in Deep Q-Network (M$^2$DQN).
Instead of sampling one batch of experiences at each training step, we sample several batches from the experience replay and update the parameters such that the maximum TD-error among these batches is minimized.
We verify the effectiveness of this framework with one of the most widely used techniques, Double DQN (DDQN), in several gym games.
- Score: 6.689964384669018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Q-learning Network (DQN) is a successful approach that
combines reinforcement learning with deep neural networks and has led to the
widespread application of reinforcement learning. One challenging problem when
applying DQN or other reinforcement learning algorithms to real-world problems
is data collection. Therefore, improving data efficiency is one of the most
important problems in reinforcement learning research. In this paper, we
propose a framework which uses the Max-Mean loss in Deep Q-Network (M$^2$DQN).
Instead of sampling one batch of experiences at each training step, we sample
several batches from the experience replay and update the parameters such that
the maximum TD-error among these batches is minimized. The proposed method can
be combined with most existing DQN techniques by replacing the loss function.
We verify the effectiveness of this framework with one of the most widely used
techniques, Double DQN (DDQN), in several gym games. The results show that our
method leads to a substantial improvement in both learning speed and
performance.
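The Max-Mean objective described above is straightforward to reproduce. The following is a minimal sketch, assuming a PyTorch-style online network q_net, a target network target_net, and a replay buffer whose sample(batch_size) method returns tensors; the names, hyperparameters, and the Double DQN target are illustrative rather than the authors' exact implementation.

```python
# Hedged sketch of the Max-Mean loss: sample several batches and minimize the
# largest per-batch mean TD loss. Assumes PyTorch-style q_net / target_net and
# a replay buffer returning (states, actions, rewards, next_states, dones).
import torch
import torch.nn.functional as F

def max_mean_loss(q_net, target_net, replay, batch_size=32, num_batches=4, gamma=0.99):
    batch_losses = []
    for _ in range(num_batches):
        states, actions, rewards, next_states, dones = replay.sample(batch_size)
        # Q(s, a) for the actions actually taken.
        q_sa = q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            # Double DQN target: online network selects the action, target network evaluates it.
            next_actions = q_net(next_states).argmax(dim=1, keepdim=True)
            next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
            targets = rewards + gamma * (1.0 - dones) * next_q
        batch_losses.append(F.mse_loss(q_sa, targets))
    # The "Max-Mean" objective: take the maximum of the per-batch mean losses.
    return torch.stack(batch_losses).max()
```

The returned scalar would be backpropagated exactly like the ordinary single-batch DQN loss, which is why the method can be combined with variants such as DDQN by swapping only the loss function.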
Related papers
- Simplifying Deep Temporal Difference Learning [3.458933902627673]
We investigate whether it is possible to accelerate and simplify TD training while maintaining its stability.
Our key theoretical result demonstrates for the first time that regularisation techniques such as LayerNorm can yield provably convergent TD algorithms.
Motivated by these findings, we propose PQN, our simplified deep online Q-Learning algorithm.
arXiv Detail & Related papers (2024-07-05T18:49:07Z)
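As a point of reference for the LayerNorm claim in the entry above, here is a generic sketch of a Q-network with layer normalisation after each hidden layer; it is only an assumption of what such a regularised architecture looks like, not the authors' exact PQN model.

```python
# Generic Q-network with LayerNorm after each hidden layer (illustrative only).
import torch.nn as nn

class LayerNormQNet(nn.Module):
    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, obs):
        return self.net(obs)
```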
- Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach [1.0080317855851213]
We consider the problem of network parameter cancellation optimization for networks.
We show that an algorithm can be deployed in the real world and learned from previously collected data, without online exploration.
arXiv Detail & Related papers (2023-10-12T18:36:36Z)
- Learning to Optimize Permutation Flow Shop Scheduling via Graph-based Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which accelerates convergence while being more stable and accurate.
Our model's network parameters are reduced to only 37% of the baseline's, and the average solution gap between our model and the expert solutions decreases from 6.8% to 1.3%.
arXiv Detail & Related papers (2022-10-31T09:46:26Z)
- Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z)
- Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z)
- Self-correcting Q-Learning [14.178899938667161]
We introduce a new way to address the overestimation bias in the form of a "self-correcting algorithm".
Applying this strategy to Q-learning results in Self-correcting Q-learning.
We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate.
arXiv Detail & Related papers (2020-12-02T11:36:24Z)
- Fast Uncertainty Quantification for Deep Object Pose Estimation [91.09217713805337]
Deep learning-based object pose estimators are often unreliable and overconfident.
In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation.
arXiv Detail & Related papers (2020-11-16T06:51:55Z)
- Solving Sparse Linear Inverse Problems in Communication Systems: A Deep Learning Approach With Adaptive Depth [51.40441097625201]
We propose an end-to-end trainable deep learning architecture for sparse signal recovery problems.
The proposed method learns how many layers to execute to emit an output, and the network depth is dynamically adjusted for each task in the inference phase.
arXiv Detail & Related papers (2020-10-29T06:32:53Z)
- Hindsight Experience Replay with Kronecker Product Approximate Curvature [5.441932327359051]
Hindsight Experience Replay (HER) is one of the efficient algorithms for solving reinforcement learning tasks.
However, due to its reduced sample efficiency and slower convergence, HER can fail to perform effectively.
Natural gradients address these challenges by making the model parameters converge better.
Our proposed method addresses the above challenges with better sample efficiency, faster convergence, and an increased success rate (a sketch of HER's relabelling step follows this entry).
arXiv Detail & Related papers (2020-10-09T20:25:14Z)
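For context on the HER entry above, here is a minimal sketch of hindsight relabelling with the "final" goal strategy, assuming goal-conditioned transitions stored as dictionaries and a user-supplied reward_fn; the K-FAC natural-gradient part of the paper is not shown, and all names are illustrative.

```python
# Hedged sketch of HER's hindsight relabelling ("final" strategy): replay the
# episode as if the goal had been the one actually achieved at the end.
def her_relabel(episode, reward_fn):
    final_goal = episode[-1]["achieved_goal"]
    relabelled = []
    for t in episode:
        relabelled.append({
            "state": t["state"],
            "action": t["action"],
            "next_state": t["next_state"],
            "goal": final_goal,  # substitute the achieved goal for the original one
            "reward": reward_fn(t["achieved_goal"], final_goal),  # recompute the reward
        })
    return relabelled
```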
- Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm, aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value based on a randomly selected network, as sketched after this entry.
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
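The parallel-model idea in the cross Q-learning entry above can be illustrated with a short sketch: keep several Q-networks and bootstrap each update from a randomly chosen member of the ensemble. This is an assumption-laden illustration of the stated idea, not the authors' exact algorithm.

```python
# Hedged sketch: bootstrap targets from a randomly selected member of an
# ensemble of Q-networks, as a way to reduce overestimation.
import random
import torch

def random_ensemble_target(q_nets, rewards, next_states, dones, gamma=0.99):
    selected = random.choice(q_nets)  # pick one network of the ensemble at random
    with torch.no_grad():
        next_q = selected(next_states).max(dim=1).values
    return rewards + gamma * (1.0 - dones) * next_q
```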
- An adaptive synchronization approach for weights of deep reinforcement learning [2.132096006921048]
Deep Q-Networks (DQN) is one of the most well-known methods of deep reinforcement learning.
Synchronizing the network weights at a fixed step size, independent of the agent's behavior, may in some cases cause the loss of some properly learned networks.
We propose an adaptive approach for the synchronization of the neural weights used in DQN.
arXiv Detail & Related papers (2020-08-16T18:49:35Z)
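The synchronization issue raised in the last entry concerns the standard fixed-interval copy of online weights into the target network. Below is a generic sketch of that scheme with a hypothetical should_sync hook standing in for an adaptive criterion; the paper's actual rule is not reproduced here.

```python
# Generic DQN target-network synchronization. The fixed interval is the standard
# scheme; `should_sync` is a purely hypothetical placeholder for an adaptive trigger.
def maybe_sync_target(step, q_net, target_net, sync_every=1000, should_sync=None):
    adaptive = should_sync(q_net, target_net) if should_sync is not None else False
    if step % sync_every == 0 or adaptive:
        target_net.load_state_dict(q_net.state_dict())
```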
This list is automatically generated from the titles and abstracts of the papers in this site.