An adaptive synchronization approach for weights of deep reinforcement
learning
- URL: http://arxiv.org/abs/2008.06973v1
- Date: Sun, 16 Aug 2020 18:49:35 GMT
- Title: An adaptive synchronization approach for weights of deep reinforcement
learning
- Authors: S. Amirreza Badran, Mansoor Rezghi
- Abstract summary: Deep Q-Networks (DQN) is one of the most well-known methods of deep reinforcement learning.
Synchronizing the network weights at a fixed step size, independent of the agent's behavior, may in some cases cause the loss of properly learned networks.
We propose an adaptive approach for the synchronization of the neural weights used in DQN.
- Score: 2.132096006921048
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Q-Networks (DQN) is one of the most well-known methods of deep
reinforcement learning, which uses deep learning to approximate the
action-value function. The main advantages of this model are that it addresses
several deep reinforcement learning challenges, such as the moving-target
problem and the correlation between samples. Although various extensions of
DQN have appeared in recent years, they all overcome the moving-target problem
with a method similar to DQN's. Despite these advantages, synchronizing the
network weights at a fixed step size, independent of the agent's behavior, may
in some cases cause the loss of properly learned networks. These lost networks
could have led to states with higher rewards, and hence better samples stored
in the replay memory for future training. In this paper, we address this
problem for the DQN family and propose an adaptive approach for synchronizing
the neural weights used in DQN. In this method, weights are synchronized based
on the agent's recent behavior, measured by a criterion evaluated at the end
of each interval. To test this method, we equipped DQN and Rainbow with the
proposed adaptive synchronization and compared the adjusted methods with their
standard forms on well-known games; the results confirm the quality of our
synchronization approach.
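As a concrete (and necessarily hedged) illustration of the idea, the sketch below gates the usual hard target-network copy on the agent's recent behavior. The abstract does not specify the criterion, so the mean episode return over the interval, the class name `AdaptiveSync`, and the PyTorch-style `load_state_dict` copy are all assumptions of this sketch, not the paper's exact method.

```python
# Hedged sketch of behavior-gated target synchronization: at the end of each
# interval, copy the online weights into the target network only if the
# agent's recent behavior (assumed here: mean episode return) has not
# degraded relative to the best interval seen so far.

class AdaptiveSync:
    def __init__(self, interval=1000):
        self.interval = interval         # steps between synchronization checks
        self.best_score = float("-inf")  # best interval score seen so far
        self.returns = []                # episode returns in the current interval

    def record(self, episode_return):
        self.returns.append(episode_return)

    def maybe_sync(self, step, online_net, target_net):
        if step == 0 or step % self.interval != 0 or not self.returns:
            return False
        score = sum(self.returns) / len(self.returns)
        self.returns.clear()
        if score >= self.best_score:     # behavior held up or improved: sync
            self.best_score = score
            target_net.load_state_dict(online_net.state_dict())
            return True
        return False                     # behavior degraded: keep the old target
```

A standard DQN loop would call `record` at the end of each episode and `maybe_sync` every step in place of the fixed-interval copy.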
Related papers
- FedDIP: Federated Learning with Extreme Dynamic Pruning and Incremental
Regularization [5.182014186927254]
Federated Learning (FL) has been successfully adopted for distributed training and inference of large-scale Deep Neural Networks (DNNs).
We contribute a novel FL framework (coined FedDIP) which combines (i) dynamic model pruning with error feedback, to eliminate redundant information exchange, with (ii) incremental regularization that can achieve extreme model sparsity.
We provide convergence analysis of FedDIP and report on a comprehensive performance and comparative assessment against state-of-the-art methods.
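Point (i) above rests on a well-known compression idea that can be sketched generically; this is not FedDIP's exact scheme, and the function name and fixed sparsity level are assumptions:

```python
import torch

# Minimal sketch of pruning with error feedback: only the largest entries of
# the update are communicated, and the pruned-away residual is carried over
# and added back before the next round's pruning, so no mass is lost for good.

def prune_with_error_feedback(update, residual, sparsity=0.9):
    full = update + residual                        # re-inject last round's residual
    k = max(1, int(full.numel() * (1 - sparsity)))  # number of entries to keep
    threshold = full.abs().flatten().topk(k).values.min()
    mask = (full.abs() >= threshold).float()
    sent = full * mask                              # sparse update actually exchanged
    return sent, full - sent                        # new residual stays local
```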
arXiv Detail & Related papers (2023-09-13T08:51:19Z)
- Contrastive Example-Based Control [163.6482792040079]
We propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function.
Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions.
arXiv Detail & Related papers (2023-07-24T19:43:22Z)
- OSP: Boosting Distributed Model Training with 2-stage Synchronization [24.702780532364056]
We propose a new model synchronization method named Overlapped Synchronization Parallel (OSP).
OSP achieves efficient communication with a 2-stage synchronization approach and uses Local-Gradient-based Parameter correction (LGP) to avoid accuracy loss caused by stale parameters.
Results show that OSP can achieve up to 50% improvement in throughput without accuracy loss compared to popular synchronization models.
arXiv Detail & Related papers (2023-06-29T13:24:12Z)
- M$^2$DQN: A Robust Method for Accelerating Deep Q-learning Network [6.689964384669018]
We propose a framework which uses the Max-Mean loss in Deep Q-Network (M$^2$DQN).
Instead of sampling one batch of experiences in each training step, we sample several batches from the experience replay and update the parameters such that the maximum TD-error of these batches is minimized.
We verify the effectiveness of this framework with one of the most widely used techniques, Double DQN (DDQN) in several gym games.
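Read literally, that update rule admits a short sketch; this is a reconstruction from the summary rather than the authors' code, and `replay.sample`, the network handles, and the Huber loss are assumptions:

```python
import torch
import torch.nn.functional as F

# Hedged sketch of the Max-Mean idea: sample several batches, measure each
# batch's mean absolute TD error, and take the gradient step on the worst
# batch, so the maximum TD error across batches is driven down.

def m2dqn_step(replay, q_net, target_net, optimizer, gamma=0.99, n_batches=4):
    worst_loss, worst_err = None, float("-inf")
    for _ in range(n_batches):
        s, a, r, s2, done = replay.sample()   # a: int64 action indices, shape (B,)
        with torch.no_grad():
            target = r + gamma * (1 - done) * target_net(s2).max(dim=1).values
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        td_err = (target - q).abs().mean().item()   # batch-level TD error
        if td_err > worst_err:                      # keep the worst batch's loss
            worst_err = td_err
            worst_loss = F.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    worst_loss.backward()                           # update on the max-TD-error batch
    optimizer.step()
```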
arXiv Detail & Related papers (2022-09-16T09:20:35Z)
- TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent Kernels [141.29156234353133]
State-of-the-art federated learning methods can perform far worse than their centralized counterparts when clients have dissimilar data distributions.
We show this disparity can largely be attributed to challenges presented by non-convexity.
We propose a Train-Convexify-Train (TCT) procedure to sidestep this issue.
arXiv Detail & Related papers (2022-07-13T16:58:22Z)
- Analytically Tractable Bayesian Deep Q-Learning [0.0]
We adapt the temporal difference Q-learning framework to make it compatible with tractable approximate Gaussian inference (TAGI).
We demonstrate that TAGI can reach a performance comparable to backpropagation-trained networks.
arXiv Detail & Related papers (2021-06-21T13:11:52Z)
- Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm, aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning, maintaining a set of parallel models and estimating the Q-value based on a randomly selected network.
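One plausible reading of that target computation, with double-Q-style decoupling, is sketched below; the function name and the choice of which network selects the action are assumptions, not the paper's exact algorithm:

```python
import random
import torch

# Hedged sketch: the network being updated picks the greedy action, while a
# different, randomly selected network from the parallel set evaluates it,
# which decorrelates selection from evaluation and damps overestimation.
# Assumes len(q_nets) >= 2.

def cross_q_target(q_nets, update_idx, s2, r, done, gamma=0.99):
    with torch.no_grad():
        a_star = q_nets[update_idx](s2).argmax(dim=1)        # action selection
        others = [i for i in range(len(q_nets)) if i != update_idx]
        evaluator = q_nets[random.choice(others)]            # random evaluator
        next_q = evaluator(s2).gather(1, a_star.unsqueeze(1)).squeeze(1)
    return r + gamma * (1 - done) * next_q
```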
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction [96.90215318875859]
We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from corrective feedback.
We propose a new algorithm, DisCor, which computes an approximation to this optimal distribution and uses it to re-weight the transitions used for training.
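The re-weighting step can be sketched as follows. The exponential form matches the spirit of the summary, but the auxiliary error model `err_net` (trained separately to track accumulated target error) and the temperature are assumptions of this sketch:

```python
import torch

# Hedged sketch of distribution correction: transitions whose bootstrap
# targets are estimated to be more wrong (larger accumulated error at the
# next state) receive smaller weight in the Q-learning loss.

def discor_weights(err_net, s2, gamma=0.99, tau=10.0):
    with torch.no_grad():
        next_err = err_net(s2).squeeze(-1)        # estimated error at s'
        w = torch.exp(-gamma * next_err / tau)    # downweight unreliable targets
        return w / w.mean()                       # normalize batch weights

def weighted_td_loss(q, target, weights):
    return (weights * (q - target).pow(2)).mean() # re-weighted Bellman error
```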
arXiv Detail & Related papers (2020-03-16T16:18:52Z)
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out-of-distribution data points at test time with a single forward pass.
We scale training of these models with a novel loss function and centroid updating scheme, and match the accuracy of softmax models.
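A minimal sketch of centroid-based rejection in this spirit: class scores are RBF kernel values against per-class centroids, and a point whose best score is below a threshold is rejected in a single forward pass. The kernel width and threshold here are illustrative assumptions, and the paper's loss and centroid update are not reproduced:

```python
import torch

def duq_scores(features, centroids, sigma=0.1):
    # features: (B, D), centroids: (C, D) -> RBF kernel scores (B, C)
    d2 = torch.cdist(features, centroids).pow(2)
    return torch.exp(-d2 / (2 * sigma ** 2))

def predict_or_reject(features, centroids, threshold=0.5, sigma=0.1):
    scores = duq_scores(features, centroids, sigma)
    conf, pred = scores.max(dim=1)   # best class and its kernel value
    return torch.where(conf < threshold,
                       torch.full_like(pred, -1),  # -1 marks rejected (OOD) inputs
                       pred)
```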
arXiv Detail & Related papers (2020-03-04T12:27:36Z)
- Improving Robustness of Deep-Learning-Based Image Reconstruction [24.882806652224854]
We show that for inverse problem solvers, one should analyze the effect of adversaries in the measurement space.
We introduce an auxiliary network to generate adversarial examples, which is used in a min-max formulation to build robust image reconstruction networks.
We find that a linear network using the proposed min-max learning scheme indeed converges to the same solution.
arXiv Detail & Related papers (2020-02-26T22:12:36Z)