t-Soft Update of Target Network for Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2008.10861v2
- Date: Fri, 25 Dec 2020 01:56:12 GMT
- Title: t-Soft Update of Target Network for Deep Reinforcement Learning
- Authors: Taisuke Kobayashi and Wendyam Eric Lionel Ilboudo
- Abstract summary: This paper proposes a new robust update rule of the target network for deep reinforcement learning (DRL).
A t-soft update method is derived with reference to the analogy between the exponential moving average and the normal distribution.
In PyBullet robotics simulations for DRL, an online actor-critic algorithm with the t-soft update outperformed the conventional methods in terms of the obtained return and/or its variance.
- Score: 8.071506311915396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a new robust update rule of the target network for deep
reinforcement learning (DRL), replacing the conventional update rule given as an
exponential moving average. The target network smoothly generates the reference
signals for the main network in DRL, thereby reducing learning variance. The
problem with the conventional update rule is that all parameters are copied from
the main network at the same speed, even when some of them are updating in the
wrong direction. This behavior increases the risk of generating wrong reference
signals. Although slowing down the overall update speed is a naive way to
mitigate wrong updates, it would also decrease learning speed. To update the
parameters robustly while maintaining learning speed, a t-soft update method,
inspired by the student-t distribution, is derived with reference to the analogy
between the exponential moving average and the normal distribution. Through an
analysis of the derived t-soft update, we show that it inherits the properties
of the student-t distribution. Specifically, owing to the heavy-tailed property
of the student-t distribution, the t-soft update automatically excludes extreme
updates that differ from past experiences. In addition, when the updates are
similar to past experiences, it can mitigate learning delay by increasing the
update amount. In PyBullet robotics simulations for DRL, an online actor-critic
algorithm with the t-soft update outperformed the conventional methods in terms
of the obtained return and/or its variance. From the training process with the
t-soft update, we found that the t-soft update is globally consistent with the
standard soft update, while the update rates are locally adjusted for
acceleration or suppression.
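The abstract describes the mechanism only in words; the sketch below contrasts the conventional soft (EMA) update with a student-t-weighted variant that captures the idea. The exact t-soft update formulas are in the paper; the specific weight, the running scale estimate `sigma2`, and the default `nu=1.0` used here are illustrative assumptions, not the authors' exact rule.

```python
import numpy as np

def soft_update(target, main, tau=0.005):
    """Conventional soft (EMA) update: every parameter moves at the same rate tau."""
    return (1.0 - tau) * target + tau * main

def t_soft_update_sketch(target, main, sigma2, tau=0.005, nu=1.0, eps=1e-8):
    """Illustrative sketch of a student-t-weighted soft update.

    NOTE: this is NOT the exact rule from the paper, only its core idea:
    deviations between main and target that are large relative to their
    running scale sigma2 get a smaller effective update rate (heavy-tailed
    robustness), while typical deviations can slightly exceed the nominal
    rate (mitigating learning delay).
    """
    delta = main - target
    # student-t style weight: up to (nu+1)/nu for typical deviations, -> 0 for outliers
    w = (nu + 1.0) / (nu + delta**2 / (sigma2 + eps))
    tau_eff = np.clip(tau * w, 0.0, 1.0)  # per-parameter adjusted update rate
    new_target = target + tau_eff * delta
    # running estimate of the squared deviation scale (a simple EMA, an assumption)
    new_sigma2 = (1.0 - tau) * sigma2 + tau * delta**2
    return new_target, new_sigma2
```

With this weighting, a parameter whose deviation is large relative to its running scale gets an effective rate near zero, while typical deviations can receive up to (nu+1)/nu times the nominal rate, matching the suppression/acceleration behavior described in the abstract.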
Related papers
- Exact, Tractable Gauss-Newton Optimization in Deep Reversible Architectures Reveal Poor Generalization [52.16435732772263]
Second-order optimization has been shown to accelerate the training of deep neural networks in many applications.
However, generalization properties of second-order methods are still being debated.
We show for the first time that exact Gauss-Newton (GN) updates take on a tractable form in a class of deep architectures.
arXiv Detail & Related papers (2024-11-12T17:58:40Z)
- Implicit Interpretation of Importance Weight Aware Updates [15.974402990630402]
Subgradient descent is one of the most widely used optimization algorithms in convex machine learning.
We show for the first time that IWA updates have a strictly better regret upper bound than plain gradient updates.
arXiv Detail & Related papers (2023-07-22T01:37:52Z)
- Multiplicative update rules for accelerating deep learning training and increasing robustness [69.90473612073767]
We propose an optimization framework that fits to a wide range of machine learning algorithms and enables one to apply alternative update rules.
We claim that the proposed framework accelerates training while leading to more robust models, in contrast to the traditionally used additive update rule.
arXiv Detail & Related papers (2023-07-14T06:44:43Z)
- InRank: Incremental Low-Rank Learning [85.6380047359139]
Gradient-based training implicitly regularizes neural networks towards low-rank solutions through a gradual increase of the rank during training.
Existing training algorithms do not exploit the low-rank property to improve computational efficiency.
We design a new training algorithm Incremental Low-Rank Learning (InRank), which explicitly expresses cumulative weight updates as low-rank matrices.
arXiv Detail & Related papers (2023-06-20T03:03:04Z)
- Adaptive Differential Filters for Fast and Communication-Efficient Federated Learning [12.067586493399308]
Federated learning (FL) scenarios generate a large communication overhead by frequently transmitting neural network updates between clients and server.
We propose a new scaling method operating at the granularity of convolutional filters which compensates for sparse updates in FL processes.
The proposed method improves the performance of the central server model while converging faster and reducing the total amount of transmitted data by up to 377 times.
arXiv Detail & Related papers (2022-04-09T08:23:25Z)
- Global Update Guided Federated Learning [11.731231528534035]
Federated learning protects data privacy and security by exchanging models instead of data.
We propose global-update-guided federated learning (FedGG), which introduces a model-cosine loss into local objective functions.
Numerical simulations show that FedGG has a significant improvement on model convergence accuracies and speeds.
arXiv Detail & Related papers (2022-04-08T08:36:26Z)
- Consolidated Adaptive T-soft Update for Deep Reinforcement Learning [8.071506311915396]
T-soft update has been proposed as a noise-robust update rule for the target network.
This study develops adaptive T-soft (AT-soft) update by utilizing the update rule in AdaTerm.
arXiv Detail & Related papers (2022-02-25T05:40:07Z)
- Distribution Mismatch Correction for Improved Robustness in Deep Neural Networks [86.42889611784855]
Normalization methods increase the vulnerability with respect to noise and input corruptions.
We propose an unsupervised non-parametric distribution correction method that adapts the activation distribution of each layer.
In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions.
arXiv Detail & Related papers (2021-10-05T11:36:25Z)
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem)
AdaRem adjusts the parameter-wise learning rate according to whether a parameter's past update directions are aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
- RIFLE: Backpropagation in Depth for Deep Transfer Learning through Re-Initializing the Fully-connected LayEr [60.07531696857743]
Fine-tuning a deep convolutional neural network (CNN) from a pre-trained model helps transfer knowledge learned from larger datasets to the target task.
We propose RIFLE - a strategy that deepens backpropagation in transfer learning settings.
RIFLE brings meaningful updates to the weights of deep CNN layers and improves low-level feature learning.
arXiv Detail & Related papers (2020-07-07T11:27:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.