Consolidated Adaptive T-soft Update for Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2202.12504v1
- Date: Fri, 25 Feb 2022 05:40:07 GMT
- Title: Consolidated Adaptive T-soft Update for Deep Reinforcement Learning
- Authors: Taisuke Kobayashi
- Abstract summary: T-soft update has been proposed as a noise-robust update rule for the target network.
This study develops adaptive T-soft (AT-soft) update by utilizing the update rule in AdaTerm.
- Score: 8.071506311915396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Demand for deep reinforcement learning (DRL) is gradually increasing to enable
robots to perform complex tasks, yet DRL is known to be unstable. As a
technique to stabilize its learning, a target network that slowly and
asymptotically matches a main network is widely employed to generate stable
pseudo-supervised signals. Recently, T-soft update has been proposed as a
noise-robust update rule for the target network and has contributed to
improving DRL performance. However, the noise robustness of T-soft update
is specified by a hyperparameter, which must be tuned for each task, and is
degraded by its simplified implementation. This study develops adaptive
T-soft (AT-soft) update by utilizing the update rule in AdaTerm, which has been
developed recently. In addition, the concern that the target network may not
asymptotically match the main network is mitigated by a new consolidation that
pulls the main network back toward the target network. The resulting
consolidated AT-soft (CAT-soft) update is verified through numerical
simulations.
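For intuition, below is a minimal NumPy sketch contrasting the standard soft (Polyak) target-network update with a robustness-weighted variant in the spirit of T-soft update. The class name, the `nu` and `beta` parameters, and the exact weighting formula are illustrative assumptions only; they do not reproduce the paper's T-soft, AT-soft, or CAT-soft rules.

```python
import numpy as np

def soft_update(target, main, tau=0.01):
    """Standard soft (Polyak) update: the target slowly tracks the main network."""
    return (1.0 - tau) * target + tau * main

class RobustSoftUpdater:
    """Hedged sketch of a T-soft-style target update (not the paper's exact rule).

    The effective step size is shrunk when the deviation between the main and
    target parameters is large relative to a running scale, following the
    heavy-tailed (student-t) intuition behind T-soft update. `nu`, `beta`, and
    the weighting formula below are illustrative assumptions.
    """

    def __init__(self, tau=0.01, nu=1.0, beta=0.99):
        self.tau, self.nu, self.beta = tau, nu, beta
        self.scale = 1e-8  # running estimate of the squared deviation

    def update(self, target, main):
        delta = main - target
        dev = float(np.mean(delta ** 2))
        # Student-t-like weight: close to 1 for typical deviations, < 1 for outliers.
        w = (self.nu + 1.0) / (self.nu + dev / (self.scale + 1e-8))
        w = min(w, 1.0)
        # Update the running scale estimate of the deviation.
        self.scale = self.beta * self.scale + (1.0 - self.beta) * dev
        return target + self.tau * w * delta

# Toy usage: a drifting, occasionally noisy "main network" tracked by a target.
rng = np.random.default_rng(0)
theta_main, theta_target = np.zeros(4), np.zeros(4)
updater = RobustSoftUpdater(tau=0.1)
for step in range(200):
    noise = 5.0 if step % 50 == 49 else 0.05   # rare large outlier steps
    theta_main = theta_main + 0.01 + noise * rng.standard_normal(4)
    theta_target = updater.update(theta_target, theta_main)
print(theta_target)
```

The design idea illustrated here is that unusually large deviations between the main and target parameters (e.g. caused by noisy gradient updates) receive a smaller effective step size, so a single outlier update to the main network does not drag the target network away from its slowly evolving estimate.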
Related papers
- Stabilizing RNN Gradients through Pre-training [3.335932527835653]
Learning theory proposes preventing the gradient from growing exponentially with depth or time in order to stabilize and improve training.
We extend known stability theories to encompass a broader family of deep recurrent networks, requiring minimal assumptions on data and parameter distribution.
We propose a new approach to mitigate this issue, which consists of giving a weight of one half to the time and depth contributions to the gradient.
arXiv Detail & Related papers (2023-08-23T11:48:35Z)
- Multiplicative update rules for accelerating deep learning training and increasing robustness [69.90473612073767]
We propose an optimization framework that fits to a wide range of machine learning algorithms and enables one to apply alternative update rules.
We claim that the proposed framework accelerates training, while leading to more robust models compared to the traditionally used additive update rule.
arXiv Detail & Related papers (2023-07-14T06:44:43Z)
- Quantization-aware Interval Bound Propagation for Training Certifiably Robust Quantized Neural Networks [58.195261590442406]
We study the problem of training and certifying adversarially robust quantized neural networks (QNNs).
Recent work has shown that floating-point neural networks that have been verified to be robust can become vulnerable to adversarial attacks after quantization.
We present quantization-aware interval bound propagation (QA-IBP), a novel method for training robust QNNs.
arXiv Detail & Related papers (2022-11-29T13:32:38Z)
- Learning in Feedback-driven Recurrent Spiking Neural Networks using full-FORCE Training [4.124948554183487]
We propose a supervised training procedure for RSNNs, where a second network is introduced only during the training.
The proposed training procedure consists of generating targets for both recurrent and readout layers.
We demonstrate the improved performance and noise robustness of the proposed full-FORCE training procedure in modeling 8 dynamical systems.
arXiv Detail & Related papers (2022-05-26T19:01:19Z)
- Learning Fast and Slow for Online Time Series Forecasting [76.50127663309604]
Fast and Slow learning Networks (FSNet) is a holistic framework for online time-series forecasting.
FSNet balances fast adaptation to recent changes and retrieving similar old knowledge.
Our code will be made publicly available.
arXiv Detail & Related papers (2022-02-23T18:23:07Z)
- Ensemble-in-One: Learning Ensemble within Random Gated Networks for Enhanced Adversarial Robustness [18.514706498043214]
Adversarial attacks pose high security risks to modern deep learning systems.
We propose ensemble-in-one (EIO) to train an ensemble within one random gated network (RGN).
EIO consistently outperforms previous ensemble training methods with even less computational overhead.
arXiv Detail & Related papers (2021-03-27T03:13:03Z)
- t-Soft Update of Target Network for Deep Reinforcement Learning [8.071506311915396]
This paper proposes a new robust update rule of the target network for deep reinforcement learning (DRL).
A t-soft update method is derived with reference to the analogy between the exponential moving average and the normal distribution.
In PyBullet robotics simulations for DRL, an online actor-critic algorithm with the t-soft update outperformed the conventional methods in terms of the obtained return and/or its variance.
arXiv Detail & Related papers (2020-08-25T07:41:47Z)
- Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations [52.493315075385325]
We show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with homogeneous activation functions.
We propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network.
arXiv Detail & Related papers (2020-08-07T02:55:28Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
- Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z)
- STDPG: A Spatio-Temporal Deterministic Policy Gradient Agent for Dynamic Routing in SDN [6.27420060051673]
Dynamic routing in software-defined networking (SDN) can be viewed as a centralized decision-making problem.
We propose a novel model-free framework for dynamic routing in SDN, referred to as the spatio-temporal deterministic policy gradient (STDPG) agent.
STDPG achieves better routing solutions in terms of average end-to-end delay.
arXiv Detail & Related papers (2020-04-21T07:19:07Z)