Gradient Monitored Reinforcement Learning
- URL: http://arxiv.org/abs/2005.12108v1
- Date: Mon, 25 May 2020 13:45:47 GMT
- Title: Gradient Monitored Reinforcement Learning
- Authors: Mohammed Sharafath Abdul Hameed (1), Gavneet Singh Chadha (1), Andreas
Schwung (1), and Steven X. Ding (2) ((1) South Westphalia University of
Applied Sciences, Germany (2) University of Duisburg-Essen, Germany)
- Abstract summary: We focus on the enhancement of training and evaluation performance in reinforcement learning algorithms.
We propose an approach to steer the learning of the weight parameters of a neural network based on the dynamic development of, and feedback from, the training process itself.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents a novel neural network training approach for faster
convergence and better generalization in deep reinforcement learning. In
particular, we focus on enhancing training and evaluation performance in
reinforcement learning algorithms by systematically reducing the variance of the
gradient, thereby providing a more targeted learning process. The proposed
method, which we term Gradient Monitoring (GM), steers the learning of the
weight parameters of a neural network based on the dynamic development of, and
feedback from, the training process itself. We propose different variants of the
GM methodology which have been shown to increase the underlying performance of
the model. One of the proposed variants, Momentum with Gradient Monitoring
(M-WGM), allows for a continuous adjustment of the quantum of back-propagated
gradients in the network based on certain learning parameters. We further
enhance the method with Adaptive Momentum with Gradient Monitoring (AM-WGM),
which allows for automatic adjustment between focused learning of certain
weights and more dispersed learning, depending on the feedback from the rewards
collected. As a by-product, it also allows for automatic derivation of the
required deep network size during training, as the algorithm automatically
freezes trained weights. The approach is applied to two discrete tasks (a
Multi-Robot Co-ordination problem and Atari games) and one continuous control
task (MuJoCo) using Advantage Actor-Critic (A2C) and Proximal Policy
Optimization (PPO), respectively. The results obtained particularly underline
the applicability of the methods and their improved generalization capability.
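
The abstract does not reproduce the exact GM update rules, but the core idea (track each weight's gradient with a momentum statistic and back-propagate only through the weights whose statistic is comparatively large, freezing the rest) can be illustrated with a short, hypothetical PyTorch-style sketch. The class name, the exponential moving average, and the relative threshold below are assumptions made for illustration only, not the paper's M-WGM/AM-WGM formulas.

```python
import torch


class GradientMonitor:
    """Hypothetical sketch of Gradient Monitoring (GM).

    A momentum-smoothed estimate of every weight's gradient magnitude is
    maintained, and gradients whose estimate falls below a relative threshold
    are zeroed out, effectively freezing those weights.  The statistic and the
    threshold rule are illustrative assumptions, not the paper's exact
    M-WGM/AM-WGM criteria.
    """

    def __init__(self, model, beta=0.9, threshold=0.1):
        self.model = model
        self.beta = beta            # momentum applied to the gradient statistic
        self.threshold = threshold  # fraction of the mean statistic below which grads are masked
        self.ema = {name: torch.zeros_like(p) for name, p in model.named_parameters()}

    @torch.no_grad()
    def apply(self):
        """Call after loss.backward() and before optimizer.step()."""
        for name, p in self.model.named_parameters():
            if p.grad is None:
                continue
            # Update the running estimate of this weight's gradient magnitude.
            self.ema[name].mul_(self.beta).add_(p.grad.abs(), alpha=1 - self.beta)
            # Mask (freeze) gradients whose running magnitude is comparatively small.
            keep = self.ema[name] >= self.threshold * self.ema[name].mean()
            p.grad.mul_(keep.to(p.grad.dtype))
```

In an A2C or PPO update this would sit between the backward pass and the optimizer step (loss.backward(); monitor.apply(); optimizer.step()); an AM-WGM-style variant would additionally adapt the threshold from the collected rewards rather than keeping it fixed.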
Related papers
- Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
arXiv Detail & Related papers (2024-10-29T14:41:44Z)
- On discretisation drift and smoothness regularisation in neural network training [0.0]
We aim to make steps towards an improved understanding of deep learning with a focus on optimisation and model regularisation.
We start by investigating gradient descent (GD), the discrete-time algorithm underlying most popular deep learning optimisation algorithms.
We derive novel continuous-time flows that account for discretisation drift. Unlike the negative gradient flow (NGF), these new flows can be used to describe learning-rate-specific behaviours of GD, such as training instabilities observed in supervised learning and two-player games.
We then translate insights from continuous time into mitigation strategies for unstable GD dynamics by constructing novel learning rate schedules and regularisers.
arXiv Detail & Related papers (2023-10-21T15:21:36Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
- Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
- Hyper-Learning for Gradient-Based Batch Size Adaptation [2.944323057176686]
Scheduling the batch size to increase is an effective strategy to control noise when training deep neural networks.
We introduce Arbiter as a new hyper-optimization algorithm to perform batch size adaptations for learnable schedulings.
We demonstrate Arbiter's effectiveness in several illustrative experiments.
arXiv Detail & Related papers (2022-05-17T11:01:14Z)
- Efficient Differentiable Simulation of Articulated Bodies [89.64118042429287]
We present a method for efficient differentiable simulation of articulated bodies.
This enables integration of articulated body dynamics into deep learning frameworks.
We show that reinforcement learning with articulated systems can be accelerated using gradients provided by our method.
arXiv Detail & Related papers (2021-09-16T04:48:13Z)
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter changed in the past is aligned with the direction of the current gradient; a hypothetical sketch of such a rule is given after this list.
Our method outperforms previous adaptive learning-rate algorithms in terms of training speed and test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
- META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning [2.0559497209595823]
We propose a meta-learning method for adjusting the eligibility trace parameter, in a state-dependent manner.
The adaptation is achieved with the help of auxiliary learners that learn distributional information about the update targets online.
We prove that, under some assumptions, the proposed method improves the overall quality of the update targets, by minimizing the overall target error.
arXiv Detail & Related papers (2020-06-16T03:41:07Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Weighted Aggregating Stochastic Gradient Descent for Parallel Deep Learning [8.366415386275557]
The solution involves a reformulation of the objective function for optimization in neural network models.
We introduce a decentralized weighted aggregating scheme based on the performance of local workers.
To validate the new method, we benchmark our schemes against several popular algorithms.
arXiv Detail & Related papers (2020-04-07T23:38:29Z)
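
The AdaRem entry above describes its mechanism concretely enough to sketch: the per-parameter learning rate grows when a parameter keeps moving in the same direction and shrinks when the current gradient opposes its past movement. The momentum statistic, the scale factors, and the plain-SGD wrapper below are assumptions made for illustration; they are not AdaRem's published formulas.

```python
import torch


class DirectionAlignedSGD:
    """Hypothetical sketch of an AdaRem-style rule: scale each parameter's
    learning rate up when the current gradient is aligned with the direction
    the parameter has been moving, and down when it is not."""

    def __init__(self, params, base_lr=1e-3, beta=0.9, grow=1.1, shrink=0.5,
                 min_scale=0.1, max_scale=10.0):
        self.params = list(params)
        self.base_lr = base_lr
        self.beta = beta  # momentum for the running direction of past gradients
        self.grow, self.shrink = grow, shrink
        self.min_scale, self.max_scale = min_scale, max_scale
        self.direction = [torch.zeros_like(p) for p in self.params]
        self.scale = [torch.ones_like(p) for p in self.params]  # per-parameter lr multipliers

    @torch.no_grad()
    def step(self):
        for p, d, s in zip(self.params, self.direction, self.scale):
            if p.grad is None:
                continue
            # Aligned where the current gradient has the same sign as the running direction.
            aligned = (p.grad * d) > 0
            s.mul_(torch.where(aligned, torch.full_like(s, self.grow),
                               torch.full_like(s, self.shrink)))
            s.clamp_(self.min_scale, self.max_scale)
            # Plain SGD update with the per-parameter learning rate.
            p.add_(p.grad * s, alpha=-self.base_lr)
            # Update the running direction estimate.
            d.mul_(self.beta).add_(p.grad, alpha=1 - self.beta)
```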