Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient
- URL: http://arxiv.org/abs/2007.12817v1
- Date: Sat, 25 Jul 2020 00:54:20 GMT
- Title: Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient
- Authors: Haonan Jia, Xiao Zhang, Jun Xu, Wei Zeng, Hao Jiang, Xiaohui Yan, Ji-Rong Wen
- Abstract summary: Deep Q-learning algorithms often suffer from poor gradient estimates with excessive variance.
This paper introduces a recursive framework for updating the stochastic gradient estimates in deep Q-learning, yielding a novel algorithm called SRG-DQN.
- Score: 51.880464915253924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Q-learning algorithms often suffer from poor gradient estimates with excessive variance, which results in unstable training and poor sample efficiency. Stochastic variance-reduced gradient methods such as SVRG have been applied to reduce this variance (Zhao et al. 2019). However, because reinforcement learning generates instances online, applying SVRG directly to deep Q-learning runs into inaccurate estimates of the anchor points, which severely limits the potential of SVRG. To address this issue, and inspired by the recursive gradient variance reduction algorithm SARAH (Nguyen et al. 2017), this paper introduces a recursive framework for updating the stochastic gradient estimates in deep Q-learning, yielding a novel algorithm called SRG-DQN. Unlike SVRG-based algorithms, SRG-DQN updates the stochastic gradient estimate recursively: each parameter update follows a direction accumulated from past stochastic gradient information, which removes the need to estimate full gradients as anchors. In addition, SRG-DQN incorporates Adam to further accelerate training. Theoretical analysis and experimental results on well-known reinforcement learning tasks demonstrate the efficiency and effectiveness of the proposed SRG-DQN algorithm.
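The contrast the abstract draws between SVRG and a SARAH-style recursive estimator can be illustrated on a toy problem. SVRG corrects each stochastic gradient with a fixed anchor, v_t = g_i(w_t) - g_i(w_anchor) + g(w_anchor), where g(w_anchor) is the full gradient at the anchor; SARAH instead uses the previous iterate, v_t = g_i(w_t) - g_i(w_{t-1}) + v_{t-1}. The following is a minimal Python sketch under stated assumptions, not the authors' SRG-DQN implementation: it applies the recursion to a hypothetical scalar least-squares finite sum instead of a Q-network loss, omits the Adam step, and starts each outer loop from a mini-batch gradient (an assumption standing in for the online setting). The names `grad_i` and `sarah` and all hyperparameters are illustrative.

```python
import random


# Hypothetical toy problem (not from the paper): a finite sum of scalar
# least-squares terms f_i(w) = 0.5 * (a_i * w - b_i) ** 2 with b_i = 3 * a_i,
# so every component gradient vanishes at w = 3.
def grad_i(w, a, b):
    """Gradient of 0.5 * (a * w - b) ** 2 with respect to w."""
    return (a * w - b) * a


def sarah(data, w0, lr=0.05, outer=20, inner=30, batch=16, seed=0):
    """SARAH-style loop; the mini-batch initialisation is an assumption."""
    rng = random.Random(seed)
    w = w0
    for _ in range(outer):
        # Assumption: start each outer loop from a mini-batch gradient rather
        # than the exact full gradient, mimicking data that arrive online.
        sample = rng.sample(data, batch)
        v = sum(grad_i(w, a, b) for a, b in sample) / batch
        w_prev, w = w, w - lr * v
        for _ in range(inner):
            a, b = rng.choice(data)
            # Recursive estimate: v_t = g_i(w_t) - g_i(w_{t-1}) + v_{t-1}.
            # The correction uses the previous iterate, not a fixed SVRG anchor.
            v = grad_i(w, a, b) - grad_i(w_prev, a, b) + v
            w_prev, w = w, w - lr * v
    return w


data_rng = random.Random(1)
data = [(a, 3.0 * a) for a in (data_rng.uniform(0.5, 1.5) for _ in range(100))]
w_hat = sarah(data, w0=0.0)
print(f"estimate: {w_hat:.4f}")  # should be close to 3.0
```

Each inner step evaluates two gradients on the same sample and never revisits a stored anchor point; because every component in the toy data is minimised at w = 3, the printed estimate should approach that value.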
Related papers
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been shown to solve forward and inverse differential equation problems effectively.
However, PINN training can fail when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs and improve the stability of the training process.
(arXiv, 2023-03-03T08:17:47Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
(arXiv, 2022-10-07T03:52:27Z)
- Random-reshuffled SARAH does not need a full gradient computations [61.85897464405715]
The StochAstic Recursive grAdient algoritHm (SARAH) is a variance-reduced variant of the stochastic gradient descent (SGD) algorithm that periodically requires a full gradient of the objective function.
In this paper, we remove the need for full gradient computations.
Instead, stochastic gradients aggregated over each epoch serve as an estimate of the full gradient in the SARAH algorithm.
(arXiv, 2021-11-26T06:00:44Z)
- Low-memory stochastic backpropagation with multi-channel randomized trace estimation [6.985273194899884]
We propose to approximate the gradient of convolutional layers in neural networks with a multi-channel randomized trace estimation technique.
Compared to other methods, this approach is simple, amenable to analyses, and leads to a greatly reduced memory footprint.
We discuss the performance of networks trained with backpropagation and how the error can be controlled while maximizing memory usage and minimizing computational overhead.
(arXiv, 2021-06-13T13:54:02Z)
- A Differentiable Point Process with Its Application to Spiking Neural Networks [13.160616423673373]
Jimenez Rezende & Gerstner (2014) proposed a variational inference algorithm to train SNNs with hidden neurons.
This paper presents an alternative gradient estimator for SNNs based on the path-wise gradient estimator.
(arXiv, 2021-06-02T02:40:17Z)
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the learning rate of each parameter according to whether the direction in which that parameter changed in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
(arXiv, 2020-10-21T14:49:00Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
(arXiv, 2020-06-11T16:36:31Z)
- Semi-Implicit Back Propagation [1.5533842336139065]
We propose a semi-implicit back propagation method for neural network training.
The differences on the neurons are propagated backward, and the parameters are updated with a proximal mapping.
Experiments on both MNIST and CIFAR-10 demonstrate that the proposed algorithm leads to better performance in terms of both loss reduction and training/validation accuracy.
(arXiv, 2020-02-10T03:26:09Z)
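The random-reshuffled SARAH entry in this list replaces SARAH's periodic full-gradient computation with gradients aggregated during an epoch, which targets the same anchor-estimation problem that the SRG-DQN abstract raises for SVRG. The following Python sketch is one reading of that short summary, not the paper's algorithm: each epoch visits the data in a freshly shuffled order, averages the stochastic gradients it already computes, and uses that average as the starting direction of the next epoch. The toy least-squares data, the helper names `grad_i` and `rr_sarah`, and the first-epoch bootstrap from a single stochastic gradient are all hypothetical.

```python
import random


def grad_i(w, a, b):
    """Gradient of the hypothetical component 0.5 * (a * w - b) ** 2."""
    return (a * w - b) * a


def rr_sarah(data, w0, lr=0.05, epochs=30, seed=0):
    """Sketch: SARAH with random reshuffling and an aggregated-gradient anchor."""
    rng = random.Random(seed)
    order = list(range(len(data)))
    w = w0
    anchor = None  # average of stochastic gradients seen in the previous epoch
    for _ in range(epochs):
        rng.shuffle(order)  # one pass over the data in a fresh random order
        if anchor is None:
            # Assumption: bootstrap the first epoch from a single stochastic gradient.
            a, b = data[order[0]]
            v = grad_i(w, a, b)
        else:
            v = anchor  # stands in for the full gradient plain SARAH would compute
        w_prev, w = w, w - lr * v
        total = 0.0
        for idx in order:
            a, b = data[idx]
            g_now = grad_i(w, a, b)
            total += g_now  # reuse gradients already computed during the pass
            v = g_now - grad_i(w_prev, a, b) + v  # SARAH recursion
            w_prev, w = w, w - lr * v
        anchor = total / len(data)
    return w


data_rng = random.Random(1)
data = [(a, 3.0 * a) for a in (data_rng.uniform(0.5, 1.5) for _ in range(100))]
print(f"estimate: {rr_sarah(data, w0=0.0):.4f}")  # should be close to 3.0
```

The aggregated anchor is computed at stale iterates from the previous epoch, so it is only an estimate of the full gradient; the design choice is that it costs no extra gradient evaluations beyond those the recursion already performs.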