Correcting Momentum in Temporal Difference Learning
- URL: http://arxiv.org/abs/2106.03955v1
- Date: Mon, 7 Jun 2021 20:41:15 GMT
- Title: Correcting Momentum in Temporal Difference Learning
- Authors: Emmanuel Bengio, Joelle Pineau, Doina Precup
- Abstract summary: We argue that momentum in Temporal Difference (TD) learning accumulates gradients that become doubly stale.
We show that this phenomenon exists, and then propose a first-order correction term to momentum.
An important insight of this work is that deep RL methods are not always best served by directly importing techniques from the supervised setting.
- Score: 95.62766731469671
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A common optimization tool used in deep reinforcement learning is momentum,
which consists in accumulating and discounting past gradients, reapplying them
at each iteration. We argue that, unlike in supervised learning, momentum in
Temporal Difference (TD) learning accumulates gradients that become doubly
stale: not only does the gradient of the loss change due to parameter updates,
the loss itself changes due to bootstrapping. We first show that this
phenomenon exists, and then propose a first-order correction term to momentum.
We show that this correction term improves sample efficiency in policy
evaluation by correcting target value drift. An important insight of this work
is that deep RL methods are not always best served by directly importing
techniques from the supervised setting.
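To make the staleness argument concrete, below is a minimal sketch of semi-gradient TD(0) policy evaluation with classical momentum on a toy Markov chain; the chain, feature map, and variable names are illustrative assumptions, and the paper's first-order correction term is not reproduced here.

```python
import numpy as np

# Toy 3-state Markov chain under a fixed policy (illustrative only, not from the paper).
rng = np.random.default_rng(0)
P = np.array([[0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8],
              [0.8, 0.1, 0.1]])   # transition probabilities under the policy
R = np.array([0.0, 0.5, 1.0])     # expected reward for leaving each state
gamma = 0.9
phi = np.eye(3)                   # tabular features, so V(s) = w[s]

w = np.zeros(3)                   # value-function parameters
m = np.zeros(3)                   # momentum buffer
lr, beta = 0.05, 0.9

s = 0
for t in range(5000):
    s_next = rng.choice(3, p=P[s])
    r = R[s]

    # Bootstrapped TD target: it depends on the current parameters w,
    # so it drifts every time w is updated.
    target = r + gamma * (phi[s_next] @ w)

    # Semi-gradient of 0.5 * (target - V(s))^2, with the target held fixed.
    td_error = target - phi[s] @ w
    grad = -td_error * phi[s]

    # Momentum accumulates past gradients. In TD learning they are doubly
    # stale: each was computed under old parameters AND an old bootstrap target.
    m = beta * m + grad
    w = w - lr * m
    s = s_next

print("estimated values:", w)
```

Because the bootstrap target above moves with w, the gradients accumulated in the momentum buffer refer both to outdated parameters and to outdated targets; this is the double staleness the abstract describes, and the paper's proposal is a first-order correction term added to this momentum update.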
Related papers
- An Effective Dynamic Gradient Calibration Method for Continual Learning [11.555822066922508]
Continual learning (CL) is a fundamental topic in machine learning, where the goal is to train a model with continuously incoming data and tasks.
Due to the memory limit, we cannot store all the historical data, and therefore confront the "catastrophic forgetting" problem.
We develop an effective algorithm to calibrate the gradient at each update step of the model.
arXiv Detail & Related papers (2024-07-30T16:30:09Z)
- Normalization and effective learning rates in reinforcement learning [52.59508428613934]
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature.
We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate.
We propose to make the learning rate schedule explicit with a simple re-parameterization, which we call Normalize-and-Project.
arXiv Detail & Related papers (2024-07-01T20:58:01Z)
- ODICE: Revealing the Mystery of Distribution Correction Estimation via Orthogonal-gradient Update [43.91666113724066]
We investigate DIstribution Correction Estimation (DICE) methods, an important line of work in offline reinforcement learning (RL) and imitation learning (IL).
DICE-based methods impose a state-action-level behavior constraint, which is an ideal choice for offline learning.
We find that there exist two gradient terms when learning the value function with a true-gradient update: the forward gradient (taken on the current state) and the backward gradient (taken on the next state).
arXiv Detail & Related papers (2024-02-01T05:30:51Z)
- DPSUR: Accelerating Differentially Private Stochastic Gradient Descent Using Selective Update and Release [29.765896801370612]
This paper proposes a differentially private training framework based on Selective Update and Release (DPSUR).
The main challenges lie in two aspects: privacy concerns, and the gradient selection strategy for model updates.
Experiments conducted on MNIST, FMNIST, CIFAR-10, and IMDB datasets show that DPSUR significantly outperforms previous works in terms of convergence speed.
arXiv Detail & Related papers (2023-11-23T15:19:30Z)
- Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z)
- Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning [97.28695683236981]
More gradient updates decrease the expressivity of the current value network.
We demonstrate this phenomenon on Atari and Gym benchmarks, in both offline and online RL settings.
arXiv Detail & Related papers (2020-10-27T17:55:16Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- The Break-Even Point on Optimization Trajectories of Deep Neural Networks [64.7563588124004]
We argue for the existence of a "break-even" point on the optimization trajectory of deep neural networks.
We show that using a large learning rate in the initial phase of training reduces the variance of the gradient.
We also show that using a low learning rate results in bad conditioning of the loss surface even for a neural network with batch normalization layers.
arXiv Detail & Related papers (2020-02-21T22:55:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.