Reanalysis of Variance Reduced Temporal Difference Learning
- URL: http://arxiv.org/abs/2001.01898v2
- Date: Fri, 10 Jan 2020 07:22:15 GMT
- Title: Reanalysis of Variance Reduced Temporal Difference Learning
- Authors: Tengyu Xu, Zhe Wang, Yi Zhou, Yingbin Liang
- Abstract summary: A variance reduced TD (VRTD) algorithm was proposed by Korda and La, which applies the variance reduction technique directly to the online TD learning with Markovian samples.
We show that VRTD is guaranteed to converge to a neighborhood of the fixed-point solution of TD at a linear convergence rate.
- Score: 57.150444843282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal difference (TD) learning is a popular algorithm for policy
evaluation in reinforcement learning, but the vanilla TD can substantially
suffer from the inherent optimization variance. A variance reduced TD (VRTD)
algorithm was proposed by Korda and La (2015), which applies the variance
reduction technique directly to the online TD learning with Markovian samples.
In this work, we first point out the technical errors in the analysis of VRTD
in Korda and La (2015), and then provide a mathematically solid analysis of the
non-asymptotic convergence of VRTD and its variance reduction performance. We
show that VRTD is guaranteed to converge to a neighborhood of the fixed-point
solution of TD at a linear convergence rate. Furthermore, the variance error
(for both i.i.d. and Markovian sampling) and the bias error (for Markovian
sampling) of VRTD are significantly reduced by the batch size of variance
reduction in comparison to those of vanilla TD. As a result, the overall
computational complexity of VRTD to attain a solution of a given accuracy
outperforms that of TD under Markovian sampling, and outperforms that of TD
under i.i.d. sampling for a sufficiently small condition number.
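To make the scheme concrete, the following is a minimal numpy sketch of a batched, variance-reduced TD(0) update with linear function approximation in the spirit of VRTD: each epoch fixes a reference parameter, averages the stochastic TD direction over a batch, and uses that average to correct every subsequent per-sample update. The step size, batch construction, and synthetic i.i.d. transitions are illustrative assumptions for shape-checking, not the algorithm or settings analyzed in the paper.

```python
import numpy as np

def td_direction(theta, phi, phi_next, reward, gamma):
    # Stochastic semi-gradient TD direction: delta * phi(s),
    # with delta = r + gamma * theta^T phi(s') - theta^T phi(s).
    delta = reward + gamma * phi_next @ theta - phi @ theta
    return delta * phi

def vrtd_epoch(theta, batch, alpha, gamma):
    # One variance-reduced epoch: freeze a reference point, average the TD
    # direction over the batch, then run corrected per-sample updates.
    theta_ref = theta.copy()
    g_ref = np.mean([td_direction(theta_ref, p, pn, r, gamma)
                     for p, pn, r in batch], axis=0)
    for p, pn, r in batch:
        corrected = (td_direction(theta, p, pn, r, gamma)
                     - td_direction(theta_ref, p, pn, r, gamma) + g_ref)
        theta = theta + alpha * corrected
    return theta

# Toy usage on synthetic i.i.d. transitions, for shape-checking only;
# no claim about convergence behavior on this random data.
rng = np.random.default_rng(0)
dim, batch_size, gamma, alpha = 4, 64, 0.9, 0.05
theta = np.zeros(dim)
for _ in range(10):
    batch = [(rng.normal(size=dim), rng.normal(size=dim), rng.normal())
             for _ in range(batch_size)]
    theta = vrtd_epoch(theta, batch, alpha, gamma)
print(theta)
```

The per-sample correction averages to zero over the reference batch, which is the mechanism by which a larger batch size shrinks the variance error discussed in the abstract.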
Related papers
- Finite Time Analysis of Temporal Difference Learning for Mean-Variance in a Discounted MDP [1.0923877073891446]
We consider the problem of policy evaluation for variance in a discounted reward Markov decision process.
For this problem, a temporal difference (TD) type learning algorithm with linear function approximation (LFA) exists in the literature.
We derive finite sample bounds that hold (i) in the mean-squared sense; and (ii) with high probability, when tail iterate averaging is employed.
arXiv Detail & Related papers (2024-06-12T05:49:53Z)
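As a rough companion to the entry above, the sketch below runs two coupled TD(0)-style recursions with linear function approximation: one for the value and one for the second moment of the return, from which a variance estimate follows. This is a generic construction under the stated linear approximations; it is not the specific algorithm, step sizes, or tail-averaging scheme analyzed in the paper.

```python
import numpy as np

def mean_variance_td_step(w_v, w_m, phi, phi_next, r, gamma, alpha, beta):
    # TD(0) update for the value estimate V(s) ~ w_v^T phi(s).
    delta_v = r + gamma * phi_next @ w_v - phi @ w_v
    w_v = w_v + alpha * delta_v * phi
    # TD-style update for the second moment M(s) ~ w_m^T phi(s), using the
    # recursion M(s) ~= E[r^2 + 2*gamma*r*V(s') + gamma^2 * M(s')].
    delta_m = (r ** 2 + 2 * gamma * r * (phi_next @ w_v)
               + gamma ** 2 * (phi_next @ w_m) - phi @ w_m)
    w_m = w_m + beta * delta_m * phi
    return w_v, w_m

def variance_estimate(w_v, w_m, phi):
    # Var[G | s] ~= M(s) - V(s)^2 under the linear approximations.
    return phi @ w_m - (phi @ w_v) ** 2
```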
- TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression [109.69084997173196]
Deep heteroscedastic regression involves jointly optimizing the mean and covariance of the predicted distribution using the negative log-likelihood.
Recent works show that this may result in sub-optimal convergence due to the challenges associated with covariance estimation.
We study whether the predicted covariance truly captures the randomness of the predicted mean.
Our results show that not only does TIC accurately learn the covariance, it additionally facilitates an improved convergence of the negative log-likelihood.
arXiv Detail & Related papers (2023-10-29T09:54:03Z)
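The joint mean/covariance objective referred to above is the Gaussian negative log-likelihood; below is a minimal sketch with a diagonal covariance parameterized through a predicted log-variance. The parameterization and the toy check are illustrative assumptions, not the TIC-TAC estimator itself.

```python
import numpy as np

def heteroscedastic_gaussian_nll(y, mu, log_var):
    # Negative log-likelihood of y under N(mu, diag(exp(log_var))),
    # averaged over the batch; additive constants are dropped.
    # Predicting the log-variance keeps the covariance positive.
    inv_var = np.exp(-log_var)
    per_dim = 0.5 * (log_var + (y - mu) ** 2 * inv_var)
    return per_dim.sum(axis=-1).mean()

# Toy check with placeholder predictions (not a trained network).
rng = np.random.default_rng(0)
y = rng.normal(size=(128, 3))
mu = np.zeros_like(y)
print(heteroscedastic_gaussian_nll(y, mu, np.zeros_like(y)))
```

Note that the inverse variance multiplies the squared residual, so inflating the predicted variance mutes the gradient through the mean; that coupling is one reason joint optimization of the two heads can converge poorly.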
- Stable Target Field for Reduced Variance Score Estimation in Diffusion Models [5.9115407007859755]
Diffusion models generate samples by reversing a fixed forward diffusion process.
We argue that the source of the variance in score estimation lies in the handling of intermediate noise-variance scales.
We propose to remedy the problem by incorporating a reference batch which we use to calculate weighted conditional scores as more stable training targets.
arXiv Detail & Related papers (2023-02-01T18:57:01Z)
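A minimal sketch of the reference-batch idea described above, assuming a Gaussian forward kernel x_t = x_0 + sigma_t * noise: the training target at x_t becomes a posterior-weighted average of the conditional scores against every clean example in a reference batch, rather than the single conditional score of the example that generated x_t. The kernel choice, the uniform prior over the reference batch, and all names are assumptions for illustration, not the paper's exact estimator.

```python
import numpy as np

def stable_score_target(x_t, sigma_t, ref_batch):
    # Conditional score of the Gaussian kernel x_t ~ N(x_0, sigma_t^2 I):
    #   grad_x log q(x_t | x_0) = -(x_t - x_0) / sigma_t^2.
    # Weight each reference example by its posterior responsibility
    # (softmax of the log kernel densities), then average the scores.
    diffs = x_t[None, :] - ref_batch                      # (B, d)
    log_w = -np.sum(diffs ** 2, axis=1) / (2 * sigma_t ** 2)
    w = np.exp(log_w - log_w.max())
    w = w / w.sum()
    cond_scores = -diffs / sigma_t ** 2                   # (B, d)
    return w @ cond_scores                                # (d,)

# Toy usage: perturb one data point and build its training target from a
# small reference batch of clean points.
rng = np.random.default_rng(0)
ref_batch = rng.normal(size=(32, 2))
x0 = ref_batch[0]
sigma_t = 0.5
x_t = x0 + sigma_t * rng.normal(size=2)
print(stable_score_target(x_t, sigma_t, ref_batch))
```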
- Closing the gap between SVRG and TD-SVRG with Gradient Splitting [17.071971639540976]
Temporal difference (TD) learning is a policy evaluation algorithm in reinforcement learning whose performance can be enhanced by variance reduction methods.
We utilize a recent interpretation of TD learning as the splitting of the gradient of an appropriately chosen function, thus simplifying the algorithm and fusing TD with SVRG.
Our main result is a geometric convergence bound with a predetermined learning rate of $1/8$, identical to the convergence bound available for SVRG in the convex setting.
arXiv Detail & Related papers (2022-11-29T14:21:34Z)
- DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over Graphs [54.08445874064361]
We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust problem.
We show that our proposed algorithm can improve the worst-distribution test accuracy by up to $10\%$.
arXiv Detail & Related papers (2022-08-29T18:01:42Z)
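The reduction mentioned above can be made concrete: with a KL penalty of strength tau, the worst-case distribution over per-sample (or per-client) losses has a closed softmax form, so the robust objective collapses to a log-sum-exp of the losses. The sketch below is a generic KL-regularized DRO objective for intuition, not the decentralized DR-DSGD algorithm itself; the names and the temperature are illustrative.

```python
import numpy as np

def kl_dro_objective_and_weights(losses, tau):
    # KL-regularized distributionally robust objective:
    #   max_{q in simplex}  sum_i q_i * l_i - tau * KL(q || uniform)
    # has the closed form  tau * log(mean_i exp(l_i / tau)),
    # attained by the softmax weights q_i proportional to exp(l_i / tau).
    l = np.asarray(losses, dtype=float)
    shifted = l / tau - np.max(l / tau)          # shift for numerical stability
    weights = np.exp(shifted) / np.exp(shifted).sum()
    objective = tau * (np.log(np.mean(np.exp(shifted))) + np.max(l / tau))
    return objective, weights

# Toy usage: larger losses receive larger weights, so a gradient step on the
# weighted loss focuses on the worst-case distribution.
losses = [0.2, 0.5, 2.0, 0.1]
obj, w = kl_dro_objective_and_weights(losses, tau=1.0)
print(obj, w)
```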
- Importance Sampling Placement in Off-Policy Temporal-Difference Methods [3.04585143845864]
We show how off-policy reinforcement learning algorithms correct the entire TD error instead of just the TD target.
Experiments show this subtle modification results in improved performance.
arXiv Detail & Related papers (2022-03-18T21:54:09Z)
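The placement question above comes down to where the importance ratio rho = pi(a|s) / mu(a|s) multiplies the update; a minimal off-policy linear TD(0) sketch showing both placements follows. The names and the flag are illustrative, and no claim is made here about which placement the paper ultimately recommends.

```python
import numpy as np

def offpolicy_td_update(theta, phi, phi_next, r, rho, gamma, alpha,
                        correct_full_error=True):
    # Off-policy linear TD(0) with importance ratio rho = pi(a|s) / mu(a|s).
    v, v_next = phi @ theta, phi_next @ theta
    if correct_full_error:
        # rho scales the entire TD error: rho * (r + gamma*V(s') - V(s)).
        update = rho * (r + gamma * v_next - v)
    else:
        # rho corrects only the TD target: rho*(r + gamma*V(s')) - V(s).
        update = rho * (r + gamma * v_next) - v
    return theta + alpha * update * phi
```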
- Learning from Noisy Labels via Dynamic Loss Thresholding [69.61904305229446]
We propose a novel method named Dynamic Loss Thresholding (DLT).
During the training process, DLT records the loss value of each sample and calculates dynamic loss thresholds.
Experiments on CIFAR-10/100 and Clothing1M demonstrate substantial improvements over recent state-of-the-art methods.
arXiv Detail & Related papers (2021-04-01T07:59:03Z)
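A minimal sketch of the loop described above: per-sample losses are recorded, a dynamic threshold is derived from their current distribution, and samples above it are flagged as likely noisy and down-weighted. The percentile rule and the hard down-weighting are assumptions for illustration; the paper's DLT schedule may differ.

```python
import numpy as np

class DynamicLossThreshold:
    # Records per-sample losses each epoch and derives a loss threshold from
    # their distribution; samples above it are flagged as likely noisy.
    def __init__(self, percentile=80.0):
        self.percentile = percentile
        self.loss_history = []

    def update(self, per_sample_losses):
        losses = np.asarray(per_sample_losses, dtype=float)
        self.loss_history.append(losses)
        threshold = np.percentile(losses, self.percentile)
        clean_mask = losses <= threshold
        return threshold, clean_mask

# Toy usage: down-weight the flagged samples in the next epoch's loss.
dlt = DynamicLossThreshold(percentile=80.0)
losses = np.random.default_rng(0).exponential(size=100)
threshold, clean_mask = dlt.update(losses)
weights = np.where(clean_mask, 1.0, 0.0)
print(threshold, weights.mean())
```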
- Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence Analysis [27.679514676804057]
We develop a variance reduction scheme for the two time-scale TDC algorithm in the off-policy setting.
Experiments demonstrate that the proposed variance-reduced TDC achieves a smaller convergence error than both the conventional TDC and the variance-reduced TD.
arXiv Detail & Related papers (2020-10-26T01:33:05Z)
- Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
The communication bottleneck has been a critical problem in large-scale distributed deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows better convergence than error feedback for non-convex distributed problems.
We also propose DEFA to accelerate the generalization of DEF, which shows better bounds than DEF.
arXiv Detail & Related papers (2020-04-11T03:50:59Z)
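For context on the baseline the entry above modifies, here is a minimal single-worker sketch of error feedback with random sparsification: the compression residual is stored locally and added back to the next gradient before compressing again. This is vanilla error feedback under illustrative names and settings, not the detached (DEF) or DEFA variants.

```python
import numpy as np

def random_sparsify(x, keep_ratio, rng):
    # Keep a random fraction of coordinates, zero the rest, and rescale so
    # the compressor is unbiased in expectation.
    mask = rng.random(x.shape) < keep_ratio
    return np.where(mask, x / keep_ratio, 0.0)

def ef_sgd_step(params, grad, error, lr, keep_ratio, rng):
    # Error feedback: compress (gradient + residual), apply the compressed
    # update, and carry the compression error over to the next step.
    corrected = grad + error
    compressed = random_sparsify(corrected, keep_ratio, rng)
    new_error = corrected - compressed
    return params - lr * compressed, new_error

# Toy usage on the quadratic objective f(w) = 0.5 * ||w||^2, whose gradient is w.
rng = np.random.default_rng(0)
w, err = np.ones(10), np.zeros(10)
for _ in range(100):
    w, err = ef_sgd_step(w, grad=w, error=err, lr=0.1, keep_ratio=0.2, rng=rng)
print(np.linalg.norm(w))
```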
- Simple and Effective Prevention of Mode Collapse in Deep One-Class Classification [93.2334223970488]
We propose two regularizers to prevent hypersphere collapse in deep SVDD.
The first regularizer is based on injecting random noise via the standard cross-entropy loss.
The second regularizer penalizes the minibatch variance when it becomes too small.
arXiv Detail & Related papers (2020-01-24T03:44:47Z)
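The second regularizer in the entry above lends itself to a direct sketch: penalize the objective whenever the variance of the minibatch embeddings falls below a floor, which discourages the network from mapping all inputs to the hypersphere center. The hinge form, the floor value, and the weighting are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def minibatch_variance_penalty(embeddings, var_floor=1e-2):
    # Penalize the objective when the minibatch variance of the learned
    # embeddings drops below var_floor (a symptom of hypersphere collapse).
    var = embeddings.var(axis=0).mean()
    return max(0.0, var_floor - var) / var_floor

def deep_svdd_loss_with_penalty(embeddings, center, lam=1.0, var_floor=1e-2):
    # One-class objective: mean squared distance to the center, plus the
    # collapse penalty weighted by lam.
    base = np.mean(np.sum((embeddings - center) ** 2, axis=1))
    return base + lam * minibatch_variance_penalty(embeddings, var_floor)

# Toy usage: collapsed embeddings are penalized, spread-out ones are not.
rng = np.random.default_rng(0)
center = np.zeros(8)
collapsed = np.tile(center, (32, 1))
spread = rng.normal(size=(32, 8))
print(deep_svdd_loss_with_penalty(collapsed, center),
      deep_svdd_loss_with_penalty(spread, center))
```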