Predictor-Corrector(PC) Temporal Difference(TD) Learning (PCTD)
- URL: http://arxiv.org/abs/2104.09620v1
- Date: Thu, 15 Apr 2021 18:54:16 GMT
- Title: Predictor-Corrector(PC) Temporal Difference(TD) Learning (PCTD)
- Authors: Caleb Bowyer
- Abstract summary: Predictor-Corrector Temporal Difference (PCTD) is what I call the discrete-time Reinforcement Learning (RL) algorithm translated from the continuous-time ODE using the theory of Stochastic Approximation (SA).
I propose a new class of TD learning algorithms.
The parameter being approximated has a guaranteed order of magnitude reduction in the Taylor Series error of the solution to the ODE.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Using insight from numerical approximation of ODEs and the problem
formulation and solution methodology of TD learning through a Galerkin
relaxation, I propose a new class of TD learning algorithms. After applying the
improved numerical methods, the parameter being approximated has a guaranteed
order of magnitude reduction in the Taylor Series error of the solution to the
ODE for the parameter $\theta(t)$ that is used in constructing the linearly
parameterized value function. Predictor-Corrector Temporal Difference (PCTD) is
what I call the discrete-time Reinforcement Learning (RL) algorithm translated
from the continuous-time ODE using the theory of Stochastic Approximation (SA).
Both causal and non-causal implementations of the algorithm are provided, and
simulation results are listed for an infinite horizon task to compare the
original TD(0) algorithm against both versions of PCTD(0).
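As a rough illustration of the predictor-corrector idea, the sketch below contrasts the standard Euler-style TD(0) update with a Heun-style (predict-then-correct) step applied to the same sampled ODE drift for the parameter $\theta$, using linear (here one-hot) features on a small random-walk Markov reward process. The environment, step size, and the specific corrector form are illustrative assumptions and are not taken from the paper; in particular, this is not the paper's exact causal or non-causal PCTD(0) update.

```python
# Illustrative sketch only: a Heun-style (predictor-corrector) variant of TD(0)
# with linear function approximation, compared against plain TD(0) on a small
# random-walk Markov reward process. This is NOT the paper's PCTD(0) update;
# it only illustrates replacing the Euler-style TD(0) step with a
# predictor-corrector step for the underlying ODE on theta(t).
import numpy as np

rng = np.random.default_rng(0)

# 5-state random walk: states 0..4, start in the middle, terminate off either end.
# Reward +1 for stepping off the right end, 0 otherwise.
N, GAMMA, ALPHA = 5, 0.95, 0.1
PHI = np.eye(N)                                  # one-hot (tabular) features


def step(s):
    """Take one random-walk step; return (next_state or None, reward)."""
    s2 = s + (1 if rng.random() < 0.5 else -1)
    if s2 < 0:
        return None, 0.0
    if s2 >= N:
        return None, 1.0
    return s2, 0.0


def td_drift(theta, s, r, s2):
    """Sampled TD(0) ODE drift f(theta) = delta * phi(s) at one transition."""
    v_next = 0.0 if s2 is None else theta @ PHI[s2]
    delta = r + GAMMA * v_next - theta @ PHI[s]
    return delta * PHI[s]


def run(predictor_corrector, episodes=5000):
    theta = np.zeros(N)
    for _ in range(episodes):
        s = N // 2
        while s is not None:
            s2, r = step(s)
            f0 = td_drift(theta, s, r, s2)
            if predictor_corrector:
                theta_pred = theta + ALPHA * f0          # predictor (Euler) step
                f1 = td_drift(theta_pred, s, r, s2)      # drift re-evaluated at the prediction
                theta = theta + 0.5 * ALPHA * (f0 + f1)  # corrector (trapezoidal) step
            else:
                theta = theta + ALPHA * f0               # plain TD(0) (Euler) step
            s = s2
    return theta


# Reference values: solve (I - gamma * P) v = expected one-step reward.
P, r_exp = np.zeros((N, N)), np.zeros(N)
for s in range(N):
    for s2, p in ((s - 1, 0.5), (s + 1, 0.5)):
        if 0 <= s2 < N:
            P[s, s2] += p
        elif s2 >= N:
            r_exp[s] += p
v_true = np.linalg.solve(np.eye(N) - GAMMA * P, r_exp)

for name, pc in (("TD(0)", False), ("PC-style TD(0)", True)):
    print(f"{name}: ||theta - v_true|| = {np.linalg.norm(run(pc) - v_true):.4f}")
```

The corrector step advances the parameter with a higher-order approximation of the ODE solution per transition, which is the mechanism behind the abstract's claimed reduction in Taylor-series error.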
Related papers
- Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) methods and adaptive regularization using cubics (ARC) have proven to have very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Backstepping Temporal Difference Learning [3.5823366350053325]
We propose a new convergent algorithm for off-policy TD-learning.
Our method relies on the backstepping technique, which is widely used in nonlinear control theory.
Convergence of the proposed algorithm is experimentally verified in environments where standard TD-learning is known to be unstable.
arXiv Detail & Related papers (2023-02-20T10:06:49Z) - Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming [53.63469275932989]
We consider online statistical inference of constrained nonlinear optimization problems.
We apply a Stochastic Sequential Quadratic Programming (StoSQP) method to solve these problems.
arXiv Detail & Related papers (2022-05-27T00:34:03Z) - Temporal Difference Learning with Continuous Time and State in the
Stochastic Setting [0.0]
We consider the problem of continuous-time policy evaluation.
This consists of learning, from observations, the value function associated with an uncontrolled continuous-time dynamic and a reward function.
arXiv Detail & Related papers (2022-02-16T10:10:53Z) - Policy Evaluation and Temporal-Difference Learning in Continuous Time
and Space: A Martingale Approach [1.776746672434207]
We show that policy evaluation is equivalent to maintaining the martingale condition of a process.
We present two methods to use the martingale characterization for designing PE algorithms.
arXiv Detail & Related papers (2021-08-15T03:37:17Z) - Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based ZO algorithm (ZO-RL) that learns the sampling policy used to generate the perturbations in ZO optimization, instead of using random sampling (a generic two-point ZO gradient estimator is sketched after this list).
Our results show that the ZO-RL algorithm can effectively reduce the variance of the ZO gradient estimates by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z) - Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence
Analysis [27.679514676804057]
We develop a variance reduction scheme for the two time-scale TDC algorithm in the off-policy setting.
Experiments demonstrate that the proposed variance-reduced TDC achieves a smaller convergence error than both the conventional TDC and the variance-reduced TD.
arXiv Detail & Related papers (2020-10-26T01:33:05Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Adaptive Temporal Difference Learning with Linear Function Approximation [29.741034258674205]
This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation tasks in reinforcement learning.
We develop a provably convergent adaptive projected variant of the TD(0) learning algorithm with linear function approximation.
We evaluate the performance of AdaTD(0) and AdaTD($\lambda$) on several standard reinforcement learning tasks.
arXiv Detail & Related papers (2020-02-20T02:32:40Z) - Reanalysis of Variance Reduced Temporal Difference Learning [57.150444843282]
A variance reduced TD (VRTD) algorithm was proposed by Korda and La, which applies the variance reduction technique directly to online TD learning with Markovian samples.
We show that VRTD is guaranteed to converge to a neighborhood of the fixed-point solution of TD at a linear convergence rate.
arXiv Detail & Related papers (2020-01-07T05:32:43Z)
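Several entries above (ZO-RL and the higher-order-smoothness work) build on zeroth-order gradient estimation, in which a gradient is approximated from function evaluations along random perturbation directions. The snippet below sketches the standard two-point randomized estimator that such methods typically start from; the objective, smoothing radius, and step size are illustrative assumptions, and the learned sampling policy of ZO-RL and the higher-order kernels of the smoothness paper are not reproduced here.

```python
# Minimal sketch of a standard two-point zeroth-order (ZO) gradient estimator:
# g_hat = d/(2*mu) * (f(x + mu*u) - f(x - mu*u)) * u,  u uniform on the unit sphere.
# This is the generic building block behind ZO methods; it is not the ZO-RL
# algorithm or the higher-order-kernel estimator from the papers listed above.
import numpy as np

rng = np.random.default_rng(0)


def f(x):
    """Illustrative smooth objective (a simple quadratic)."""
    return 0.5 * np.dot(x, x)


def zo_gradient(f, x, mu=1e-3, num_dirs=20):
    """Average two-point ZO gradient estimates over random unit directions."""
    d = x.size
    g = np.zeros(d)
    for _ in range(num_dirs):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                    # random direction on the unit sphere
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * d * u
    return g / num_dirs


# Plain ZO gradient descent on the quadratic, as a usage example.
x = rng.normal(size=10)
for _ in range(200):
    x -= 0.1 * zo_gradient(f, x)
print("final ||x|| =", np.linalg.norm(x))         # should be close to 0
```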