On Convergence of Gradient Expected Sarsa($\lambda$)
- URL: http://arxiv.org/abs/2012.07199v1
- Date: Mon, 14 Dec 2020 01:27:24 GMT
- Title: On Convergence of Gradient Expected Sarsa($\lambda$)
- Authors: Long Yang, Gang Zheng, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan
- Abstract summary: We show that applying the off-line estimate (multi-step bootstrapping) to $\mathtt{Expected~Sarsa}(\lambda)$ is unstable for off-policy learning.
We propose a convergent $\mathtt{Gradient~Expected~Sarsa}(\lambda)$ ($\mathtt{GES}(\lambda)$) algorithm.
- Score: 25.983112169543375
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the convergence of $\mathtt{Expected~Sarsa}(\lambda)$ with linear
function approximation. We show that applying the off-line estimate (multi-step
bootstrapping) to $\mathtt{Expected~Sarsa}(\lambda)$ is unstable for off-policy
learning. Furthermore, based on convex-concave saddle-point framework, we
propose a convergent $\mathtt{Gradient~Expected~Sarsa}(\lambda)$
($\mathtt{GES}(\lambda)$) algorithm. The theoretical analysis shows that our
$\mathtt{GES}(\lambda)$ converges to the optimal solution at a linear rate, which is
comparable to existing state-of-the-art gradient temporal difference (GTD) learning
algorithms. Furthermore, we develop a Lyapunov function technique to investigate how
the step-size influences the finite-time performance of $\mathtt{GES}(\lambda)$; this
technique can potentially be generalized to other GTD algorithms. Finally, we conduct
experiments to verify the effectiveness of our $\mathtt{GES}(\lambda)$.
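For intuition, the saddle-point route typically rewrites a projected Bellman-error objective through its conjugate dual, e.g. $\min_\theta \max_w \; \langle b - A\theta, w\rangle - \tfrac{1}{2}\|w\|_C^2$, and then runs stochastic primal-dual updates on $(\theta, w)$. The sketch below is a minimal, hypothetical illustration of such a gradient-corrected Expected Sarsa($\lambda$) update with linear features and eligibility traces; it follows the generic TDC/GQ($\lambda$) template, and the class name, default step-sizes, and the exact form of the correction term are illustrative assumptions rather than the authors' published pseudocode.

```python
# Hypothetical sketch (not the authors' code) of a saddle-point / gradient-correction
# update with an Expected Sarsa target, linear features, and eligibility traces.
# The update equations follow the generic TDC / GQ(lambda) template.
import numpy as np

class GradientExpectedSarsaLambdaSketch:
    def __init__(self, n_features, alpha=0.01, beta=0.05, gamma=0.99, lam=0.8):
        self.theta = np.zeros(n_features)    # primary weights: Q(s, a) = theta^T phi(s, a)
        self.w = np.zeros(n_features)        # auxiliary (dual) weights of the saddle point
        self.e = np.zeros(n_features)        # eligibility trace
        self.alpha, self.beta = alpha, beta  # two-timescale step-sizes (beta > alpha is typical)
        self.gamma, self.lam = gamma, lam

    def q(self, phi_sa):
        """Linear action-value estimate for one state-action feature vector."""
        return self.theta @ phi_sa

    def update(self, phi_sa, reward, next_phis, target_probs, rho):
        """One off-policy transition.

        phi_sa       -- features of the current state-action pair, shape (n_features,)
        reward       -- observed reward
        next_phis    -- features of every next state-action pair, shape (n_actions, n_features)
        target_probs -- target-policy probabilities pi(. | s'), shape (n_actions,)
        rho          -- importance ratio pi(a|s) / mu(a|s) of the action actually taken
        """
        # Expected Sarsa backup: average the next-state features under the target policy.
        phi_bar = target_probs @ next_phis
        delta = reward + self.gamma * (self.theta @ phi_bar) - self.theta @ phi_sa

        # Accumulating trace with per-decision importance weighting.
        self.e = rho * (self.gamma * self.lam * self.e + phi_sa)

        # Primary (slow) update with a gradient-correction term for off-policy stability.
        self.theta += self.alpha * (
            delta * self.e - self.gamma * (1.0 - self.lam) * (self.w @ self.e) * phi_bar
        )
        # Auxiliary (fast) update: w tracks the TD error projected onto the features.
        self.w += self.beta * (delta * self.e - (self.w @ phi_sa) * phi_sa)
        return delta
```

The two step-sizes mirror the usual two-timescale choice in GTD-style methods: the auxiliary weights $w$ are updated faster (larger $\beta$) so that they can track the projected TD error seen by the slower primary update of $\theta$.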
Related papers
- Iterative thresholding for non-linear learning in the strong $\varepsilon$-contamination model [3.309767076331365]
We derive approximation bounds for learning single neuron models using thresholded descent.
We also study the linear regression problem, where $\sigma(\mathbf{x}) = \mathbf{x}$.
arXiv Detail & Related papers (2024-09-05T16:59:56Z) - Optimal and Efficient Algorithms for Decentralized Online Convex Optimization [51.00357162913229]
Decentralized online convex optimization (D-OCO) is designed to minimize a sequence of global loss functions using only local computations and communications.
We develop a novel D-OCO algorithm that can reduce the regret bounds for convex and strongly convex functions to $\tilde{O}(n\rho^{-1/4}\sqrt{T})$ and $\tilde{O}(n\rho^{-1/2}\log T)$.
Our analysis reveals that the projection-free variant can achieve $O(nT^{3/4})$ and $O(n
arXiv Detail & Related papers (2024-02-14T13:44:16Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Convergence of a Normal Map-based Prox-SGD Method under the KL
Inequality [0.0]
We present a novel normal map-based algorithm ($\mathsf{norM}\text{-}\mathsf{SGD}$) and establish its convergence under the Kurdyka-Łojasiewicz (KL) inequality.
arXiv Detail & Related papers (2023-05-10T01:12:11Z) - High Probability Convergence of Stochastic Gradient Methods [15.829413808059124]
We show convergence with bounds depending on the initial distance to the optimal solution.
We demonstrate that our techniques can be used to obtain high probability bounds for AdaGrad-Norm.
arXiv Detail & Related papers (2023-02-28T18:42:11Z) - Beyond Uniform Smoothness: A Stopped Analysis of Adaptive SGD [38.221784575853796]
This work considers the problem of finding a first-order stationary point of a non-convex function with a potentially unbounded smoothness constant using a stochastic gradient oracle.
We develop a technique that allows us to prove $\mathcal{O}\big(\frac{\mathrm{polylog}(T)\,\sigma_t}{T}\big)$ convergence rates without assuming uniform bounds on the noise.
arXiv Detail & Related papers (2023-02-13T18:13:36Z) - Randomized Block-Coordinate Optimistic Gradient Algorithms for Root-Finding Problems [11.15373699918747]
We develop two new algorithms to approximate a solution of nonlinear equations in large-scale settings.
We apply our methods to a class of large-scale finite-sum inclusions, which covers prominent applications in machine learning, statistical learning, and network optimization.
arXiv Detail & Related papers (2023-01-08T21:46:27Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\cdot\mathrm{opt} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Randomized Exploration for Reinforcement Learning with General Value
Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)