On the Convergence of SARSA with Linear Function Approximation
- URL: http://arxiv.org/abs/2202.06828v3
- Date: Fri, 12 May 2023 21:14:54 GMT
- Title: On the Convergence of SARSA with Linear Function Approximation
- Authors: Shangtong Zhang, Remi Tachet, Romain Laroche
- Abstract summary: SARSA is a classical on-policy control algorithm for reinforcement learning.
We show how fast SARSA converges to a bounded region.
We characterize the behavior of linear SARSA in a new regime.
- Score: 28.48689596152752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: SARSA, a classical on-policy control algorithm for reinforcement learning, is
known to chatter when combined with linear function approximation: SARSA does
not diverge but oscillates in a bounded region. However, little is known about
how fast SARSA converges to that region and how large the region is. In this
paper, we make progress towards this open problem by showing the convergence
rate of projected SARSA to a bounded region. Importantly, the region is much
smaller than the region that we project into, provided that the magnitude of
the reward is not too large. Existing works regarding the convergence of linear
SARSA to a fixed point all require the Lipschitz constant of SARSA's policy
improvement operator to be sufficiently small; our analysis instead applies to
arbitrary Lipschitz constants and thus characterizes the behavior of linear
SARSA in a new regime.
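As a point of reference, here is a minimal sketch of one projected SARSA update with linear function approximation, in the spirit of the algorithm analyzed above; the epsilon-greedy policy, the l2-ball projection radius, and all hyperparameters are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon=0.1):
    """epsilon-greedy action selection over estimated action values."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def project(w, radius):
    """Project the weight vector onto an l2 ball of the given radius."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def projected_sarsa_step(w, phi_sa, reward, phi_next_sa, alpha, gamma, radius):
    """One projected SARSA update with linear function approximation.

    q(s, a) is approximated by w @ phi(s, a); the TD target uses the next
    state-action pair selected on-policy (e.g. epsilon-greedily).
    """
    td_error = reward + gamma * (w @ phi_next_sa) - w @ phi_sa
    w = w + alpha * td_error * phi_sa
    return project(w, radius)  # projection keeps every iterate in a bounded region
```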
Related papers
- Convergence of Adam in Deep ReLU Networks via Directional Complexity and Kakeya Bounds [49.1574468325115]
First-order adaptive optimization methods like Adam are the default choices for training modern deep neural networks.
We develop a multi-layer refinement framework that progressively tightens bounds on region crossings.
We prove that the number of region crossings collapses from exponential to near-linear in the effective dimension.
arXiv Detail & Related papers (2025-05-21T01:34:16Z) - Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability [49.96531901205305]
We propose the first algorithm with $\tilde{O}(\epsilon^{-1})$ sample complexity under single-policy concentrability for offline contextual bandits.
Our proof leverages the strong convexity of the KL regularization, and the conditional non-negativity of the gap between the true reward and its pessimistic estimator.
We extend our algorithm to contextual dueling bandits and achieve a similar nearly optimal sample complexity.
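For context, a KL-regularized objective of this kind admits a simple closed-form maximizer for a single context; the sketch below is a generic illustration, where the reference policy, the pessimistic reward estimates, and the regularization strength beta are assumed inputs rather than the paper's algorithm.

```python
import numpy as np

def kl_regularized_policy(reward_estimates, reference_policy, beta):
    """Maximizer of  E_pi[r_hat] - beta * KL(pi || pi_ref)  for one context.

    Strong convexity of the KL term gives the closed form
    pi(a) proportional to pi_ref(a) * exp(r_hat(a) / beta); plugging in
    pessimistic reward estimates mirrors the offline setting.
    """
    logits = np.log(np.asarray(reference_policy, dtype=float))
    logits = logits + np.asarray(reward_estimates, dtype=float) / beta
    logits -= logits.max()              # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# toy usage with made-up numbers
pi = kl_regularized_policy([0.3, 0.1, 0.5], [0.5, 0.3, 0.2], beta=1.0)
```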
arXiv Detail & Related papers (2025-02-09T22:14:45Z) - Convergence of SARSA with linear function approximation: The random horizon case [0.0]
SARSA combined with linear function approximation has been shown to converge for infinite horizon discounted Markov decision problems (MDPs).
We show, similar to earlier results for infinite horizon discounted MDPs, that if the behaviour policy is $\varepsilon$-soft and Lipschitz continuous with respect to the weight vector of the linear function approximation, with small enough Lipschitz constant, then the algorithm will converge with probability one when considering a random horizon MDP.
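As a concrete illustration of such a behaviour policy, the following $\varepsilon$-soft softmax over the linear action values is Lipschitz in the weight vector, with a Lipschitz constant that shrinks as the temperature grows; this construction is an assumption for illustration, not the policy class used in that paper.

```python
import numpy as np

def eps_soft_softmax_policy(w, phi_s, epsilon=0.1, temperature=10.0):
    """epsilon-soft behaviour policy that is Lipschitz in the weights w.

    phi_s has shape (num_actions, num_features) and q = phi_s @ w.
    Mixing the softmax with a uniform distribution guarantees every action
    has probability at least epsilon / num_actions (epsilon-soft), and a
    larger temperature makes the probabilities change more slowly with w,
    i.e. gives a smaller Lipschitz constant.
    """
    num_actions = phi_s.shape[0]
    logits = (phi_s @ w) / temperature
    logits -= logits.max()                      # numerical stability
    soft = np.exp(logits) / np.exp(logits).sum()
    return (1.0 - epsilon) * soft + epsilon / num_actions
```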
arXiv Detail & Related papers (2023-06-07T15:51:06Z) - Efficiently Computing Local Lipschitz Constants of Neural Networks via Bound Propagation [79.13041340708395]
Lipschitz constants are connected to many properties of neural networks, such as robustness, fairness, and generalization.
Existing methods for computing Lipschitz constants either produce relatively loose upper bounds or are limited to small networks.
We develop an efficient framework for computing the $\ell_\infty$ local Lipschitz constant of a neural network by tightly upper bounding the norm of the Clarke Jacobian.
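A crude version of the bound-propagation idea for a single hidden ReLU layer is sketched below: interval arithmetic over an input box identifies which units can be active, which yields a loose upper bound on the $\ell_\infty$ norm of the Clarke Jacobian; the network shape and the bounding strategy are simplifying assumptions, not the paper's tighter method.

```python
import numpy as np

def local_linf_lipschitz_bound(W1, b1, W2, x_lo, x_hi):
    """Loose upper bound on the l_inf -> l_inf local Lipschitz constant of
    f(x) = W2 @ relu(W1 @ x + b1) over the box [x_lo, x_hi].

    Interval arithmetic bounds the pre-activations: units whose upper bound
    is non-positive are always off, so they drop out of the Clarke Jacobian
    W2 @ diag(d) @ W1 (d in {0, 1}); the remaining units are bounded by
    their worst case d = 1.
    """
    center = (x_lo + x_hi) / 2.0
    radius = (x_hi - x_lo) / 2.0
    upper = W1 @ center + b1 + np.abs(W1) @ radius   # max pre-activation on the box
    maybe_active = (upper > 0).astype(float)
    # elementwise: |J| <= |W2| @ diag(maybe_active) @ |W1| for every x in the box
    jac_abs_bound = np.abs(W2) @ (maybe_active[:, None] * np.abs(W1))
    return jac_abs_bound.sum(axis=1).max()           # induced inf-norm
```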
arXiv Detail & Related papers (2022-10-13T22:23:22Z) - Active Nearest Neighbor Regression Through Delaunay Refinement [79.93030583257597]
We introduce an algorithm for active function approximation based on nearest neighbor regression.
Our Active Nearest Neighbor Regressor (ANNR) relies on the Voronoi-Delaunay framework from computational geometry to subdivide the space into cells with constant estimated function value.
arXiv Detail & Related papers (2022-06-16T10:24:03Z) - Composite Spatial Monte Carlo Integration Based on Generalized Least Squares [0.0]
Spatial Monte Carlo integration (SMCI) is a sampling-based approximation method.
A new, effective method is proposed by combining multiple SMCI estimators.
The results indicate that the proposed method can be effective for the inverse Ising problem (or Boltzmann machine learning).
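The combination step can be illustrated generically: given several unbiased estimators of the same quantity and an estimate of their covariance, generalized least squares picks the minimum-variance affine combination. The sketch below uses made-up numbers and does not reproduce the paper's specific SMCI estimators.

```python
import numpy as np

def gls_combine(estimates, covariance):
    """Minimum-variance affine combination of unbiased estimators.

    Generalized least squares gives weights w = C^{-1} 1 / (1^T C^{-1} 1),
    which minimize Var(w @ estimates) subject to sum(w) == 1.
    """
    estimates = np.asarray(estimates, dtype=float)
    ones = np.ones_like(estimates)
    c_inv_ones = np.linalg.solve(np.asarray(covariance, dtype=float), ones)
    weights = c_inv_ones / (ones @ c_inv_ones)
    return weights @ estimates, weights

# toy usage: two correlated estimators of the same expectation (made-up numbers)
combined, weights = gls_combine([1.02, 0.95], [[0.04, 0.01],
                                               [0.01, 0.09]])
```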
arXiv Detail & Related papers (2022-04-07T06:35:13Z) - Chordal Sparsity for Lipschitz Constant Estimation of Deep Neural Networks [77.82638674792292]
Lipschitz constants of neural networks allow for guarantees of robustness in image classification, safety in controller design, and generalizability beyond the training data.
As calculating Lipschitz constants is NP-hard, techniques for estimating Lipschitz constants must navigate the trade-off between scalability and accuracy.
In this work, we significantly push the scalability frontier of a semidefinite programming technique known as LipSDP while achieving zero accuracy loss.
arXiv Detail & Related papers (2022-04-02T11:57:52Z) - Semantic Segmentation by Early Region Proxy [53.594035639400616]
We present a novel and efficient modeling approach that starts from interpreting the image as a tessellation of learnable regions.
To model region-wise context, we exploit Transformer to encode regions in a sequence-to-sequence manner.
Semantic segmentation is now carried out as per-region prediction on top of the encoded region embeddings.
arXiv Detail & Related papers (2022-03-26T10:48:32Z) - Robust Estimation for Nonparametric Families via Generative Adversarial Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extends these to robust mean estimation, second moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2022-02-02T20:11:33Z) - First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach [57.570201404222935]
We show that it is possible to obtain regret scaling as $\mathcal{O}(\sqrt{V_1^\star K})$ in reinforcement learning with large state spaces.
We demonstrate that existing techniques based on least squares estimation are insufficient to obtain this result.
arXiv Detail & Related papers (2021-12-07T00:29:57Z) - A PAC-Bayesian Analysis of Distance-Based Classifiers: Why Nearest-Neighbour works! [12.317405551932195]
We present PAC-Bayesian bounds for the generalisation error of the K-nearest-neighbour classifier (K-NN).
We establish a relation between prior measures over the coefficients in the kernel expansion and the induced measure on the weight vectors in kernel space.
arXiv Detail & Related papers (2021-09-28T17:35:57Z) - Finite-Sample Analysis of Stochastic Approximation Using Smooth Convex Envelopes [40.31139355952393]
We construct a smooth Lyapunov function using the generalized envelope, and show that the iterates of SA have negative drift with respect to that Lyapunov function.
In particular, we use it to establish the first-known convergence rate of the V-trace algorithm for off-policy TD-learning.
We also use it to study TD-learning in the on-policy setting, and recover the existing state-of-the-art results for $Q$-learning.
arXiv Detail & Related papers (2020-02-03T16:42:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.