Finite-Sample Analysis of Stochastic Approximation Using Smooth Convex
Envelopes
- URL: http://arxiv.org/abs/2002.00874v6
- Date: Wed, 30 Jun 2021 13:09:14 GMT
- Title: Finite-Sample Analysis of Stochastic Approximation Using Smooth Convex
Envelopes
- Authors: Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, and Karthikeyan
Shanmugam
- Abstract summary: We construct a smooth Lyapunov function using the generalized Moreau envelope, and show that the iterates of SA have negative drift with respect to that Lyapunov function.
In particular, we use it to establish the first-known convergence rate of the V-trace algorithm for off-policy TD-learning.
We also use it to study TD-learning in the on-policy setting, and recover the existing state-of-the-art results for $Q$-learning.
- Score: 40.31139355952393
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic Approximation (SA) is a popular approach for solving fixed-point
equations where the information is corrupted by noise. In this paper, we
consider an SA algorithm involving a contraction mapping with respect to an arbitrary
norm, and establish finite-sample error bounds under different stepsize choices.
The idea is to construct a smooth Lyapunov function using the generalized
Moreau envelope, and show that the iterates of SA have negative drift with
respect to that Lyapunov function. Our result is applicable in Reinforcement
Learning (RL). In particular, we use it to establish the first-known
convergence rate of the V-trace algorithm for off-policy TD-learning. Moreover,
we use it to study TD-learning in the on-policy setting, and recover the
existing state-of-the-art results for $Q$-learning. Importantly, our
construction results in only a logarithmic dependence of the convergence bound
on the size of the state-space.
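For readers who want the mechanics behind this construction, here is a minimal sketch in generic notation; the symbols below (iterate $x_k$, stepsize $\epsilon_k$, noise $w_k$, smoothing parameter $\theta$, and the two norms) are illustrative and need not match the paper's exact notation.

```latex
% SA iteration for solving the fixed-point equation F(x^*) = x^*, where F is a
% contraction with respect to an arbitrary norm \|\cdot\|_c and w_k is zero-mean noise:
\[
  x_{k+1} = x_k + \epsilon_k \bigl( F(x_k) - x_k + w_k \bigr).
\]
% The natural candidate Lyapunov function \tfrac{1}{2}\|x - x^*\|_c^2 need not be smooth
% (e.g. when \|\cdot\|_c is the \ell_\infty norm, as in Q-learning), so it is replaced by
% its generalized Moreau envelope, smoothed through a second (smooth) norm \|\cdot\|_s:
\[
  M^{\theta}(x) \;=\; \min_{u} \Bigl\{ \tfrac{1}{2}\|u\|_c^2
      \;+\; \tfrac{1}{2\theta}\,\|x - u\|_s^2 \Bigr\}, \qquad \theta > 0.
\]
% M^{\theta} is smooth and comparable (up to constants) to \tfrac{1}{2}\|\cdot\|_c^2, so a
% negative-drift inequality of the form
%   E[ M^{\theta}(x_{k+1} - x^*) | x_k ] <= (1 - c\,\epsilon_k) M^{\theta}(x_k - x^*) + O(\epsilon_k^2)
% along the iterates translates into finite-sample error bounds for different stepsizes.
```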
Related papers
- Stochastic Approximation with Unbounded Markovian Noise: A General-Purpose Theorem [7.443139252028032]
We consider average-reward Reinforcement Learning with unbounded state space and reward function.
Recent works have studied this problem in the actor-critic framework; we instead study Temporal Difference (TD) learning with linear function approximation.
arXiv Detail & Related papers (2024-10-29T03:40:53Z) - Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used algorithm for policy evaluation, serves as a natural framework for statistical inference.
In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results (a minimal illustrative sketch of this setup appears after the related-papers list).
arXiv Detail & Related papers (2024-10-21T15:34:44Z) - Riemannian Federated Learning via Averaging Gradient Stream [8.75592575216789]
This paper develops and analyzes an efficient Riemannian Federated Averaging Gradient Stream (RFedAGS) algorithm.
Numerical simulations conducted on synthetic and real-world data demonstrate the performance of the proposed RFedAGS.
arXiv Detail & Related papers (2024-09-11T12:28:42Z) - Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Linear Stochastic Approximation with Applications to TD Learning [15.041074872715752]
We prove the validity of confidence intervals for parameter estimation with linear stochastic approximation (LSA) based on the multiplier bootstrap.
We illustrate our findings in the setting of temporal difference learning with linear function approximation.
arXiv Detail & Related papers (2024-05-26T17:43:30Z) - Taming Score-Based Diffusion Priors for Infinite-Dimensional Nonlinear Inverse Problems [4.42498215122234]
This work introduces a sampling method capable of solving Bayesian inverse problems in function space.
It does not assume the log-concavity of the likelihood, meaning that it is compatible with nonlinear inverse problems.
A novel convergence analysis is conducted, inspired by the fixed-point methods established for traditional regularization-by-denoising algorithms.
arXiv Detail & Related papers (2024-05-24T16:17:01Z) - Imitation Learning in Discounted Linear MDPs without exploration assumptions [58.81226849657474]
We present a new algorithm, dubbed ILARL, for imitation learning in infinite-horizon linear MDPs.
We improve the dependence on the desired accuracy $\epsilon$ from $\mathcal{O}(\epsilon^{-5})$ to $\mathcal{O}(\epsilon^{-4})$.
Numerical experiments with linear function approximation show that ILARL outperforms other commonly used algorithms.
arXiv Detail & Related papers (2024-05-03T15:28:44Z) - Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency [111.83670279016599]
We study reinforcement learning for partially observable Markov decision processes (POMDPs) with infinite observation and state spaces.
We make the first attempt to combine partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv Detail & Related papers (2022-04-20T21:15:38Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefits of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution (a simplified finite-dimensional illustration appears after the related-papers list).
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
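As referenced in the "Statistical Inference for Temporal Difference Learning with Linear Function Approximation" entry above, the following is a minimal sketch of TD(0) with linear function approximation and Polyak-Ruppert (iterate) averaging. The Markov chain, reward model, feature map, and stepsize schedule are illustrative placeholders, not taken from any of the listed papers.

```python
import numpy as np

def td0_polyak_ruppert(P, r, features, gamma=0.9, alpha0=0.5, num_steps=50_000, seed=0):
    """TD(0) policy evaluation with linear function approximation.

    P        : (S, S) transition matrix of the Markov chain induced by the policy
    r        : (S,) expected one-step rewards
    features : (S, d) feature matrix; row s is the feature vector phi(s)
    Returns the Polyak-Ruppert averaged parameter vector of shape (d,).
    """
    rng = np.random.default_rng(seed)
    S, d = features.shape
    theta = np.zeros(d)        # current TD iterate
    theta_bar = np.zeros(d)    # running Polyak-Ruppert average of the iterates
    s = rng.integers(S)        # initial state

    for k in range(1, num_steps + 1):
        s_next = rng.choice(S, p=P[s])
        reward = r[s] + rng.normal(scale=0.1)   # noisy reward observation

        # Semi-gradient TD(0) update with a decaying stepsize alpha_k = alpha0 / sqrt(k)
        phi, phi_next = features[s], features[s_next]
        td_error = reward + gamma * phi_next @ theta - phi @ theta
        theta = theta + (alpha0 / np.sqrt(k)) * td_error * phi

        # Polyak-Ruppert averaging: theta_bar is the running mean of theta_1, ..., theta_k
        theta_bar += (theta - theta_bar) / k
        s = s_next

    return theta_bar

if __name__ == "__main__":
    # Tiny illustrative example: 3-state chain, 2-dimensional features.
    P = np.array([[0.1, 0.6, 0.3],
                  [0.4, 0.2, 0.4],
                  [0.5, 0.3, 0.2]])
    r = np.array([1.0, 0.0, -0.5])
    features = np.array([[1.0, 0.0],
                         [0.0, 1.0],
                         [1.0, 1.0]])
    print(td0_polyak_ruppert(P, r, features))
```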
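Similarly, the claim in the "On the Benefits of Large Learning Rates for Kernel Methods" entry can be illustrated with a standard finite-dimensional calculation; the paper itself works in a separable Hilbert space, so the display below is only a simplified analogue.

```latex
% Gradient descent on a quadratic f(w) = \tfrac{1}{2} w^\top A w - b^\top w, with A
% symmetric positive definite, eigenpairs (\lambda_i, v_i), and minimizer w^* = A^{-1} b:
\[
  w_{t+1} = w_t - \eta \nabla f(w_t) = w_t - \eta (A w_t - b).
\]
% In the eigenbasis of A each coordinate evolves independently:
\[
  \langle w_t - w^*, v_i \rangle = (1 - \eta \lambda_i)^t \, \langle w_0 - w^*, v_i \rangle .
\]
% With early stopping at a fixed horizon t, directions with \eta\lambda_i t \gg 1 are
% essentially fit while directions with \eta\lambda_i t \ll 1 are essentially untouched,
% so the learning rate \eta determines which part of the spectrum is represented in the
% returned solution (subject to the stability condition \eta < 2/\lambda_{\max}).
```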
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.