Horizon-Free and Instance-Dependent Regret Bounds for Reinforcement
Learning with General Function Approximation
- URL: http://arxiv.org/abs/2312.04464v1
- Date: Thu, 7 Dec 2023 17:35:34 GMT
- Title: Horizon-Free and Instance-Dependent Regret Bounds for Reinforcement
Learning with General Function Approximation
- Authors: Jiayi Huang, Han Zhong, Liwei Wang, Lin F. Yang
- Abstract summary: We propose an algorithm to tackle long planning horizon problems in reinforcement learning with general function approximation.
The derived regret bound is deemed sharp, as it matches the minimax lower bound when specialized to linear mixture MDPs up to logarithmic factors.
The achievement of such a horizon-free, instance-dependent, and sharp regret bound hinges upon (i) novel algorithm designs and (ii) fine-grained analyses.
- Score: 26.277745106128197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To tackle long planning horizon problems in reinforcement learning
with general function approximation, we propose the first algorithm, termed
UCRL-WVTR, that achieves regret bounds that are both \emph{horizon-free} and
\emph{instance-dependent}, eliminating the polynomial dependency on the
planning horizon. The derived regret bound is deemed \emph{sharp}, as it
matches the minimax lower bound when specialized to linear mixture MDPs up to
logarithmic factors. Furthermore, UCRL-WVTR is \emph{computationally efficient}
with access to a regression oracle. The achievement of such a horizon-free,
instance-dependent, and sharp regret bound hinges upon (i) novel algorithm
designs: weighted value-targeted regression and a high-order moment estimator
in the context of general function approximation; and (ii) fine-grained
analyses: a novel concentration bound of weighted non-linear least squares and
a refined analysis which leads to the tight instance-dependent bound. We also
conduct comprehensive experiments to corroborate our theoretical findings.
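As a hedged illustration of ingredient (i), the sketch below performs weighted value-targeted regression in the linear special case, where the weighted regression oracle has a closed form. Everything here (the function name, the synthetic data, the stand-in variance estimates) is illustrative; in the paper the weights come from the high-order moment estimator, which this sketch elides.
```python
import numpy as np

def weighted_value_targeted_regression(features, targets, variances, lam=1.0):
    """Sketch: inverse-variance-weighted ridge regression onto value targets.

    In the paper the weights come from a high-order moment estimator and the
    fit is done through a general weighted regression oracle; here we
    specialize to a linear model so the solution is in closed form.
    """
    w = 1.0 / np.maximum(variances, 1e-8)      # inverse-variance weights
    Xw = features * np.sqrt(w)[:, None]        # reweight each transition's row
    yw = targets * np.sqrt(w)
    d = features.shape[1]
    # argmin_theta  sum_i w_i (phi_i^T theta - y_i)^2 + lam * ||theta||^2
    return np.linalg.solve(Xw.T @ Xw + lam * np.eye(d), Xw.T @ yw)

# Toy usage on synthetic "transitions" (all quantities illustrative).
rng = np.random.default_rng(0)
phi = rng.normal(size=(100, 5))                # features of (state, action)
theta_true = rng.normal(size=5)
var_est = 0.1 + rng.uniform(size=100)          # stand-in variance estimates
v_targets = phi @ theta_true + np.sqrt(var_est) * rng.normal(size=100)
print(weighted_value_targeted_regression(phi, v_targets, var_est).round(2))
```
Intuitively, inverse-variance weighting keeps low-noise transitions from being charged the worst-case variance, which is the lever variance-aware analyses use to avoid polynomial horizon factors.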
Related papers
- Riemannian Federated Learning via Averaging Gradient Stream [8.75592575216789]
This paper develops and analyzes an efficient Riemannian Federated Averaging Gradient Stream (RFedAGS) algorithm.
Numerical simulations conducted on synthetic and real-world data demonstrate the performance of the proposed RFedAGS.
arXiv Detail & Related papers (2024-09-11T12:28:42Z)
- Double Successive Over-Relaxation Q-Learning with an Extension to Deep Reinforcement Learning [0.0]
Successive Over-Relaxation (SOR) Q-learning, which introduces a relaxation factor to speed up convergence, has two major limitations.
We propose a sample-based, model-free double SOR Q-learning algorithm.
The proposed algorithm is extended to large-scale problems using deep RL.
arXiv Detail & Related papers (2024-09-10T09:23:03Z)
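As a rough sketch of this scheme, reconstructed from the summary and the standard SOR and double Q-learning updates (not taken from the paper), one tabular step might look like the following; the admissible range of the relaxation factor w and the step sizes are the paper's contribution and are not reproduced here.
```python
import numpy as np

def double_sor_q_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99, w=1.2):
    """One tabular update of a double SOR Q-learning scheme (sketch).

    SOR target: w * (r + gamma * max_b Q(s_next, b)) + (1 - w) * max_b Q(s, b);
    the relaxation factor w can exceed 1 to over-relax. One of the two tables
    is chosen at random, its greedy actions are evaluated under the other
    table (the double-estimator device), and it is moved toward the target.
    """
    if np.random.rand() < 0.5:
        b_next, b_cur = np.argmax(QA[s_next]), np.argmax(QA[s])
        target = w * (r + gamma * QB[s_next, b_next]) + (1 - w) * QB[s, b_cur]
        QA[s, a] += alpha * (target - QA[s, a])
    else:
        b_next, b_cur = np.argmax(QB[s_next]), np.argmax(QB[s])
        target = w * (r + gamma * QA[s_next, b_next]) + (1 - w) * QA[s, b_cur]
        QB[s, a] += alpha * (target - QB[s, a])
```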
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
- Neural Network Approximation for Pessimistic Offline Reinforcement Learning [17.756108291816908]
We present a non-asymptotic estimation error bound for pessimistic offline RL using general neural network approximation.
Our result shows that the estimation error consists of two parts: the first converges to zero at a desired rate in the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight.
arXiv Detail & Related papers (2023-12-19T05:17:27Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
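A minimal sketch of the linear-interpolation idea, assuming a generic base optimizer: the names and the toy rotation example are illustrative, and the paper's actual analysis (via nonexpansive operator theory) is what justifies the averaging step.
```python
import numpy as np

def interpolated_training(step, theta0, n_outer=100, k_inner=5, lam=0.5):
    """Sketch of stabilization by linear interpolation (lookahead-style).

    `step` is any base optimizer update. After k_inner fast steps, the
    iterate is pulled back toward the anchor by linear interpolation; for a
    nonexpansive inner operator this is a Krasnosel'skii-Mann averaged
    iteration, which converges where the raw iteration may cycle.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_outer):
        fast = theta.copy()
        for _ in range(k_inner):
            fast = step(fast)                   # unstabilized inner steps
        theta = (1 - lam) * theta + lam * fast  # linear interpolation
    return theta

# Toy usage: a pure rotation cycles forever on its own, but the averaged
# iteration converges to its unique fixed point at the origin.
c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
R = np.array([[c, -s], [s, c]])
print(interpolated_training(lambda th: R @ th, [1.0, 1.0]).round(6))
```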
- Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning [53.97335841137496]
We propose an oracle-efficient algorithm, dubbed Pessimistic Nonlinear Least-Squares Value Iteration (PNLSVI), for offline RL with non-linear function approximation.
Our algorithm enjoys a regret bound that has a tight dependency on the function class complexity and achieves minimax optimal instance-dependent regret when specialized to linear function approximation.
arXiv Detail & Related papers (2023-10-02T17:42:01Z)
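A hedged sketch of the pessimistic least-squares recipe, specialized to linear features so the regression oracle is explicit; the paper's algorithm handles general nonlinear function classes through an oracle and a more refined bonus construction, which this sketch does not reproduce.
```python
import numpy as np

def pnlsvi_linear(data, phi, actions, H, d, beta=1.0, lam=1.0):
    """Sketch of pessimistic least-squares value iteration (linear case).

    data[h]: list of (s, a, r, s_next) transitions at step h;
    phi(s, a): d-dimensional feature map. Backward induction fits Q_h by
    ridge regression on targets r + V_{h+1}(s_next), then subtracts an
    elliptical bonus beta * ||phi(s, a)||_{Sigma_h^{-1}} to stay pessimistic.
    """
    V_next = lambda s: 0.0                              # V_H is identically 0
    thetas = [None] * H
    for h in reversed(range(H)):
        X = np.array([phi(s, a) for (s, a, r, s2) in data[h]])
        y = np.array([r + V_next(s2) for (s, a, r, s2) in data[h]])
        Sigma_inv = np.linalg.inv(X.T @ X + lam * np.eye(d))
        theta = Sigma_inv @ (X.T @ y)
        thetas[h] = theta

        def V(s, th=theta, Si=Sigma_inv):               # pessimistic value
            q = [phi(s, a) @ th
                 - beta * np.sqrt(phi(s, a) @ Si @ phi(s, a))
                 for a in actions]
            return max(0.0, max(q))                     # clip below at 0
        V_next = V
    return thetas
```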
- Fine-grained analysis of non-parametric estimation for pairwise learning [9.676007573960383]
We are concerned with the generalization performance of non-parametric estimation for pairwise learning.
Our results can be used to handle a wide range of pairwise learning problems including ranking, AUC, pairwise regression and metric and similarity learning.
arXiv Detail & Related papers (2023-05-31T08:13:14Z)
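To make "pairwise learning" concrete, here is one of the objectives such an analysis covers, an AUC-style pairwise hinge loss; the loss choice and toy data are illustrative, not the paper's setup.
```python
import numpy as np

def pairwise_hinge_loss(scores, labels):
    """An AUC-style pairwise hinge loss: penalize every (positive, negative)
    pair whose scores are not separated by a unit margin. The paper treats
    pairwise objectives abstractly; this is one concrete instance.
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    margins = 1.0 - (pos[:, None] - neg[None, :])   # all pos-neg pairs
    return np.maximum(0.0, margins).mean()

print(pairwise_hinge_loss(np.array([2.0, 1.5, 0.3, -0.2]),
                          np.array([1, 1, 0, 0])))
```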
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular methods for solving reinforcement learning (RL) problems.
We provide a general sample complexity analysis of the "vanilla" PG method.
arXiv Detail & Related papers (2021-07-23T19:38:17Z)
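For concreteness, a sketch of the vanilla PG (REINFORCE) estimator that such analyses study; the toy gradient of the log-policy is made up purely for illustration.
```python
import numpy as np

def reinforce_gradient(states, actions, rewards, grad_logp, gamma=0.99):
    """Sketch of the 'vanilla' policy gradient (REINFORCE) estimator:
    grad J(theta) ~= sum_t G_t * grad log pi_theta(a_t | s_t),
    where G_t is the discounted return-to-go from step t.
    """
    T = len(rewards)
    G, running = np.zeros(T), 0.0
    for t in reversed(range(T)):                 # discounted returns-to-go
        running = rewards[t] + gamma * running
        G[t] = running
    return sum(G[t] * grad_logp(states[t], actions[t]) for t in range(T))

# Toy usage with a made-up grad-log-policy (purely illustrative).
g = lambda s, a: np.array([1.0 if a == 1 else -1.0, float(s)])
print(reinforce_gradient([0, 1, 2], [1, 0, 1], [1.0, 0.0, 2.0], g))
```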
- A spectral algorithm for robust regression with subgaussian rates [0.0]
We study a new algorithm, with running time between linear and quadratic in the sample size, for linear regression in the absence of strong assumptions on the underlying distributions of samples.
The goal is to design a procedure which attains the optimal sub-gaussian error bound even though the data have only finite moments.
arXiv Detail & Related papers (2020-07-12T19:33:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.