Reinforcement Learning with Function Approximation: From Linear to
Nonlinear
- URL: http://arxiv.org/abs/2302.09703v2
- Date: Fri, 19 May 2023 01:01:39 GMT
- Title: Reinforcement Learning with Function Approximation: From Linear to
Nonlinear
- Authors: Jihao Long and Jiequn Han
- Abstract summary: This paper reviews recent results on error analysis for reinforcement learning algorithms in linear or nonlinear approximation settings.
We discuss various properties related to approximation error and present concrete conditions on transition probability and reward function.
- Score: 4.314956204483073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Function approximation has been an indispensable component in modern
reinforcement learning algorithms designed to tackle problems with large state
spaces in high dimensions. This paper reviews recent results on error analysis
for these reinforcement learning algorithms in linear or nonlinear
approximation settings, emphasizing approximation error and estimation
error/sample complexity. We discuss various properties related to approximation
error and present concrete conditions on transition probability and reward
function under which these properties hold true. Sample complexity analysis in
reinforcement learning is more complicated than in supervised learning,
primarily due to the distribution mismatch phenomenon. With assumptions on the
linear structure of the problem, numerous algorithms in the literature achieve
polynomial sample complexity with respect to the number of features, episode
length, and accuracy, although the minimax rate has not been achieved yet.
These results rely on the $L^\infty$ and UCB estimation of estimation error,
which can handle the distribution mismatch phenomenon. The problem and analysis
become substantially more challenging in the setting of nonlinear function
approximation, as both $L^\infty$ and UCB estimation are inadequate for
bounding the error with a favorable rate in high dimensions. We discuss
additional assumptions necessary to address the distribution mismatch and
derive meaningful results for nonlinear RL problems.
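As a concrete illustration of the linear setting discussed above, the sketch below shows how a UCB-style bonus enters a least-squares value iteration update when Q-values are linear in a $d$-dimensional feature map, as in the standard linear MDP condition $P_h(\cdot\mid s,a)=\langle\phi(s,a),\mu_h(\cdot)\rangle$ and $r_h(s,a)=\langle\phi(s,a),\theta_h\rangle$. This is only a minimal sketch of the generic LSVI-UCB template, not the paper's own algorithm; the feature map `phi`, the dataset layout, and the bonus coefficient `beta` are illustrative assumptions.

```python
import numpy as np

def lsvi_ucb_backward_pass(data, phi, actions, H, d, beta, lam=1.0):
    """One backward pass of an LSVI-UCB-style update (illustrative sketch).

    data[h]   : list of transitions (s, a, r, s_next) collected at step h
    phi(s, a) : d-dimensional feature map (assumed given)
    actions   : finite action set used for the max over next-step actions
    beta      : bonus coefficient; the bonus is the UCB term that copes
                with distribution mismatch between data and target policies
    """
    w = [np.zeros(d) for _ in range(H)]               # linear Q weights per step
    Lambda_inv = [np.eye(d) / lam for _ in range(H)]  # inverse regularized Gram matrices

    def q_hat(h, s, a):
        # Optimistic, truncated Q estimate at step h (0 beyond the horizon).
        if h >= H:
            return 0.0
        f = phi(s, a)
        bonus = beta * np.sqrt(f @ Lambda_inv[h] @ f)  # elliptical-potential bonus
        return min(float(w[h] @ f) + bonus, H - h)

    for h in reversed(range(H)):
        feats = np.array([phi(s, a) for (s, a, _, _) in data[h]])
        # Regression targets: observed reward plus optimistic next-step value.
        targets = np.array([
            r + max(q_hat(h + 1, s_next, a2) for a2 in actions)
            for (_, _, r, s_next) in data[h]
        ])
        Lambda = lam * np.eye(d) + feats.T @ feats     # regularized Gram matrix
        Lambda_inv[h] = np.linalg.inv(Lambda)
        w[h] = Lambda_inv[h] @ feats.T @ targets       # ridge-regression solve
    return w, Lambda_inv
```

Acting greedily with respect to `q_hat` after each such pass gives the optimistic exploration behind the polynomial sample-complexity bounds mentioned above; the truncation at `H - h` keeps the estimates within the range of attainable returns.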
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - Inexact subgradient methods for semialgebraic functions [18.293072574300798]
Motivated by the widespread use of approximate derivatives in machine learning and optimization, we study inexact subgradient methods with non-vanishing errors.
arXiv Detail & Related papers (2024-04-30T12:47:42Z) - Neural Network Approximation for Pessimistic Offline Reinforcement
Learning [17.756108291816908]
We present a non-asymptotic estimation error bound for pessimistic offline RL using general neural network approximation.
Our result shows that the estimation error consists of two parts: the first converges to zero at a desired rate on the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight.
arXiv Detail & Related papers (2023-12-19T05:17:27Z) - Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning [53.97335841137496]
We propose an oracle-efficient algorithm, dubbed Pessimistic Nonlinear Least-Squares Value Iteration (PNLSVI), for offline RL with non-linear function approximation.
Our algorithm enjoys a regret bound that has a tight dependency on the function class complexity and achieves minimax optimal instance-dependent regret when specialized to linear function approximation.
arXiv Detail & Related papers (2023-10-02T17:42:01Z) - Online Regularized Learning Algorithm for Functional Data [2.5382095320488673]
This paper considers an online regularized learning algorithm for functional data in reproducing kernel Hilbert spaces.
It shows that convergence rates of both prediction error and estimation error with constant step-size are competitive with those in the literature.
arXiv Detail & Related papers (2022-11-24T11:56:10Z) - Asymptotically Unbiased Instance-wise Regularized Partial AUC
Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier.
Most of the existing methods could only optimize PAUC approximately, leading to inevitable biases that are not controllable.
We present a simpler reformulation of the PAUC optimization problem via distributionally robust optimization.
arXiv Detail & Related papers (2022-10-08T08:26:22Z) - Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-efficiency of Posterior Sampling [28.371541697552928]
We present the first result for non-linear function approximation which holds for general action spaces under a linear embeddability condition.
We show worst case sample complexity guarantees that scale with a rank parameter of the RL problem.
arXiv Detail & Related papers (2022-03-15T20:50:26Z) - Instance-optimality in optimal value estimation: Adaptivity via
variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that an instance-dependent functional arises in lower bounds on the accuracy of any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
arXiv Detail & Related papers (2021-06-28T00:38:54Z) - Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space.
We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation (a minimal sketch of such a method appears after this list).
arXiv Detail & Related papers (2020-12-09T20:19:32Z) - Learning Fast Approximations of Sparse Nonlinear Regression [50.00693981886832]
In this work, we bridge the gap by introducing the Nonlinear Learned Iterative Shrinkage-Thresholding Algorithm (NLISTA).
Experiments on synthetic data corroborate our theoretical results and show our method outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-26T11:31:08Z)
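To make the temporal-difference entry above concrete, here is a minimal sketch of TD(0) policy evaluation with linear value-function approximation, whose iterates target the projected Bellman fixed point over the span of the features. The feature map `phi`, the discount factor, and the step size are illustrative assumptions rather than choices made in the cited paper.

```python
import numpy as np

def linear_td0(transitions, phi, d, gamma=0.99, alpha=0.05):
    """TD(0) policy evaluation with linear function approximation (sketch).

    transitions : iterable of (s, r, s_next) tuples sampled under a fixed policy
    phi(s)      : d-dimensional state feature map (assumed given)
    """
    theta = np.zeros(d)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        # Temporal-difference error under the current linear value estimate.
        td_error = r + gamma * float(theta @ f_next) - float(theta @ f)
        theta += alpha * td_error * f  # stochastic semi-gradient step
    return theta
```

The update is a semi-gradient step, so its fixed point is the projection of the Bellman operator onto the feature subspace, which is exactly the projected fixed-point equation studied in the entry above.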