Linear Bellman Completeness Suffices for Efficient Online Reinforcement Learning with Few Actions
- URL: http://arxiv.org/abs/2406.11640v2
- Date: Tue, 18 Jun 2024 04:27:49 GMT
- Title: Linear Bellman Completeness Suffices for Efficient Online Reinforcement Learning with Few Actions
- Authors: Noah Golowich, Ankur Moitra,
- Abstract summary: It is typically assumed that Bellman completeness holds, which ensures that these regression problems are well-specified.
We give the first polynomial-time algorithm for RL under linear Bellman completeness when the number of actions is any constant.
- Score: 29.69428894587431
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the most natural approaches to reinforcement learning (RL) with function approximation is value iteration, which inductively generates approximations to the optimal value function by solving a sequence of regression problems. To ensure the success of value iteration, it is typically assumed that Bellman completeness holds, which ensures that these regression problems are well-specified. We study the problem of learning an optimal policy under Bellman completeness in the online model of RL with linear function approximation. In the linear setting, while statistically efficient algorithms are known under Bellman completeness (e.g., Jiang et al. (2017); Zanette et al. (2020)), these algorithms all rely on the principle of global optimism which requires solving a nonconvex optimization problem. In particular, it has remained open as to whether computationally efficient algorithms exist. In this paper we give the first polynomial-time algorithm for RL under linear Bellman completeness when the number of actions is any constant.
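The abstract describes value iteration with linear function approximation as a backward sequence of regression problems. The sketch below is a minimal, generic illustration of that view (plain least-squares value iteration over a finite horizon), not the polynomial-time algorithm proposed in the paper; the feature map `phi`, the per-step transition `batches`, the dimension `d`, and the ridge parameter `reg` are all hypothetical names introduced here for illustration.

```python
# Minimal sketch of least-squares value iteration with linear features:
# fit Q_h(s, a) ~ <theta[h][a], phi(s)> by regressing backward over the
# horizon on batches of transitions (s, a, r, s') collected at each step h.
# This is an illustrative, assumed setup, not the algorithm from the paper.
import numpy as np

def least_squares_value_iteration(batches, phi, num_actions, d, reg=1e-3):
    """batches[h] is a list of (s, a, r, s_next) tuples observed at step h."""
    H = len(batches)
    # theta[h][a] parameterizes Q_h(., a); values beyond the horizon are zero.
    theta = [[np.zeros(d) for _ in range(num_actions)] for _ in range(H)]
    for h in reversed(range(H)):
        for a in range(num_actions):
            data = [t for t in batches[h] if t[1] == a]
            if not data:
                continue
            X = np.stack([phi(s) for s, _, _, _ in data])  # feature matrix
            # Regression targets: reward plus next-step value, where
            # V_{h+1}(s') = max_a' <theta[h+1][a'], phi(s')>.
            y = np.array([
                r + (max(th @ phi(s_next) for th in theta[h + 1]) if h + 1 < H else 0.0)
                for _, _, r, s_next in data
            ])
            # Ridge regression for the step-h, action-a value parameters.
            theta[h][a] = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)
    return theta  # greedy policy: pi_h(s) = argmax_a <theta[h][a], phi(s)>
```

Under Bellman completeness, each regression target above is itself realizable by a linear function of the features, which is exactly what makes this backward-recursive scheme well-specified.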
Related papers
- Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics [39.07258580928359]
We study computationally and statistically efficient Reinforcement Learning algorithms for the linear Bellman Complete setting.
This setting uses linear function approximation to capture value functions and unifies existing models like linear Markov Decision Processes (MDP) and Linear Quadratic Regulators (LQR).
Our work provides a computationally efficient algorithm for the linear Bellman complete setting that works for MDPs with large action spaces, random initial states, and random rewards but relies on the underlying dynamics to be deterministic.
arXiv Detail & Related papers (2024-06-17T17:52:38Z) - The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation [29.69428894587431]
In this paper, we study the offline RL problem with linear function approximation.
Our main structural assumption is that the MDP has low inherent Bellman error.
We show that the scaling of the suboptimality with $\sqrt{\varepsilon_{\mathrm{BE}}}$ cannot be improved for any algorithm.
arXiv Detail & Related papers (2024-06-17T16:04:06Z) - Regularized Q-Learning with Linear Function Approximation [2.765106384328772]
We consider a bi-level optimization formulation of regularized Q-learning with linear function approximation.
We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise.
arXiv Detail & Related papers (2024-01-26T20:45:40Z) - Parameterized Projected Bellman Operator [64.129598593852]
Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL).
We propose a novel alternative approach based on learning an approximate version of the Bellman operator, the projected Bellman operator (PBO).
We formulate an optimization problem to learn the PBO for generic sequential decision-making problems.
arXiv Detail & Related papers (2023-12-20T09:33:16Z) - Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning [53.97335841137496]
We propose an oracle-efficient algorithm, dubbed Pessimistic Nonlinear Least-Squares Value Iteration (PNLSVI), for offline RL with non-linear function approximation.
Our algorithm enjoys a regret bound that has a tight dependency on the function class complexity and achieves minimax optimal instance-dependent regret when specialized to linear function approximation.
arXiv Detail & Related papers (2023-10-02T17:42:01Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3 K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z) - You Only Evaluate Once: a Simple Baseline Algorithm for Offline RL [29.98260009732724]
We propose a baseline algorithm for offline reinforcement learning that only performs the policy evaluation step once.
We empirically find that the proposed algorithm exhibits competitive and sometimes even state-of-the-art performance in a subset of the D4RL offline RL benchmark.
arXiv Detail & Related papers (2021-10-05T19:05:47Z) - A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning [25.39784277231972]
We introduce a new generalized MSPBE that extends the linear MSPBE to the nonlinear setting.
We derive an easy-to-use, but sound, algorithm to minimize the generalized objective.
arXiv Detail & Related papers (2021-04-28T15:50:34Z) - Recent Theoretical Advances in Non-Convex Optimization [56.88981258425256]
Motivated by recent increased interest in the analysis of optimization algorithms for non-convex optimization in deep networks and other problems in data science, we give an overview of recent theoretical results on optimization algorithms for non-convex optimization.
arXiv Detail & Related papers (2020-12-11T08:28:51Z) - Online Model Selection for Reinforcement Learning with Function Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)