Non-Linear Reinforcement Learning in Large Action Spaces: Structural
Conditions and Sample-efficiency of Posterior Sampling
- URL: http://arxiv.org/abs/2203.08248v1
- Date: Tue, 15 Mar 2022 20:50:26 GMT
- Title: Non-Linear Reinforcement Learning in Large Action Spaces: Structural
Conditions and Sample-efficiency of Posterior Sampling
- Authors: Alekh Agarwal and Tong Zhang
- Abstract summary: We present the first result for non-linear function approximation which holds for general action spaces under a linear embeddability condition.
We show worst case sample complexity guarantees that scale with a rank parameter of the RL problem.
- Score: 38.30154154957721
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Provably sample-efficient Reinforcement Learning (RL) with rich observations
and function approximation has witnessed tremendous recent progress,
particularly when the underlying function approximators are linear. In this
linear regime, computationally and statistically efficient methods exist where
the potentially infinite state and action spaces can be captured through a
known feature embedding, with the sample complexity scaling with the
(intrinsic) dimension of these features. When the action space is finite,
significantly more sophisticated results allow non-linear function
approximation under appropriate structural constraints on the underlying RL
problem, permitting, for instance, the learning of good features instead of
assuming access to them. In this work, we present the first result for
non-linear function approximation which holds for general action spaces under a
linear embeddability condition, which generalizes all linear and finite action
settings. We design a novel optimistic posterior sampling strategy, TS^3, for
such problems, and show worst-case sample complexity guarantees that scale with
a rank parameter of the RL problem, the linear embedding dimension introduced
in this work and standard measures of the function class complexity.
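The TS^3 algorithm itself is not reproduced in this listing, but the abstract's core idea, acting greedily with respect to a value function drawn from a posterior, follows the generic posterior (Thompson) sampling template. The sketch below illustrates that template for a simple linear-Gaussian model; the environment interface, the feature map phi, and all hyperparameters are assumptions made for the example, and this is not the authors' TS^3.

```python
# A minimal posterior-sampling (Thompson sampling) sketch with a linear-Gaussian
# reward model. It only illustrates the generic "sample a parameter from the
# posterior, then act greedily under it" template that optimistic posterior
# sampling builds on; it is NOT the paper's TS^3 algorithm. The interfaces
# env.reset/env.actions/env.step and the feature map phi are hypothetical.
import numpy as np

def posterior_sampling(env, phi, d, T, sigma=1.0, prior_var=1.0, seed=0):
    rng = np.random.default_rng(seed)
    precision = np.eye(d) / prior_var      # posterior precision matrix
    b = np.zeros(d)                        # running sum of feature * reward
    state = env.reset()
    total_reward = 0.0
    for _ in range(T):
        cov = np.linalg.inv(precision)
        mean = cov @ (b / sigma**2)
        theta = rng.multivariate_normal(mean, cov)   # one posterior draw
        # Greedy action under the sampled parameter; the randomness of the
        # draw plays the role of exploration.
        action = max(env.actions(state), key=lambda a: float(phi(state, a) @ theta))
        x = phi(state, action)
        reward, state = env.step(state, action)
        # Conjugate Bayesian linear-regression update of the posterior.
        precision += np.outer(x, x) / sigma**2
        b += reward * x
        total_reward += reward
    return total_reward
```

The paper's setting replaces this linear-Gaussian model with a general non-linear function class and makes the sampling optimistic, but the sample-then-act loop is the same template.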
Related papers
- Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning [53.97335841137496]
We propose an oracle-efficient algorithm, dubbed Pessimistic Nonlinear Least-Squares Value Iteration (PNLSVI), for offline RL with non-linear function approximation.
Our algorithm enjoys a regret bound that has a tight dependency on the function class complexity and achieves minimax optimal instance-dependent regret when specialized to linear function approximation (see the pessimism sketch after this list).
arXiv Detail & Related papers (2023-10-02T17:42:01Z)
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation [24.577243536475233]
Offline reinforcement learning (RL) concerns pursuing an optimal policy for sequential decision-making from a pre-collected dataset.
Recent theoretical progress has focused on developing sample-efficient offline RL algorithms with various relaxed assumptions on data coverage and function approximators.
We revisit the linear-programming framework for offline RL, and advance the existing results in several aspects.
arXiv Detail & Related papers (2022-12-28T15:28:12Z)
- Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z)
- Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency [111.83670279016599]
We study reinforcement learning for partially observed decision processes (POMDPs) with infinite observation and state spaces.
We make the first attempt at bridging partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv Detail & Related papers (2022-04-20T21:15:38Z)
- Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting [60.98700344526674]
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning.
In this paper, we investigate a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner.
We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space.
arXiv Detail & Related papers (2021-05-17T17:22:07Z)
- Bilinear Classes: A Structural Framework for Provable Generalization in RL [119.42509700822484]
Bilinear Classes is a new structural framework which permits generalization in reinforcement learning.
The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable.
Our main result provides an RL algorithm with polynomial sample complexity for Bilinear Classes.
arXiv Detail & Related papers (2021-03-19T16:34:20Z)
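Several of the offline-RL entries above (PNLSVI and the pessimistic fitted Q-learning analysis) rely on the pessimism principle: fit value functions to the logged data by least squares and subtract an uncertainty bonus before backing values up. The sketch below shows that principle for a finite-horizon problem with linear features; the dataset layout, the feature map phi, and the bonus coefficient beta are assumptions for illustration, and neither paper is limited to, or implemented by, this linear version.

```python
# Illustrative pessimistic least-squares value iteration on an offline dataset
# with linear features. This is only a sketch of the pessimism principle used
# by the PNLSVI and pessimistic fitted Q-learning entries above, not an
# implementation of either paper (both handle non-linear function classes).
# The dataset layout, feature map phi, and bonus coefficient beta are assumptions.
import numpy as np

def pessimistic_lsvi(data, phi, d, H, actions, beta=1.0, lam=1.0):
    """data[h]: list of (state, action, reward, next_state) tuples logged at step h.
    phi(s, a): feature vector of shape (d,); actions: iterable of candidate actions."""
    weights = [np.zeros(d) for _ in range(H)]
    grams = [lam * np.eye(d) for _ in range(H)]

    def q_value(h, s, a):
        x = phi(s, a)
        bonus = beta * np.sqrt(x @ np.linalg.solve(grams[h], x))  # elliptical uncertainty width
        return float(x @ weights[h]) - bonus                       # pessimistic Q estimate

    def v_value(h, s):
        if h == H:                                                 # terminal value
            return 0.0
        return max(0.0, max(q_value(h, s, a) for a in actions))    # clip below at zero

    # Backward induction: ridge-regress (reward + pessimistic next value) on features.
    for h in reversed(range(H)):
        X = np.array([phi(s, a) for (s, a, r, s2) in data[h]])
        y = np.array([r + v_value(h + 1, s2) for (s, a, r, s2) in data[h]])
        grams[h] = lam * np.eye(d) + X.T @ X
        weights[h] = np.linalg.solve(grams[h], X.T @ y)

    # The learned policy acts greedily with respect to the pessimistic Q estimates.
    def policy(h, s):
        return max(actions, key=lambda a: q_value(h, s, a))

    return policy
```

The bonus shrinks in feature directions the dataset covers well and penalizes poorly covered directions, which is the mechanism behind the pessimism-based guarantees described in those entries.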
This list is automatically generated from the titles and abstracts of the papers in this site.