Related papers: Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-efficiency of Posterior Sampling

URL: http://arxiv.org/abs/2203.08248v2
Date: Wed, 7 Aug 2024 20:39:29 GMT
Title: Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-efficiency of Posterior Sampling
Authors: Alekh Agarwal, Tong Zhang
Abstract summary: We present the first result for non-linear function approximation which holds for general action spaces under a linear embeddability condition. We show worst case sample complexity guarantees that scale with a rank parameter of the RL problem.
Score: 28.371541697552928
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Provably sample-efficient Reinforcement Learning (RL) with rich observations and function approximation has witnessed tremendous recent progress, particularly when the underlying function approximators are linear. In this linear regime, computationally and statistically efficient methods exist where the potentially infinite state and action spaces can be captured through a known feature embedding, with the sample complexity scaling with the (intrinsic) dimension of these features. When the action space is finite, significantly more sophisticated results allow non-linear function approximation under appropriate structural constraints on the underlying RL problem, permitting, for instance, the learning of good features instead of assuming access to them. In this work, we present the first result for non-linear function approximation which holds for general action spaces under a linear embeddability condition, which generalizes all linear and finite action settings. We design a novel optimistic posterior sampling strategy, TS^3, for such problems, and show worst case sample complexity guarantees that scale with a rank parameter of the RL problem, the linear embedding dimension introduced in this work, and standard measures of the function class complexity.

Related papers

Q-function Decomposition with Intervention Semantics with Factored Action Spaces [51.01244229483353]
We consider Q-functions defined over a lower dimensional projected subspace of the original action space, and study the condition for the unbiasedness of decomposed Q-functions. This leads to a general scheme which we call action decomposed reinforcement learning that uses the projected Q-functions to approximate the Q-function in standard model-free reinforcement learning algorithms.
arXiv Detail & Related papers (2025-04-30T05:26:51Z)
Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning [53.97335841137496]
We propose an oracle-efficient algorithm, dubbed Pessimistic Nonlinear Least-Square Value Iteration (PNLSVI), for offline RL with non-linear function approximation. Our algorithm enjoys a regret bound that has a tight dependency on the function class complexity and achieves minimax optimal instance-dependent regret when specialized to linear function approximation.
arXiv Detail & Related papers (2023-10-02T17:42:01Z)
Optimal Nonlinearities Improve Generalization Performance of Random Features [0.9790236766474201]
The random feature model with a nonlinear activation function has been shown to be asymptotically equivalent to a Gaussian model in terms of training and generalization errors. We show that acquired parameters from the Gaussian model enable us to define a set of optimal nonlinearities. Our numerical results validate that the optimized nonlinearities achieve better generalization performance than widely-used nonlinear functions such as ReLU.
arXiv Detail & Related papers (2023-09-28T20:55:21Z)
Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data. For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation [24.577243536475233]
Offline reinforcement learning (RL) concerns pursuing an optimal policy for sequential decision-making from a pre-collected dataset. Recent theoretical progress has focused on developing sample-efficient offline RL algorithms with various relaxed assumptions on data coverage and function approximators. We revisit the linear-programming framework for offline RL, and advance the existing results in several aspects.
arXiv Detail & Related papers (2022-12-28T15:28:12Z)
Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications. We take a step by considering offline reinforcement learning with differentiable function class approximation (DFA). Most importantly, we show that offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z)
Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency [111.83670279016599]
We study reinforcement learning for partially observed decision processes (POMDPs) with infinite observation and state spaces. We make the first attempt at partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv Detail & Related papers (2022-04-20T21:15:38Z)
Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting [60.98700344526674]
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning. In this paper, we investigate a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner. We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space.
arXiv Detail & Related papers (2021-05-17T17:22:07Z)
Bilinear Classes: A Structural Framework for Provable Generalization in RL [119.42509700822484]
Bilinear Classes is a new structural framework which permits generalization in reinforcement learning. The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable. Our main result provides an RL algorithm which has polynomial sample complexity for Bilinear Classes.
arXiv Detail & Related papers (2021-03-19T16:34:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.